MultiEmo – a benchmark data set for multilingual sentiment analysis


MultiEmo is a benchmark data set for multilingual sentiment analysis covering 11 languages. The corpus can be analysed at the level of whole texts, paragraphs or sentences. It contains consumer reviews from four domains: medicine, hotels, products and university. The original Polish collection comprises 8,216 documents consisting of 57,466 sentences. The reviews were manually annotated with sentiment at the level of the whole document and at the level of individual sentences, with 3 annotators per element. Inter-annotator agreement was high: Positive Specific Agreement reached 0.91 for texts and 0.88 for sentences. The collection was then translated automatically from Polish into English, Chinese, Italian, Japanese, Russian, German, Spanish, French, Dutch and Portuguese. MultiEmo is publicly available under a Creative Commons Attribution 4.0 International Licence.
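For readers unfamiliar with Positive Specific Agreement (PSA), the sketch below shows how the measure is commonly computed for one category between two annotators: PSA = 2a / (2a + b + c), where a is the number of items both annotators assigned to the category and b and c are the items only one of them assigned to it. The description above does not say how the three annotators were combined, so averaging over annotator pairs is an assumption here, and the label names and toy annotations are hypothetical rather than taken from the corpus.

```python
from itertools import combinations


def positive_specific_agreement(labels_a, labels_b, category):
    """PSA for one category between two annotators: 2a / (2a + b + c)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    pairs = list(zip(labels_a, labels_b))
    both = sum(1 for x, y in pairs if x == category and y == category)
    only_a = sum(1 for x, y in pairs if x == category and y != category)
    only_b = sum(1 for x, y in pairs if x != category and y == category)
    denom = 2 * both + only_a + only_b
    # Undefined when neither annotator ever used the category.
    return 2 * both / denom if denom else float("nan")


def mean_pairwise_psa(annotations, category):
    """Average PSA over all annotator pairs (an assumed aggregation)."""
    pairs = list(combinations(annotations, 2))
    return sum(positive_specific_agreement(a, b, category) for a, b in pairs) / len(pairs)


# Hypothetical toy annotations: three annotators, four items.
ann1 = ["pos", "neg", "pos", "neu"]
ann2 = ["pos", "neg", "neu", "neu"]
ann3 = ["pos", "pos", "pos", "neu"]

psa_pos = mean_pairwise_psa([ann1, ann2, ann3], "pos")
print(f"mean pairwise PSA for 'pos': {psa_pos:.3f}")
```

Unlike raw percentage agreement, PSA ignores items that neither annotator assigned to the category, which makes it less sensitive to a dominant "other" class.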

Bibliographic reference for the main publication (if you use MultiEmo, please cite this publication):

Pęzik, P., & Buczek, M. (2015). Druga wersja klasyfikatora tematycznego tekstów WiKNN. Zadanie A23. Punkt kontrolny M16.

Kocoń J., Miłkowski P., Kanclerz K. (2021) MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of Consumer Reviews. In: Paszynski M., Kranzlmüller D., Krzhizhanovskaya V.V., Dongarra J.J., Sloot P.M.A. (eds) Computational Science – ICCS 2021. ICCS 2021. Lecture Notes in Computer Science, vol 12743. Springer, Cham.