dc.contributor.author | Kocoń, Jan |
dc.contributor.author | Miłkowski, Piotr |
dc.contributor.author | Kanclerz, Kamil |
dc.date.accessioned | 2021-04-05T11:04:34Z |
dc.date.available | 2021-04-05T11:04:34Z |
dc.date.issued | 2021-03-01 |
dc.identifier.uri | http://hdl.handle.net/11321/798 |
dc.description | MultiEmo is a new benchmark data set for multilingual sentiment analysis covering 11 languages. The collection contains consumer reviews from four domains: medicine, hotels, products and university. The original Polish collection comprises 8,216 documents consisting of 57,466 sentences. The reviews were manually annotated with sentiment at the level of the whole document and at the level of individual sentences (3 annotators per element). We achieved a high Positive Specific Agreement value of 0.91 for texts and 0.88 for sentences. The collection was then automatically translated into English, Chinese, Italian, Japanese, Russian, German, Spanish, French, Dutch and Portuguese. MultiEmo is publicly available under a Creative Commons Attribution 4.0 International Licence. More information: https://github.com/CLARIN-PL/multiemo Citation: @inproceedings{kocon2021multiemo, title={Multiemo: Multilingual, multilevel, multidomain sentiment analysis corpus of consumer reviews}, author={Koco{\'n}, Jan and Mi{\l}kowski, Piotr and Kanclerz, Kamil}, booktitle={International Conference on Computational Science}, pages={297--312}, year={2021}, organization={Springer} } |
dc.language.iso | pol |
dc.language.iso | eng |
dc.language.iso | zho |
dc.language.iso | ita |
dc.language.iso | jpn |
dc.language.iso | rus |
dc.language.iso | deu |
dc.language.iso | spa |
dc.language.iso | fra |
dc.language.iso | nld |
dc.language.iso | por |
dc.publisher | Wrocław University of Science and Technology |
dc.rights | The MIT License |
dc.rights.uri | https://opensource.org/licenses/MIT |
dc.rights.label | PUB |
dc.source.uri | https://github.com/CLARIN-PL/multiemo |
dc.subject | MultiEmo |
dc.subject | sentiment analysis |
dc.subject | multilingual |
dc.subject | benchmark dataset |
dc.subject | dataset |
dc.subject | corpus |
dc.subject | multidomain |
dc.subject | multilevel |
dc.title | MultiEmo: Multilingual, Multilevel, Multidomain Sentiment Analysis Corpus of Consumer Reviews |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
hidden | false |
hasMetadata | false |
has.files | yes |
branding | CLARIN-PL |
demo.uri | http://ws.clarin-pl.eu/multiemo |
contact.person | Jan Kocoń jan.kocon@pwr.edu.pl Wrocław University of Science and Technology |
sponsor | Ministry of Science and Higher Education (Poland) N/A CLARIN-PL nationalFunds |
size.info | 82160 texts |
size.info | 782 MB |
size.info | 506 files |
files.size | 422783119 |
files.count | 2 |