dc.contributor.author | Oleksy, Marcin |
dc.contributor.author | Dominiak, Daria |
dc.contributor.author | Wróż, Anita |
dc.contributor.author | Kobylińska, Wioleta |
dc.contributor.author | Kałkus, Dagmara |
dc.contributor.author | Zielińska, Kamila |
dc.contributor.author | Fikus, Dominika |
dc.contributor.author | Walentynowicz, Wiktor |
dc.date.accessioned | 2019-04-03T11:54:15Z |
dc.date.available | 2019-04-03T11:54:15Z |
dc.date.issued | 2019-04-03 |
dc.identifier.uri | http://hdl.handle.net/11321/637 |
dc.description | The Corpus of the Colloquial Polish Language (CCPL) is a corpus of user-generated content (UGC) annotated with morpho-syntactic tags by a team of professional linguists from the Wrocław University of Technology. It consists of 400 000 tagged segments and was used to train the UGC tagger, which is also available in the CLARIN repository. Main resources: (1) corpus files (NCP tagset): CCPL - anonimizacja_xml_out_ver(3.05).zip; (2) manual annotation guidelines: Specification for morphosyntactic tagging of UGC texts.pdf; (3) corpus files (UD tagset): corpus_petrov_tags.zip. |
dc.language.iso | pol |
dc.publisher | SentiOne |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | CC |
dc.source.uri | https://sentione.com/knowledge/eu-research-project |
dc.subject | corpus |
dc.subject | user-generated content |
dc.subject | colloquial style |
dc.subject | morpho-syntactic tagging |
dc.title | Corpus of the Colloquial Polish Language |
dc.type | corpus |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN-PL |
contact.person | Michał Brzezicki michal@sentione.com SentiOne |
sponsor | ERDF POIR.01.01.01-00-0806/16 Senti Cognitive Services euFunds |
files.size | 31533400 |
files.count | 5 |
Files in this item
This item is distributed under Creative Commons and licensed under: Creative Commons - Attribution 4.0 International (CC BY 4.0)
Name | Size | Format | Description |
CCPL - anonimizacja_xml_out_ver(3.05).zip | 7.05 MB | application/zip | Corpus of the Colloquial Polish Language |
Specification for morphosyntactic tagging of UGC texts.pdf | 157.46 KB | | Annotation guidelines |
anonimizacja_xml_out_ver(3.04).zip | 7.49 MB | application/zip | Colloquial language corpus for Polish |