Description
A tool for building multilingual corpora from Wikipedia. It downloads the web pages, converts them to plain text, and identifies their language, among other processing steps.
Tool type
Tool for creating own tools and resources
Tool task
corpus creation, corpus building
Key words
web data, Wikipedia, corpus, text processing, multilingual
Research domain
Computational Linguistics, Linguistics
Language
Multiple languages
Country
Czech Republic
CLARIN centre
Charles University in Prague
Contact person
Martin Majliš
URL
https://lindat.mff.cuni.cz/repository/xmlui/handle/11858/00-097C-0000-0022-60D6-1?show=full