Cat
Name
Cat – a simple text classification tool
Description
The tool classifies texts according to one of the following criteria: (1) thematic classification using a machine-learning model based on five Wikipedia categories, (2) thematic classification using a machine-learning model based on press topics, (3) classification by similarity of grammatical style to that of one of the well-known Polish writers of the 19th and 20th centuries, (4) for multilingual corpora, detection of the share of a particular language within the whole corpus. The analysis can be performed on any corpus packed as a .zip archive. Advanced classification with other models, or of large volumes of text, is also possible; to arrange this, please contact: webserwisy@clarin-pl.eu
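Since the corpus has to be supplied as a .zip archive, the following is a minimal sketch of one way to pack a local folder of texts using only the Python standard library. It assumes the corpus consists of plain-text .txt files; the function name pack_corpus and the folder and archive names are hypothetical, and the exact file formats and archive layout that Cat accepts should be checked in the manual.

```python
import zipfile
from pathlib import Path


def pack_corpus(corpus_dir: str, archive_path: str) -> int:
    """Pack every .txt file under corpus_dir into a .zip archive.

    Returns the number of files written. Assumption: the corpus is a
    folder of plain-text .txt files (not confirmed by the tool description).
    """
    root = Path(corpus_dir)
    files = sorted(p for p in root.rglob("*.txt") if p.is_file())
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for path in files:
            # Store paths relative to the corpus root so the archive
            # contains no absolute or machine-specific paths.
            zf.write(path, arcname=path.relative_to(root).as_posix())
    return len(files)
```

For example, pack_corpus("my_corpus", "my_corpus.zip") would produce an archive that could then be submitted for analysis.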
Bibliographic reference for the main publication (if you use Cat, please cite this publication):
Walkowiak T., Datko S., Maciejewski H.: Distance metrics in Open-Set Classification of Text Documents by Local Outlier Factor and Doc2Vec. In: Wotawa F., Friedrich G., Pill I., Koitz-Hristov R., Ali M. (eds) Advances and Trends in Artificial Intelligence. From Theory to Practice. IEA/AIE 2019. Lecture Notes in Computer Science, vol 11606. Springer, Cham
Auxiliary materials:
Link to the manual
Examples of applications