Corpus Czterech Wieszczów is a new Polish platform and project that aims to create a modern resource containing the complete works of four national poets: Adam Mickiewicz, Juliusz Słowacki, Zygmunt Krasiński and Cyprian Norwid. It covers their literary works from 1817 to 1883.
The project is the result of cooperation between linguists, literary historians and computer scientists. The corpus draws on the achievements of contemporary philology and editorial scholarship. Its main purpose is to provide tools for research on the works of these poets.
The first task is to define the set of texts. We pay particular attention to the existence of multiple versions of a text, including the forms used in the original edition and in subsequent editions, as well as standardised forms, e.g. kobita – kobieta (‘woman’: the archaic and the standard version, respectively). The tasks also include morphosyntactic tagging, which has to take into account problems such as spelling, e.g. xiażę – książę (‘prince’: the archaic and the standard version, respectively), 19th-century inflection and vocabulary, and unusual syntax. The next stage will be text analysis and visualisation: generating word frequency lists, creating concordances and collocation lists, and carrying out stylometric analyses and information extraction. These tasks will be performed using services and tools created by CLARIN-PL, such as Inforex, KonText, NER and MeWeX.
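To give a rough idea of what spelling standardisation, a word frequency list and a concordance involve, here is a minimal sketch in plain Python. It does not use Inforex, KonText or any other CLARIN-PL service, and it is not the project's implementation. The mapping table, the function names and the sample string are all assumptions made for illustration; only the two archaic/standard pairs come from the examples above.

```python
import re
import unicodedata
from collections import Counter

# Hypothetical lookup from archaic to standard spellings. The two pairs are
# the examples quoted in the text, not the project's actual lexicon.
NORMALISATION = {"kobita": "kobieta", "xiażę": "książę"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    text = unicodedata.normalize("NFC", text)  # keep Polish diacritics composed
    return re.findall(r"\w+", text.lower())


def normalise(tokens):
    """Replace archaic spellings with their standard form where one is known."""
    return [NORMALISATION.get(tok, tok) for tok in tokens]


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def concordance(tokens, keyword, width=2):
    """Return (left context, keyword, right context) for every occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Toy input invented for this sketch; it is not a quotation from the corpus.
sample = "Xiażę i kobita, kobieta i xiażę"
tokens = normalise(tokenize(sample))

for word, count in frequency_list(tokens):
    print(word, count)

for left, keyword, right in concordance(tokens, "kobieta", width=1):
    print(f"{left:>10} [{keyword}] {right}")
```

A simple lookup table like this only handles known one-to-one spelling variants. The problems mentioned above, such as 19th-century inflection and unusual syntax, need morphosyntactic tagging and cannot be solved by string replacement alone.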
The project described above was presented at the CLARIN-PL conference on 24-25 June 2021, during a webinar entitled Corpus Czterech Wieszczów – a new dimension of Polish Romanticism heritage. The recording of this webinar is available at: https://www.youtube.com/watch?v=VwGx3Unyw6w.
We also encourage you to check out other materials available on the CLARIN-PL channel: https://www.youtube.com/channel/UCqrhEITxu8_MIWPnFdYomPw/videos
Your CLARIN-PL Team