RESOURCE AND TOOL MANUALS
Practical and simple instructions aimed at every user.
They describe the tools' basic functions, how to operate them, and what they can be used for.
A series of lectures and workshops: CLARIN-PL in research practice.
Seminar “CLARIN-PL-Biz – Language Technologies for Learning and Business”, March 16, 2021
We invite you to read the presentations and materials used during the seminar “CLARIN-PL-Biz – Language Technologies for Learning and Business”, which took place on March 16, 2021:
- Introduction: (PRESENTATION)
- CLARIN BIZ for storing and processing big language data: (PRESENTATION)
- Language resources – knowledge sources and training patterns for machine learning: (PRESENTATION)
- Basic tools for language analysis: (PRESENTATION)
- Language preprocessing: (PRESENTATION)
- Speech processing service: (PRESENTATION)
- Semantic analysis of texts: (PRESENTATION)
- Dialogue systems: basic, scaffolding and modular systems for the Polish language: (PRESENTATION)
- Sentiment analysis (emotional overtones in text) (PRESENTATION)
- Sentiment analysis (emotional overtones in text) (RECORDING)
- Selected planned CLARIN-PL-Biz applications (PRESENTATION)
- Clarin-Biz Information Folder
Institute of Journalism, Media and Social Communication, Jagiellonian University, Cracow, 2-3 March 2020
Shared materials from the Workshop at the Institute of Journalism, Media and Social Communication of the Jagiellonian University
A workshop for a team of researchers from IDMiKS UJ was held on March 2-3, 2020.
- CLARIN-PL – a large research infrastructure of language technologies for the humanities and social sciences (PRESENTATION)
- CLARIN-PL. Natural language engineering in social sciences – basic concepts (PRESENTATION)
- Parliamentary Discourse Corpus (PRESENTATION)
- Korpusomat – a tool for creating searchable morphosyntactically tagged Polish corpora (PRESENTATION)
- Tools for extracting information from the text (PRESENTATION)
- Extracting information from the text (PRESENTATION)
- LEM: extracting statistics from the text (PRESENTATION)
- plWordNet – Polish lexical resources and their usage (PRESENTATION) (CASE STUDY: Article by K. Rybiński – example of using plWordNet)
- Speech processing tools (PRESENTATION)
Workshop on CLARIN-PL in research practice (UMCS Lublin)
WORKSHOP MATERIALS AND PRESENTATIONS:
- CLARIN-PL – Introduction (PRESENTATION)
- plWordNet (PRESENTATION)
- Speech processing services (PRESENTATION)
- Dependency parsing (PRESENTATION)
- Terminology extraction from the text – TermoPL (PRESENTATION)
- Parliamentary Discourse Corpus (PRESENTATION)
- A tool for creating searchable morphosyntactically tagged Polish corpora – Korpusomat (PRESENTATION)
- Morfeusz2 tagger (PRESENTATION)
- Multilingual corpora + KonText corpus search engine (PRESENTATION)
- DSpace Repository + CLARIN Cloud platform + INFOREX (a tool for annotating corpora) (PRESENTATION)
Materials and presentations from the workshop: “CLARIN-PL tools for research in psychology”.
Shared materials from the workshop “CLARIN-PL tools for research in psychology”, held at SWPS University in Poznań on 23-24 May 2019.
Event website: http://badanianarracyjne.pl/karuzela/warsztaty/
Some of the presentations are larger than 20 MB.
Materials and presentations:
- CLARIN in a nutshell. Brief information about the CLARIN ERIC research infrastructure and the Polish CLARIN-PL consortium. What is it? What are the objectives? Who sets it up? Who are the users? (PRESENTATION)
- Information extraction and stylometry. The presentation introduces ways of analysing material from a psychologist's perspective, for example stylometric analysis and information extraction applied to material from traumatised subjects: searching for characteristic linguistic features (formal and semantic), attempting a typology, and showing changes over time. Information extraction and stylometry in psychological research (I) (PRESENTATION), Semantic analysis – stylometry (II) (PRESENTATION), Thematic classification (III) (PRESENTATION), Changes over time / topic modeling (IV) (PRESENTATION)
- Speech analysis tools. How to quickly carry out basic research tasks on acoustic material: formal analysis of recordings and creating transcriptions. Speech (PRESENTATION)
- plWordNet – Polish lexical resources and the possibility of using them. This lexical database is modelled on the Princeton WordNet, originally created for American psychologists. plWordNet can be used for various research purposes, including psychological ones (for example, using the emotional annotation of terms). plWordNet (PRESENTATION)
- DSpace and Inforex – creating text corpora. The presentation introduces tools for creating and sharing corpora of linguistic materials (the DSpace repository) and the environment integrated with it. DSpace (PRESENTATION), CLARIN CLOUD and the environment of integrated tools (PRESENTATION)
CLARIN-PL workshops in Toruń – materials available
Shared materials from the workshops at the Faculty of Philology, Nicolaus Copernicus University
The organisers of the event were:
– Faculty of Philology, Nicolaus Copernicus University
– Department of English Philology of Nicolaus Copernicus University
– CLARIN-PL Language Technology Centre
– Centre for Language Evolution Studies (CLES) UMK (http://cles.umk.pl/)
The programme can be viewed here.
Materials and presentations:
- Creating and managing corpora, conducting automatic and manual annotation: DSpace, Inforex, Korpusomat: DSpace and NextCloud (PRESENTATION), Inforex (PRESENTATION), Korpusomat (PRESENTATION).
- Morphosyntactic analysis: Morfeusz 2 (PRESENTATION).
- Tools for analysing speech corpora: Speech Processing (PRESENTATION).
- Valence dictionary Walenty: Walenty (PRESENTATION).
- TermoPL – A tool for extracting terminology from texts: TermoPL (PRESENTATION).
- Information extraction from text and stylometry: Information extraction from text and stylometry 1 (PRESENTATION), Information extraction from text and stylometry 2 (PRESENTATION), Information extraction from text and stylometry 3 (PRESENTATION).
- Polish-English parallel corpora: Paralela (PRESENTATION).
- SpokesPL – search engine for Polish conversational data: SpokesPL (PRESENTATION).
- plWordNet – Słowosieć – a large relational lexico-semantic dictionary of the Polish language. plWordNet (PRESENTATION).
CLARIN-PL Workshop for Association 61, I Have a Right to Know – presentations and materials
Shared materials from the workshop for Association 61, I Have a Right to Know, held in Warsaw on 13-14 July 2018
- Introduction – a guide to CLARIN-PL resources and tools: (PRESENTATION)
- Data storage and processing in DSpace and NextCloud: (PRESENTATION)
- Inforex – corpus management and annotation: (PRESENTATION)
- CLARIN-PL tools: thematic and semantic text analysis: (PRESENTATION)
- Extraction of information and text features: frequency analysis (LEM): (PRESENTATION)
- WebSty – an open network system for stylistic analysis of the text: (PRESENTATION)
- Tools for automatic collocation extraction (MeWeX): (PRESENTATION)
- Sample materials for corpus-based exercises (zip archives): people and parties
PUBLICATIONS AND PRESENTATIONS
Materials devoted to Polish research in the digital humanities and the resources and tools developed in CLARIN-PL.
Guidelines for linguists
- Technical document – semantic description of a noun in plWordNet
- Technical document – semantic description of an adjective in plWordNet
- Technical document – semantic description of an adverb in plWordNet
- Technical document – qualifier system in plWordNet
- Technical document – procedure for checking the lexicality of a multi-word combination in plWordNet
E-HUMANITIES IN POLAND
A database of links to selected Polish linguistic tools and resources; information about research centres, projects and initiatives related to computational linguistics applications in Poland.
- “Told History” – a project of “the Grodzka Gate – NN Theatre” in Lublin, involving the recording, compilation and dissemination of oral accounts of Lublin and the Lublin region from the inter-war period to contemporary times.
- The Computer Stylistics Group – a website devoted to the possibilities of using language engineering in modern stylistic research.
- An analysis of the language used by characters in Sienkiewicz’s Trilogy.
- Centre for Digital Humanities of the Institute of Literary Research of the Polish Academy of Sciences – the Centre deals with the presence of the humanities on the web, the use of new technologies in literary research and literary research on new technologies.
- Blog as a new form of multimedia writing – Cooperation with CLARIN-PL
- Interactive literary map – Cooperation with CLARIN-PL
- Discourse Analysis – a scientific consortium and portal collecting texts important for Polish culture and society, and bringing together researchers in the humanities and social sciences interested in Polish public discourse.
- ehum.psnc.pl – a website presenting tools and resources useful in humanities research, in particular those supporting the collection, processing and analysis of data and the publication of research results.
- Platforma Obsługi Nauki PLATON – a project devoted to the development of the national scientific ICT infrastructure with applications and services supporting research work.
- Institutional Culture on the Web: Content and Audience – the aim of the project is an in-depth characterisation of the growing segment of institutional culture audiences, who come into contact with it via the Internet. Cooperation with CLARIN-PL
- Sensuality in Polish culture – a project devoted to the issue of sensuality understood as a historically changeable set of representations of human senses. The project is carried out on-line in the form of a website serving as a multimedia Internet thematic encyclopaedia.
- A cultural and suprasegmental analysis of communicative interactions marked by politeness and impoliteness – a project aiming at the analysis of the expressive and communicative functions of the existing German corpus and the creation and analysis of parallel corpora in other languages (Polish, Italian, Bulgarian).
- Women’s Archive: Writers – a project aimed at creating a database and initiating a digital archive of unpublished manuscripts of women living in historical Poland from the 16th century to the present day.
- Baltic Digital Library
- Wrocław University Digital Library
- CYBRA Łódź Regional Digital Library
- Polona National Digital Library
- dlibra.psnc.pl – site of products supporting digital libraries (dArceo, dLab, dLibra, dMuseion).
- Lower Silesia Digital Library
- Digital Libraries Federation – a set of network services based on digital resources available in Polish digital libraries and repositories launched in the PIONIER network.
- Kuyavian-Pomeranian Digital Library
- Malopolska Digital Library
- Mazovia Digital Library
- Podkarpacka Digital Library
- Podlaska Digital Library
- Pomeranian Digital Library
- Digital Repository of Scientific Institutes – a repository collecting archival materials, scientific publications, research documentation and written cultural heritage.
- Silesian Digital Library
- West Pomeranian Digital Library “Pomerania”.
- Digital Libraries Team of the Poznań Supercomputing and Networking Centre – a team conducting research and development work in the field of digital libraries.
- Digital Library of Zielona Góra
- Virtual Library of Science
- Digital Centre Project: Poland – works for social change and increased civic involvement, using the potential of digital tools and models of cooperation based on sharing resources and knowledge.
- Modern Poland Foundation – works for modern education, promotes open, free access to educational materials, books and textbooks.
- The Fifth Medium Foundation – a Lublin-based non-governmental organisation dealing with non-formal education, focused on popularising the so-called “new media” in the educational process.
- History and Media – a portal dedicated to historical Internet resources and new trends in digital humanities; it reports on online historical and archival collections and digital tools for historians, and publishes articles on topics such as methodological issues and historical reconstructions.
- Labkit.pl – interdisciplinary meetings, workshops and projects combining new technologies and education.
- THATCamp Polska (The Humanities and Technology Camp) – open meetings and workshops promoting digital humanities and integrating the Polish digital culture research community.