Purpose of the programme
Our tools and applications are grouped below by what you can achieve with them:
- basic text processing (tokenization, morphosyntactic tagging, syntactic parsing, recognition of named entities)
- creating, reviewing and annotating corpora
- lexicographic search (extraction of terms and multi-word units, unification of lexical meanings, search for examples of word usage for further research)
- speech processing
The rapidly growing range of services, tools and functions within the CLARIN-PL infrastructure can be overwhelming, and it is often difficult to see what kind of assistance we offer. To overcome this difficulty, as well as those resulting from the unstable terminology surrounding machine language processing in Polish scientific discourse, we present a list of infrastructure elements ordered by functionality.
A helpful, though simplified, criterion for functional division is the ‘research phase’. We have assigned the individual functions of our tools, services and applications to one of four research phases that can typically be distinguished when working with NLP methods in scientific settings (the humanities and social sciences, H&SS).
(Where, and possibly how, to obtain research material in textual form?)
This is the stage that precedes the actual research activities – its aim is to obtain text in a form that is suitable for further stages of machine language analysis. In practice, this stage includes activities such as OCR, transcription of spoken texts, downloading texts from the Internet, collecting posts from social networking sites, etc. CLARIN only partially supports activities in this research phase.
(How to prepare the textual material for further research?)
In the processing stage the collected texts are processed either by machine or by hand. As a result, the text material is given an additional layer of information relating to the linguistic and communicative aspects of the text. Machine processing can mean, for example, morphosyntactic tagging, assignment of base forms (lemmatisation), splitting the text into words and punctuation marks (tokenisation), normalisation, etc. Manual processing means manual annotation, marking or coding carried out in order to assign to text fragments information that cannot be detected automatically.
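As a rough illustration of what an "additional layer of information" can look like, the sketch below normalises a text and tokenises it, recording character offsets for each token. It uses only the Python standard library and is not a CLARIN-PL tool or API: the function names, the regular expression and the record layout are assumptions made for this example. Tagging and lemmatisation require trained language models and are deliberately left out.

```python
import re
import unicodedata

# Assumption: a token is a run of word characters (optionally joined by
# a hyphen or apostrophe) or a single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def normalise(text: str) -> str:
    """Apply Unicode NFC normalisation and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", text).strip()


def annotate(text: str) -> list[dict]:
    """Return one record per token, with offsets into the normalised text."""
    clean = normalise(text)
    return [
        {
            "form": m.group(),
            "start": m.start(),  # offset of the first character
            "end": m.end(),      # offset just past the last character
            "is_punct": not any(ch.isalnum() for ch in m.group()),
        }
        for m in TOKEN_RE.finditer(clean)
    ]


if __name__ == "__main__":
    for record in annotate("Ala  ma\nkota, a kot ma Alę."):
        print(record)
```

Keeping the offsets alongside each token means that later layers, whether automatic or manual, can refer back to the same stretch of text without modifying it.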
(What information can be obtained from the textual material?)
In the analysis phase, the information assigned to the text in the processing phase undergoes extraction, grouping and other more advanced processes, the result of which is to organise it according to a predefined set of categories. Examples of functions specific to this phase include stylometric analysis and terminology extraction. The analysis may also be performed according to queries formulated individually by the researcher while browsing the corpora using standard search engines (e.g. KonText, Korpusomat).
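The two kinds of analysis mentioned above can be sketched in a deliberately simplified form: counting recurring word pairs as candidate terms, and listing a keyword together with its context, as a corpus search engine would. This is a toy example under stated assumptions, not the method used by any CLARIN-PL service, KonText or Korpusomat; the function names, the stopword list and the frequency threshold are all hypothetical.

```python
from collections import Counter


def candidate_terms(tokens, stopwords, n=2, min_count=2):
    """Count contiguous n-grams that contain no stopword.

    Returns (term, count) pairs seen at least min_count times,
    most frequent first.
    """
    grams = Counter(
        tuple(tokens[i:i + n])
        for i in range(len(tokens) - n + 1)
        if not any(t in stopwords for t in tokens[i:i + n])
    )
    return [(" ".join(g), c) for g, c in grams.most_common() if c >= min_count]


def kwic(tokens, keyword, window=2):
    """Keyword in context: (left context, keyword, right context) per hit."""
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    text = "language corpus is a language corpus of the polish language"
    tokens = text.split()
    print(candidate_terms(tokens, stopwords={"is", "a", "of", "the"}))
    print(kwic(tokens, "corpus", window=1))
```

Real terminology extraction typically works on lemmatised, tagged text and uses statistical measures rather than raw counts; the sketch only shows how information from the processing phase can be grouped and queried.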
(How to interpret information obtained from the textual material?)
This is the stage of research that currently takes place completely outside the CLARIN infrastructure. It is the stage of substantive interpretation of the data produced in the previous stages. CLARIN staff will be happy to provide the necessary technical assistance, but data interpretation is usually a task entirely dependent on the discipline represented by the researcher.