CLARIN-PL

See what we are working on

CLARIN-PL is a Polish research consortium and part of the pan-European research infrastructure CLARIN. It consists of six scientific units in which we create and develop electronic language resources and tools for working with large collections of texts in Polish.

The CLARIN infrastructure is a network of centers whose tasks are:

  • to construct basic technology and services necessary for network operation – type A;
  • to provide users with tools and resources related to natural language processing – type B;
  • to share descriptions of resources (metadata) – type C;
  • to support users and provide access to knowledge and experts – type K.

Type B centers are the basic elements of the network. The Polish CLARIN node, the Language Technology Centre CL-PL (LTC), is being built at Wrocław University of Technology. Because the LTC strictly observes the accepted standards, users registered with it will be granted free access to tools and language resources available both in Poland and in CLARIN centers in other member states.

The aim of the Language Technology Centre CL-PL is to fill gaps in the set of basic tools and resources for Polish. We actively cooperate with researchers in the humanities and social sciences to create and develop innovative e-humanities applications for the Polish language. Ultimately, this cooperation may also extend to digital libraries, archives, museums, etc.

Further LTC tasks include:

  • building a repository in which the collected tools and resources will be labelled with persistent identifiers;
  • ensuring the technical coherence of the system and its compliance with standards, intellectual property rights, licenses, and ethical rules;
  • establishing a security policy, for example through server certification and responsible management of personal data.
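To illustrate what labelling deposits with persistent identifiers means, here is a minimal in-memory sketch. It is not the LTC repository: the class names, the `example-prefix` value, and the Handle-style `<prefix>/<number>` scheme are hypothetical assumptions chosen only to show the idea that each deposit receives an identifier that is never reused and keeps resolving to the same resource.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Resource:
    """A tool or language resource deposited in the repository (hypothetical model)."""
    name: str
    kind: str  # e.g. "corpus", "tool", "lexicon"
    metadata: Dict[str, str] = field(default_factory=dict)


class Repository:
    """In-memory repository that labels each deposit with a persistent identifier.

    The "<prefix>/<number>" scheme is a hypothetical, Handle-style illustration,
    not the scheme actually used by the LTC.
    """

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._next_id = 1
        self._items: Dict[str, Resource] = {}

    def deposit(self, resource: Resource) -> str:
        # Assign the next free identifier; identifiers are never reused.
        pid = f"{self.prefix}/{self._next_id}"
        self._next_id += 1
        self._items[pid] = resource
        return pid

    def resolve(self, pid: str) -> Resource:
        # A persistent identifier must keep resolving to the same resource.
        return self._items[pid]


repo = Repository(prefix="example-prefix")
pid = repo.deposit(Resource("sample corpus", "corpus", {"language": "pol"}))
print(pid)                     # example-prefix/1
print(repo.resolve(pid).name)  # sample corpus
```

A real deployment would delegate identifier minting and resolution to an external PID service rather than keep them in a dictionary; the sketch only shows the contract that such a service has to honour.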

The scheme of the proposed architecture for the Polish type B center

The application layer will be directly visible to end users. The web services layer will handle communication between the centers and give access to the services offered by the center through SOAP, with the services described in WSDL. The resources layer will be responsible for maintaining the many independent tool components and their metadata descriptions. The content layer will store and archive language resources (corpora, recordings, texts sent by users, etc.).
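The four layers described above can be sketched as follows, assuming each layer talks only to the one beneath it. All class and method names, and the `whitespace-tokenizer` tool, are hypothetical; in the described design the web services layer would expose its operations over SOAP with WSDL descriptions, whereas here a plain Python method call stands in for that remote interface.

```python
from typing import Callable, Dict, List


class ContentLayer:
    """Stores and archives language resources (corpora, recordings, user texts)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def save(self, key: str, text: str) -> None:
        self._store[key] = text

    def load(self, key: str) -> str:
        return self._store[key]


class ResourcesLayer:
    """Keeps independent tool components together with their metadata descriptions."""

    def __init__(self) -> None:
        self._tools: Dict[str, Callable[[str], List[str]]] = {}
        self.metadata: Dict[str, Dict[str, str]] = {}

    def register(
        self,
        name: str,
        tool: Callable[[str], List[str]],
        description: Dict[str, str],
    ) -> None:
        self._tools[name] = tool
        self.metadata[name] = description

    def get(self, name: str) -> Callable[[str], List[str]]:
        return self._tools[name]


class WebServicesLayer:
    """Exposes the center's services; a local call stands in for SOAP/WSDL here."""

    def __init__(self, resources: ResourcesLayer, content: ContentLayer) -> None:
        self.resources = resources
        self.content = content

    def run_tool(self, tool_name: str, document_key: str) -> List[str]:
        # Fetch the stored text and apply the requested tool component to it.
        text = self.content.load(document_key)
        return self.resources.get(tool_name)(text)


class ApplicationLayer:
    """The only layer the end user sees directly."""

    def __init__(self, services: WebServicesLayer) -> None:
        self.services = services

    def tokenize_document(self, document_key: str) -> List[str]:
        return self.services.run_tool("whitespace-tokenizer", document_key)


content = ContentLayer()
content.save("doc-1", "Ala ma kota")

resources = ResourcesLayer()
# str.split serves as a trivial stand-in for a real Polish tokenizer.
resources.register("whitespace-tokenizer", str.split, {"language": "pol", "type": "tokenizer"})

app = ApplicationLayer(WebServicesLayer(resources, content))
print(app.tokenize_document("doc-1"))  # ['Ala', 'ma', 'kota']
```

Keeping the layers separate in this way means a tool can be replaced in the resources layer, or the storage backend swapped in the content layer, without changing what the end user sees.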