CLARIN-PL Repository - About and Policies
- Mission Statement
- Terms of Service
- About Repository
- License Agreement and Contracts
- Intellectual Property Rights
- Privacy Policy
- Metadata Policy
- Preservation Policy
- Citing Data Policy
CLARIN Mission Statement
The ultimate objective of CLARIN ERIC is to advance research in humanities and social sciences by giving researchers unified single sign-on access to a platform which integrates language-based resources and advanced tools at a European level. This shall be implemented by the construction and operation of a shared distributed infrastructure that aims at making language resources, technology and expertise available to the humanities and social sciences (henceforth abbreviated HSS) research communities at large.
To learn more about CLARIN ERIC, see CLARIN-ShortGuide.pdf.
CLARIN-PL Mission Statement
The Repository “CLARIN-PL Centre” gives access to language resources and tools from the Department of Computational Intelligence at Wrocław University of Science and Technology and from other CLARIN-PL Members.
The CLARIN-PL Centre stores the resources and tools according to the mission:
- To promote knowledge and use of the Polish language by conducting applied scientific research.
- To stimulate and coordinate the scientific description of the Polish vocabulary and grammar in all its aspects through the ages.
- To produce, link and give access to source material for Polish in the form of historical and current corpora, dictionaries, lexical digital databases, grammars and the accompanying tools.
Terms of Service
To achieve our mission, we set out some ground rules in the Terms of Service. By accessing or using any data or services provided by the Repository, you agree to abide by the Terms contained in that document.
Data in the CLARIN-PL repository are made available under the licence attached to the resources. Where no licence is attached, data are made freely available for access, printing and download for the purposes of non-commercial research or private study. In any publication, users must acknowledge the Deposited Work using a persistent identifier (see Citing Data), its original author(s)/creator(s), and any publisher where applicable. Full items must not be harvested by robots except transiently for full-text indexing or citation analysis. Full items must not be sold commercially without formal permission of the copyright holders, unless this is explicitly granted by the attached licence.
About Repository
The repository works like a library for linguistic data and tools.
- Search for data and tools and easily download them.
- Deposit data and be sure that it is safely stored and that everyone can find it, use it, and cite it correctly (giving you credit).
License Agreement and Contracts
At the moment, CLARIN-PL distinguishes three types of contracts.
- For every deposit, we enter into a standard contract with the submitter, the so-called "Deposition License Agreement", in which we describe our rights and duties, and in which the submitter acknowledges that they have the right to submit the data and gives us (the repository centre) the right to distribute the data on their behalf.
- Everyone who downloads data is bound by the licence assigned to the item. To download protected data, one has to be authenticated and must electronically sign the licence. A list of available licences in our repository can be found here.
- Submitters can assign custom licences to items during the submission workflow.
Intellectual Property Rights
As mentioned in the section License Agreement and Contracts, we require the depositor of data or tools to sign a Deposition License Agreement, which specifies that they have the right to submit the data and gives us (the repository centre) the right to distribute the data on their behalf. This means that depositors are solely responsible for resolving any intellectual property rights (IPR) issues before publishing data or tools by submitting them to us.
Anyone who suspects that a dataset or tool in our repository violates Intellectual Property Rights should contact us immediately at our help desk.
Privacy Policy
Read our Privacy Policy in order to learn how we manage personal data collected by the CLARIN-PL repository and services.
Metadata Policy
Deposited content must be accompanied by sufficient metadata describing its content, provenance and formats in order to support its preservation and dissemination. Metadata are freely accessible and are distributed in the public domain (under CC0). However, we reserve the right to be informed about commercial use of metadata from the CLARIN-PL repository; please send a description of your use case to the Help Desk.
Preservation Policy
CLARIN-PL is committed to the long-term care of items deposited in our repository and strives to adopt current best practice in digital preservation. The CLARIN-PL centre will ensure preservation of and access to the data in this repository as long as sufficient funding is available.
The financing of the CLARIN-PL Language Technology Centre was secured for the years 2024-2026 by the Polish Ministry of Science and Higher Education. The funding is based on a 3-year project model, and a positive evaluation is an important element in the project extension process. In the unlikely event that the Ministry suspends funding, Wrocław University of Science and Technology (WUST), as the official owner of the whole infrastructure, is obliged to guarantee the management and maintenance of the CLARIN-PL centre. This is confirmed by the WUST internal document of acceptance of fixed asset no. ST-011341, dated December 29, 2023. CLARIN-PL has also received funding for the years 2025-2027 from European funds (FENG program). The regulations of that project guarantee an additional continuity period of five years after the end of the project, that is, 2028-2033. This is confirmed by WUST internal documents.
The legal aspects of relocating data to another institution are addressed by the Deposition License Agreement. This agreement between the depositor and the repository gives all permissions required to meet responsibilities for preservation of the data. It also covers the transfer of custody of the data and the permission to share the data on the depositor’s behalf. The repository has the right to copy, transform, and store the items, as well as to provide access to them, as long as it complies with the deposition license.
Technology: Our repository uses a system widely used at CLARIN data centres (the CLARIN DSpace repository system with well-documented modifications). It currently runs on virtual machines at the AI department of Wrocław University of Science and Technology, which has the capacity to keep them up and running, including during any negotiations about another party taking over the hosting of either the repository or all the data in it. If the data are moved to another DSpace repository within CLARIN, or to another repository using a persistent identifier (PID) system, the PIDs will be kept unchanged. The transfer will not affect the validity of PID references already included in publications etc., and will therefore have minimal impact on user experience. Updates and upgrades of the repository system software and server software are carried out continuously. In general, developments in technology are followed, and changes relevant to the repository system are continuously discussed in the Standing Committee for CLARIN Technical Centres (SCCTC).
Data storage and backup: WCSS (Wrocław Centre for Networking and Supercomputing) is responsible for storage media handling and for monitoring the server infrastructure and data partitions.
File formats: During the submission workflow, the use of open and well-documented data formats is encouraged, as these allow for easy conversion into other formats. Data in proprietary formats will only be accepted when conversion cannot easily be done. Users are encouraged to deposit data in the recommended standards and formats developed by the CLARIN community as described on the following page.
Integrity: To ensure the integrity of the data sets, an MD5 checksum is computed for every deposited file, which allows us to check the data for defects over the years. Once deposited, files in data sets are never changed, and only minor changes to the metadata are allowed, for example correction of spelling, minor changes in documentation, or the addition of further documentation. Changes to the data themselves will be issued as a new version of the dataset, which will obtain a new persistent identifier. These changes are only made in close collaboration with the producer of the dataset.
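As an illustration only, a fixity check of this kind can be sketched in Python. This is a minimal sketch, not the repository's actual implementation: the function names md5_of_file and is_intact are hypothetical, and it assumes that the MD5 value recorded at deposit time is available as a hexadecimal string.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded_md5: str) -> bool:
    """Compare the file's current MD5 with the value recorded at deposit."""
    # Assumption: the recorded value is a hex string; case and surrounding
    # whitespace are normalised before comparing.
    return md5_of_file(path) == recorded_md5.strip().lower()
```

A depositor or user could run the same comparison on a downloaded file: a mismatch means the bytes differ from those recorded at deposit time and the file should be downloaded again or reported to the help desk.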
Information security: The repository system logs all changes to data resources and to metadata. Changes can only be performed by a small group of people who belong to the CLARIN-PL group. Automatic checks of data formats, links, and activity are run weekly and inspected by the person responsible for the repository. The servers are monitored, including the configured network services and disk usage, and intrusion prevention software is in use on the repository server to prevent security incidents. The servers are scanned and evaluated for security issues by the Information Security Team. The repository administrators respond to any reported security-related issues or incidents as quickly as possible. These measures focus on keeping the data and metadata safe and secure.
Suggestions from users about changes in functionality are appreciated and encouraged. Users can write to helpdesk@clarin-pl.eu, and the staff of the helpdesk and knowledge sharing service are ready to discuss wishes for new or improved functionality, better help pages, metadata guidance or other changes in user requirements. The preservation plan may be revised yearly based on experience from depositors and on developments in technology.
Authenticity Policy
Data producers hand over the materials to us. We do not change the data, except by adding metadata if required. If applicable, we create collection-level objects which provide a context for the embedded data sets. The repository maintains links to other relevant materials (e.g. articles, theses, documentation, related data) and to software and tools that have been used in the production of the data, if applicable. The identity of a depositor is established through the required login, which uses CLARIN SpF for identification.
Citing Data Policy
Data Users must acknowledge and cite data sources properly in all publications and outputs.