Frequently Asked Questions

What is the repository?
What submissions do we accept?
Do I need to create an account to download and/or make a submission?
Why should I submit my data into your repository?
Why should I submit my tools?
What is the PID (handle) good for?
What is the actual depositing/archiving procedure?
What if I want/need to update the archived data?
What if I want to withdraw the resources in the future? Can I delete the data?
How to cite a submission?
How safe is my data if I store it with you?
What license should I pick for my data/tool?
Where can I find more information about supported licenses?
Why do we strongly prefer real authors to institutions?
Is there any common search tool across different CLARIN repositories?
How do I create a new version of a resource?

I don't want to/cannot make the data publicly available or want/can make them available only after a specific date. Could I still archive them with CLARIN-PL?

What is the repository?

It is like a library for linguistic data and tools.

Search for data and tools and easily download them.
Deposit your data and be sure that it is safely stored and that everyone can find it, use it, and cite it correctly (giving you credit).

What submissions do we accept?

We accept any linguistic and/or NLP data and tools: corpora, treebanks, lexica, but also trained language models, parsers, taggers, MT systems, linguistic web services, etc. We do not strictly require you to upload the data itself, although it is always better to do it. Still, you can make a metadata-only record, if required. We also support online license-signing for immediate availability of restricted resources.

When depositing items to the CLARIN-PL repository, please first carefully read the guidelines on How to Deposit that explain the process of submitting an item and the requirements on its metadata, while the structure and accepted formats of the data itself are given in the CLARIN-PL guidelines for data submissions.

Do I need to create an account to download and/or make a submission?

You can download data and tools whose license allows free sharing without any obstacles. Just read the license and download. This applies to all data under Creative Commons licenses and all tools under open source licenses.
To download data and tools that require you to sign a license, you need to log in. To make a submission, you also need to log in. However, if you are from the academic world, you probably don't need a new account.
Just click "Login" and search for your academic institution. To sign in, you can use any account with an Identity Provider that is a member of the eduGAIN federation.
If you don't have an academic account that works with us, let us know. We will make you a local account.

Why should I submit my data into your repository?

It is free and safe.
We respect your license. We encourage Free Data and believe it benefits not only users but also the data providers. However, we also accept more restricted data, and we can make users sign a license before downloading your data, if that is what you need.
The data is visible, giving you maximal credit for your work (Google, VLO, DataCite, OLAC, Data Citation Index, arXiv).
The data is easy to cite. We provide ready-to-use one-click citations in BibTeX, RIS, and other popular reference formats. All the citations include permanent links created from persistent identifiers (we use handles for PIDs). These PIDs are future-proof.
For some data, like text corpora or treebanks, we can provide additional services, like full-text or even tree-query search.

Why should I submit my tools?

See "Why should I submit my data into your repository?". Everything applies to software tools too.
You can just link your version control system (svn, git), if it is publicly accessible. You can also link your project page, or demo site.

What is the PID (handle) good for?

It is a special permanent URL. It provides a permanent link that will resolve correctly even if the data is moved at some point in the distant future. Thus, it should be used as the URL in citations.
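To illustrate how a handle becomes a citable link, here is a minimal sketch in Python. It assumes the public Handle System proxy at hdl.handle.net; the function name handle_to_url and the handle 12345/678 are hypothetical and used only for illustration, not taken from the repository.

```python
from urllib.parse import quote

# Public proxy of the Handle System; a handle appended to it forms a resolvable URL.
HANDLE_PROXY = "https://hdl.handle.net/"


def handle_to_url(handle: str) -> str:
    """Turn a handle of the form 'prefix/suffix' into a resolvable URL.

    Hypothetical helper: accepts a bare handle, the 'hdl:' shorthand,
    or a URL that already points at the proxy.
    """
    handle = handle.strip()
    for lead in ("hdl:", "http://hdl.handle.net/", "https://hdl.handle.net/"):
        if handle.lower().startswith(lead):
            handle = handle[len(lead):]
            break
    prefix, sep, suffix = handle.partition("/")
    if not sep or not prefix or not suffix:
        raise ValueError(f"not a handle of the form prefix/suffix: {handle!r}")
    # Percent-encode characters that are unsafe in a URL path.
    return HANDLE_PROXY + quote(prefix, safe=".") + "/" + quote(suffix, safe="/")


if __name__ == "__main__":
    # '12345/678' is a made-up handle, not a real record.
    print(handle_to_url("hdl:12345/678"))
```

The point of citing this form rather than the repository's own page address is that the proxy looks up the current location each time, so the citation keeps working after the data is moved.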

What is the actual depositing/archiving procedure?

During the submission of digital language resources to the repository, the data undergo a curation process in order to ensure quality and consistency. We assist you in meeting the necessary requirements for sustainable resource archiving. Data have to be provided with metadata in standard formats accepted/adopted in the respective communities, persistent identifiers (PIDs) have to be assigned, IPR issues have to be resolved, and clear statements with regard to licensing and possible use of the resources have to be made. The depositor is also required to electronically sign a deposition agreement acknowledging that (s)he is the holder of the rights to the data and that (s)he has the right to grant the rights contained in this license. Once the data is deposited in the repository, it is assigned a PID for stable reference.

What if I want/need to update the archived data?

Every change to the resources and metadata should be stored as a new version with a new PID. However, if the changes are minimal (e.g., typos or clear mistakes), contact our Help Desk with the submission PID and the changes that should be made. It is up to the reviewer to decide whether these changes should result in a new version or not.

What if I want to withdraw the resources in the future? Can I delete the data?

Yes. In this case, contact our Help Desk with the submission PID and the reason. However, we need to keep a reference showing that the data was in our repository (because a persistent identifier was issued), so the administrative metadata will be retained, indicating that the data itself was removed.

How to cite a submission?

See our policies.

How safe is my data if I store it with you?

Quite safe, probably much safer than on your own computer. Our storage plan:

All the data in the repository have an on-site backup copy.
There is another off-site copy, so even complete destruction of our building does not destroy your data.
We check all the copies regularly, and should any of them become corrupted, we delete it and make a new one.
We keep at least three copies, one of them off-site, at all times.
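
The idea of regularly checking copies and replacing a corrupted one can be sketched as a checksum comparison. This is only an illustration of the general technique, not the repository's actual procedure: the use of SHA-256, the function names sha256_of and repair_copies, and the file layout are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def repair_copies(copies: list[Path], expected: str) -> list[Path]:
    """Replace every copy whose checksum differs from `expected`.

    Hypothetical helper: `expected` is the checksum recorded at deposit time.
    Returns the paths that had to be re-created from an intact copy.
    """
    good = [p for p in copies if p.exists() and sha256_of(p) == expected]
    if not good:
        raise RuntimeError("no intact copy left to restore from")
    repaired = []
    for p in copies:
        if p not in good:
            p.unlink(missing_ok=True)    # delete the corrupted (or missing) copy
            shutil.copyfile(good[0], p)  # make a new one from an intact copy
            repaired.append(p)
    return repaired
```

Keeping the checksum separately from the copies matters here: it lets the check tell which copy is the damaged one instead of merely noticing that two copies differ.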

What license should I pick for my data/tool?

We encourage using a free license. A representative selection of free licenses as well as CC licenses (more appropriate for data) is available directly during submission.
If for some reason you need a different license, Contact Us.

Where can I find more information about supported licenses?

The list of licenses currently supported is here. However, do not hesitate to Contact Us in case you need your specific license.

Why do we strongly prefer real authors to institutions?

It is not about contact; it is about citations, credit and trust. That is why we have separate metadata fields for authors and for the contact person. Giving a help desk as the contact is perfectly fine; not acknowledging the authors of a scholarly work is not. We support the direct citation of data (https://www.force11.org/datacitation). That is why we also give submissions PIDs, create formatted citations, etc. It is also why we really want proper authors: so that they get citations and other scientists know whose work they rely on.

Is there any common search tool across different CLARIN repositories?

Yes, in particular the CLARIN Virtual Language Observatory, or CLARIN VLO for short. This browser helps you find linguistic resources, services and tools provided by CLARIN as well as by some other repositories. However, please keep in mind that CLARIN VLO is an aggregator that provides information about a specific resource, whereas the source repository, such as CLARIN-PL, typically gives more information, so it is also worth checking the source repository landing page given by CLARIN VLO. The original resources, services and tools are hosted by CLARIN centres and other data providers. This means that you cannot use the services and tools, or search and analyse the resources, directly through CLARIN VLO.

There are also other aggregators, such as OpenAIRE, which is a pan-European information platform and a network of Open Access repositories; re3data, which is a global registry of research data repositories; and ELG (Platform for all European Language Technologies). The main difference between CLARIN VLO and OpenAIRE or re3data is that CLARIN VLO focuses on language resources, services and tools, whereas the others cover all academic disciplines.

How do I create a new version of a resource?

We do not allow changes to the data (except minor ones, e.g. to fix obvious errors) after a submission has been published. For a new version of the resource, you should create a new submission. However, you do not need to (and in fact should not) create the new submission from scratch; instead, follow the procedure explained in Submitting a new version of an Item.

I don't want to/cannot make the data publicly available or want/can make them available only after a specific date. Could I still archive them with CLARIN-PL?

In line with the position of the research infrastructures and the general movement towards Open Access, we strongly encourage data producers to be as open as possible. However, in certain circumstances we will archive your data even if they will not be publicly available. Please contact our Help Desk prior to completing the submission.