CLARIN ERIC (Common Language Resources and Technology Infrastructure, European Research Infrastructure Consortium) is a pan-European research infrastructure and part of the European roadmap for research infrastructures drawn up by ESFRI (European Strategy Forum on Research Infrastructures).
CLARIN aims to offer language resources and software tools for natural language processing to researchers in the humanities and social sciences. The resources include digital archives, corpora, electronic dictionaries, and language models. The tools support tasks such as syntactic and semantic analysis, speech recognition, proper-name search, and recognition of situation descriptions.
CLARIN responds to the challenges and opportunities of e-Science research, particularly in the area referred to as e-Humanities. As a new methodological trend, e-Science involves the broad application of technology in support of people who work with large text collections. The number of documents available on the Internet and in other electronic forms (such as press releases, digital archives, advertising texts or blogs) is growing rapidly. These documents are a valuable but challenging source of research data. Commercial search tools give little support to scientists who seek specialised information related to a field of study, a person, an event, a time period, and so on. Finding relevant information and compiling specifications for hundreds or even thousands of objects with only simple tools at one’s disposal is exceedingly time-consuming and clearly beyond the capacity of individual researchers. Scientific results may suffer if researchers are forced to reduce the number of sources drastically.
Researchers who need to examine large quantities of text can benefit significantly from expertise in computational linguistics, natural language engineering, and language and speech technology. Many existing solutions and proposals from these fields work on a scale far surpassing that of traditional methods. CLARIN’s task is to facilitate access to advanced language analysis tools and language resources, and thus to help bring language technology into use in the humanities and social sciences.