Description
The move to e-research in philosophy depends on the availability of high-quality, easily accessible corpora in a sustainable format, composed of multi-language, multi-script books from different historical periods. Corpora that meet these needs are at present virtually non-existent. In this project we address this corpus-building problem by developing and making available an open-source, web-based, user-friendly workflow from digital images of text to TEI, based on an OCRopus/Tesseract web service and a multilingual version of the OCR post-correction web service TICCLops. We will demonstrate the tool on a multilingual, multi-script corpus of important 18th- to 20th-century European philosophical texts. These texts are of fundamental importance for understanding the development of key scientific concepts, such as explanation and truth, in 18th- to 20th-century Europe. The tool will be of general interest and importance for building CLARIN-compliant corpora.
Tool type
Processing flow
Tool task
corpus building
Key words
service, multi-lingual
Research domain
Philosophy
Language
Country
Netherlands
CLARIN centre
Huygens ING
Contact person
Dr. Arianna Betti
URL
http://www.clarin.nl/node/1404
Similar to