The OpenSoNaR project will give end users online means of extracting information from the SoNaR-500 reference corpus of contemporary written Dutch, including exploring the texts and navigating the corpus by way of its metadata. The project makes the contents of the new SoNaR-500 reference corpus available to laymen and specialist researchers alike. Based on the desiderata of four distinct CLARIN-NL priority groups, online access to the corpus for navigation, exploration and exploitation will be through a front-end, to be called WhiteLab, which offers a range of interfaces with user-driven functionality. The back-end is BlackLab, the new retrieval engine developed by INL (Institute for Dutch Lexicology) and designed to provide access to corpora for linguistic and lexicographical use in the CLARIN infrastructure.
browse, corpus exploration, information extraction
corpus, data, web-application
Institute for Dutch Lexicology
Prof. dr. Max Louwerse