Inforex – web-based text corpus management system


Inforex is an open-source web-based system used to manage, search and analyze the content of selected corpora deposited in the DSpace repository. It allows users not only to calculate basic statistics and generate word-frequency lists but, above all, to create qualitative language data that can serve as the basis for an in-depth and systematic analysis of research material. Integration with DSpace makes it possible to quickly export data for corpus research.

The system supports various levels of description (annotation), both automatic and created manually by users with the built-in tools, which recognize specific language categories such as proper names and temporal expressions. Because the data is stored on a secure server, the people involved in the research can collaborate remotely and follow one another's progress. Inforex lets users conveniently view and compare the annotations made by different people working on a given corpus. A clear system of flags (markers) on the processed content makes work progress easier to monitor and control.

Inforex facilitates the qualitative processing of data that can serve as material for corpus research in linguistics, as well as in sociology, historical studies, media studies, discourse analysis and the study of communication processes. Analyzing selected linguistic features makes it possible to identify regularities in the material, which may in turn become the basis for formulating more universal principles.

Bibliographic reference of the main publication (if you use Inforex, please cite this publication)

Supplementary materials:

User manual

Examples of applications