MeWeX – a tool for extracting multi-word collocations from corpora and for creating dictionaries of lexical units


Multi-Word Expressions (MeWeX) is an open-source service provided by CLARIN for extracting multi-word collocations from corpora and for creating dictionaries of lexical units for Polish. It extracts structural types such as noun + adjective (karta debetowa ‘debit card’) or noun + noun (zbyt nieruchomości ‘sale of real estate’). MeWeX can be used in lexicography and corpus linguistics, for example when expanding dictionaries of multi-word units. It can also serve as a tool for analysing phraseologisms and terms that occur in specific collections of texts.

MeWeX combines filters, expressed as lexical and morphosyntactic constraints, with a number of algorithms for computing dispersion functions and extraction measures.
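
As a rough illustration of how a structural filter and an extraction measure can work together, the sketch below keeps only adjacent word pairs that match the noun + adjective or noun + noun patterns and then ranks them by pointwise mutual information (PMI). It is not MeWeX's implementation: the toy corpus, the tag names, the function names, and the choice of PMI as the measure are all assumptions made for this example. Tokens are given as lemmas, so karta debetowa appears as karta + debetowy.

```python
import math
from collections import Counter

# Toy corpus (hypothetical): each token is a (lemma, part-of-speech) pair.
CORPUS = [
    [("karta", "NOUN"), ("debetowy", "ADJ"), ("być", "VERB"), ("wygodny", "ADJ")],
    [("nowy", "ADJ"), ("karta", "NOUN"), ("debetowy", "ADJ")],
    [("zbyt", "NOUN"), ("nieruchomość", "NOUN"), ("trwać", "VERB")],
    [("karta", "NOUN"), ("debetowy", "ADJ"), ("i", "CCONJ"),
     ("zbyt", "NOUN"), ("nieruchomość", "NOUN")],
]

# Structural filter (assumed): tag sequences accepted as candidates.
ALLOWED_PATTERNS = {("NOUN", "ADJ"), ("NOUN", "NOUN")}


def count_candidates(corpus, patterns):
    """Count lemma frequencies and adjacent pairs that pass the filter."""
    unigrams = Counter()
    pairs = Counter()
    for sentence in corpus:
        unigrams.update(lemma for lemma, _ in sentence)
        for (l1, t1), (l2, t2) in zip(sentence, sentence[1:]):
            if (t1, t2) in patterns:
                pairs[(l1, l2)] += 1
    return unigrams, pairs


def rank_candidates(corpus, patterns):
    """Rank filtered pairs by PMI = log2(f(x,y) * N / (f(x) * f(y)))."""
    unigrams, pairs = count_candidates(corpus, patterns)
    n_tokens = sum(unigrams.values())
    scored = [
        (pair, math.log2(freq * n_tokens / (unigrams[pair[0]] * unigrams[pair[1]])))
        for pair, freq in pairs.items()
    ]
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    for (first, second), score in rank_candidates(CORPUS, ALLOWED_PATTERNS):
        print(f"{first} {second}\t{score:.3f}")
```

On real data, PMI tends to favour rare pairs, which is one reason a tool might offer several measures and combine them with dispersion information across documents. The sketch covers only the simplest case.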

Bibliographic reference for the main publication (please cite it if you use MeWeX):

Link to the manual

Examples of applications