Show simple item record

 
dc.contributor.author Dziob, Agnieszka
dc.contributor.author Grabowski, Łukasz
dc.contributor.author Kanclerz, Kamil
dc.contributor.author Kompa, Karolina
dc.contributor.author Maziarz, Marek
dc.contributor.author Piasecki, Maciej
dc.contributor.author Piotrowski, Tadeusz
dc.contributor.author Rudnicka, Ewa
dc.date.accessioned 2021-12-21T13:44:42Z
dc.date.available 2021-12-21T13:44:42Z
dc.date.issued 2021-12-21
dc.identifier.uri http://hdl.handle.net/11321/853
dc.description We analysed over 350 Polish and English word combinations (multi-word expressions, MWEs). Half of the sample was drawn from traditional dictionaries, while the other half was created by hand to represent free word combinations, i.e., MWEs not found in dictionaries (this information is given in the column "Status"). Syntactically, these were noun phrases (NPs), either adjective + noun (A+N) or noun + noun (N+N) combinations, called 'bigrams'. We operationalised semantic compositionality by testing two custom-designed criteria, Intuition and Paraphrase, and by using statistical methods (selected measures of collocational strength, i.e., log-likelihood, PMI and Jaccard) to check word order fixedness and word combination specificity. We also measured the length (in letters) of the syntactic nucleus and of its complement (columns "AWL" and "HWL"); this measure is highly correlated with word frequency, a relationship known as Zipf’s law. In the last column ("LCA") we give the classification results obtained from Latent Class Analysis.
dc.language.iso pol
dc.language.iso eng
dc.publisher Wrocław University of Science and Technology
dc.rights Creative Commons - Attribution 4.0 International (CC BY 4.0)
dc.rights.uri https://creativecommons.org/licenses/by/4.0/
dc.rights.label CC
dc.subject multi-word units
dc.subject multi-word expressions
dc.subject MWE detection
dc.subject lexical semantics
dc.subject lexicography
dc.subject semantic compositionality
dc.title Lexicalisation of Polish and English word combinations: two samples manually annotated (with collocation strength corpus statistics)
dc.type lexicalConceptualResource
metashare.ResourceInfo#ContentInfo.detailedType wordList
metashare.ResourceInfo#ContentInfo.mediaType text
has.files yes
branding CLARIN-PL
contact.person Marek Maziarz marek.maziarz@pwr.edu.pl Wrocław University of Science and Technology
size.info 350 multiWordUnits
files.size 8316
files.count 1


 Files in this item

This item is distributed under Creative Commons and licensed under:
Creative Commons - Attribution 4.0 International (CC BY 4.0)
Attribution Required

Name: mwe.7z
Size: 8.12 KB
Format: Unknown
Description: Unknown
