dc.contributor.author | Dziob, Agnieszka |
dc.contributor.author | Grabowski, Łukasz |
dc.contributor.author | Kanclerz, Kamil |
dc.contributor.author | Kompa, Karolina |
dc.contributor.author | Maziarz, Marek |
dc.contributor.author | Piasecki, Maciej |
dc.contributor.author | Piotrowski, Tadeusz |
dc.contributor.author | Rudnicka, Ewa |
dc.date.accessioned | 2021-12-21T13:44:42Z |
dc.date.available | 2021-12-21T13:44:42Z |
dc.date.issued | 2021-12-21 |
dc.identifier.uri | http://hdl.handle.net/11321/853 |
dc.description | We analysed over 350 Polish and English word combinations (multi-word expressions, MWEs). Half of the sample was drawn from traditional dictionaries, while the other half was created by hand to represent free word combinations (i.e., MWEs not found in dictionaries); this information is given in the "Status" column. Syntactically, these were noun phrases (NPs), either adjective-noun (A+N) or noun-noun (N+N) combinations, called 'bigrams'. We operationalised semantic compositionality by testing two custom-designed criteria, Intuition and Paraphrase, and by using statistical methods (selected measures of collocational strength: log-likelihood, PMI and Jaccard) to check word order fixedness and word combination specificity. We also measured the length (in letters) of the syntactic nucleus and its complement (columns "AWL" and "HWL"); this measure is highly correlated with word frequency, a relationship known as Zipf’s law. The last column ("LCA") gives the classification results obtained from Latent Class Analysis. |
dc.language.iso | pol |
dc.language.iso | eng |
dc.publisher | Wrocław University of Science and Technology |
dc.rights | Creative Commons - Attribution 4.0 International (CC BY 4.0) |
dc.rights.uri | https://creativecommons.org/licenses/by/4.0/ |
dc.rights.label | CC |
dc.subject | multi-word units |
dc.subject | multi-word expressions |
dc.subject | MWE detection |
dc.subject | lexical semantics |
dc.subject | lexicography |
dc.subject | semantic compositionality |
dc.title | Lexicalisation of Polish and English word combinations: two samples manually annotated (with collocation strength corpus statistics) |
dc.type | lexicalConceptualResource |
metashare.ResourceInfo#ContentInfo.detailedType | wordList |
metashare.ResourceInfo#ContentInfo.mediaType | text |
has.files | yes |
branding | CLARIN-PL |
contact.person | Marek Maziarz marek.maziarz@pwr.edu.pl Wrocław University of Science and Technology |
size.info | 350 multiWordUnits |
files.size | 8316 |
files.count | 1 |