Word Embeddings for Polish
**************************

https://nextcloud.clarin-pl.eu/index.php/s/S7DwyZc3TGRvmvy

1. Word2Vec (WUST)
   source: https://nextcloud.clarin-pl.eu/index.php/s/Fbibk1csSP0RzVH
   training corpora: KGR10
   - arch: skipgram / CBOW
   - types: lemmas / lemmas + MWE + NE
   - alg: hierarchical softmax, negative sampling
   - dim: 300
   publications: ---

2. FastText (WUST)
   source: https://nextcloud.clarin-pl.eu/index.php/s/eLFGpO2t92I2jBj
   training corpora: KGR10
   - arch: skipgram
   - types: lemmas / lemmas + MWE + NE
   - alg: hierarchical softmax
   - dim: 300
   publications: ---

Pre-trained models for Polish, external sources:

1. Word2Vec (IPI PAN)
   source: http://dsmodels.nlp.ipipan.waw.pl/
   training corpora: NKJP / Wikipedia
   - arch: skipgram / CBOW
   - types: forms / lemmas
   - alg: hierarchical softmax, negative sampling
   - dim: 100, 300
   publications: Mykowiecka, A., Marciniak, M., Rychlik, P. (2017). Testing word embeddings for Polish. Cognitive Studies / Études cognitives, 2017(17).

2. Word2Vec (Lodz)
   source: http://publications.it.p.lodz.pl/2016/word_embeddings/
   training corpora: Wikipedia
   - arch: skipgram / CBOW
   - types: lemmas
   - alg: -
   - dim: 100
   publications: Rogalski, M., Szczepaniak, P. (2016). Word Embeddings for the Polish Language. ICAISC 2016: Artificial Intelligence and Soft Computing, pp. 126-135.

3. FastText (Facebook)
   source: https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
   training corpora: Wikipedia
   - arch: skipgram
   - types: lemmas / lemmas + MWE + NE
   - alg: negative sampling
   - dim: 300
   publications: Bojanowski, P., Grave, E., Joulin, A., Mikolov, T. (2016). Enriching Word Vectors with Subword Information.
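As a quick illustration of how such distributions are consumed, the sketch below parses the plain-text word2vec vector format (a "vocab_size dim" header line, then one "word v1 ... v_dim" line per entry) that several of the models above can be exported to, and computes a cosine similarity between two entries. The three Polish words and their 4-dimensional vectors are made-up toy data, not values from any of the listed models.

```python
import math

# Toy data in word2vec text format; real models use dim = 100 or 300.
sample = """3 4
kot 0.1 0.2 0.3 0.4
pies 0.1 0.2 0.3 0.5
dom -0.4 0.1 -0.2 0.0
"""

def load_vectors(text):
    """Parse word2vec text format: a 'vocab_size dim' header,
    then one 'word v1 ... v_dim' line per vocabulary entry."""
    lines = text.strip().split("\n")
    vocab_size, dim = map(int, lines[0].split())
    vectors = {}
    for line in lines[1:]:
        parts = line.split()
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    assert len(vectors) == vocab_size
    return vectors, dim

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vectors, dim = load_vectors(sample)
print(dim)
print(round(cosine(vectors["kot"], vectors["pies"]), 3))
```

For the binary .bin formats (fastText, word2vec binary), a library such as gensim is the practical choice; the plain-text variant shown here is mainly useful for quick inspection of a model's vocabulary and dimensionality.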