Word Embeddings for Polish
**************************

https://nextcloud.clarin-pl.eu/index.php/s/S7DwyZc3TGRvmvy

1. Word2Vec (WUST)
   source: https://nextcloud.clarin-pl.eu/index.php/s/Fbibk1csSP0RzVH
   training corpora: KGR10
   - arch: skipgram / CBOW
   - types: lemmas / lemmas + MWE + NE
   - alg: hierarchical softmax, negative sampling
   - dim: 300
   publications: ---

2. FastText (WUST)
   source: https://nextcloud.clarin-pl.eu/index.php/s/eLFGpO2t92I2jBj
   training corpora: KGR10
   - arch: skipgram
   - types: lemmas / lemmas + MWE + NE
   - alg: hierarchical softmax
   - dim: 300
   publications: ---

Pre-trained models for Polish, external sources:

1. Word2Vec (IPI PAN)
   source: http://dsmodels.nlp.ipipan.waw.pl/
   training corpora: NKJP / Wikipedia
   - arch: skipgram / CBOW
   - types: forms / lemmas
   - alg: hierarchical softmax, negative sampling
   - dim: 100, 300
   publications: Mykowiecka, A., Marciniak, M., Rychlik, P. (2017). Testing word embeddings for Polish. Cognitive Studies / Études cognitives, 2017(17).

2. Word2Vec (Lodz)
   source: http://publications.it.p.lodz.pl/2016/word_embeddings/
   training corpora: Wikipedia
   - arch: skipgram / CBOW
   - types: lemmas
   - alg: -
   - dim: 100
   publications: Rogalski, M., Szczepaniak, P. (2016). Word Embeddings for the Polish Language. ICAISC 2016: Artificial Intelligence and Soft Computing, pp. 126-135.

3. FastText (Facebook)
   source: https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md
   training corpora: Wikipedia
   - arch: skipgram
   - types: lemmas / lemmas + MWE + NE
   - alg: negative sampling
   - dim: 300
   publications: Bojanowski, P., Grave, E., Joulin, A., Mikolov, T. (2016). Enriching Word Vectors with Subword Information.
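As a quick illustration of how such distributions are consumed, the sketch below parses the plain-text word2vec vector format (a "vocab_size dim" header line, then one "word v1 ... v_dim" line per entry) that several of the models above can be exported to, and computes a cosine similarity between two entries. The three Polish words and their 4-dimensional vectors are made-up toy data, not values from any of the listed models.

```python
import math

# Toy data in word2vec text format; real models use dim = 100 or 300.
sample = """3 4
kot 0.1 0.2 0.3 0.4
pies 0.1 0.2 0.3 0.5
dom -0.4 0.1 -0.2 0.0
"""

def load_vectors(text):
    """Parse word2vec text format: a 'vocab_size dim' header,
    then one 'word v1 ... v_dim' line per vocabulary entry."""
    lines = text.strip().split("\n")
    vocab_size, dim = map(int, lines[0].split())
    vectors = {}
    for line in lines[1:]:
        parts = line.split()
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    assert len(vectors) == vocab_size
    return vectors, dim

def cosine(u, v):
    """Cosine similarity: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

vectors, dim = load_vectors(sample)
print(dim)
print(round(cosine(vectors["kot"], vectors["pies"]), 3))
```

For the binary .bin formats (fastText, word2vec binary), a library such as gensim is the practical choice; the plain-text variant shown here is mainly useful for quick inspection of a model's vocabulary and dimensionality.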