Ndonga (ng) subword embeddings

Vocab size 1000: vocab, model
    25 dim: txt | bin; bokeh | umap | matrix
    50 dim: txt | bin; bokeh | umap | matrix
    100 dim: txt | bin; bokeh | umap | matrix
    200 dim: txt | bin; bokeh | umap | matrix
    300 dim: txt | bin; bokeh | umap | matrix
Vocab size 3000: vocab, model
    25 dim: txt | bin; bokeh | umap | matrix
    50 dim: txt | bin; bokeh | umap | matrix
    100 dim: txt | bin; bokeh | umap | matrix
    200 dim: txt | bin; bokeh | umap | matrix
    300 dim: txt | bin; bokeh | umap | matrix
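
The txt downloads listed above can be read without any special tooling if they follow the word2vec text layout. That layout is an assumption here, not something this page states: a header line with the entry count and the dimension, then one line per subword with its vector components. The following is a minimal sketch under that assumption. The function name `read_w2v_text` and the demo vectors are made up for illustration and are not values from the real files.

```python
import io


def read_w2v_text(handle):
    """Parse an assumed word2vec-style text file into {token: [floats]}.

    Assumed layout: first line "<count> <dim>", then one
    "<token> <v1> ... <vdim>" line per subword.
    """
    count, dim = (int(x) for x in handle.readline().split())
    vectors = {}
    for line in handle:
        line = line.rstrip()
        if not line:
            continue  # skip blank lines
        # Split from the right so the token itself is kept intact.
        parts = line.rsplit(" ", dim)
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} components: {line!r}")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != count:
        raise ValueError(f"header says {count} entries, found {len(vectors)}")
    return vectors, dim


# Tiny in-memory stand-in for a downloaded file (made-up numbers).
demo = io.StringIO(
    "3 2\n"
    "▁omuntu 0.1 0.2\n"
    "▁na -0.3 0.4\n"
    "duisburg 0.5 -0.6\n"
)
vectors, dim = read_w2v_text(demo)
print(dim, sorted(vectors))
```

For a real file, the same function could be given a handle opened with `encoding="utf-8"`, since the subword tokens contain the non-ASCII `▁` marker.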

2-D UMAP plots

Embedding matrix plots

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size | ngwiki sample
original omuntu kehe oku na uuthemba wemanguluko iyomadhiladhilo neiuvo osho wo elongelokalunga; uuthemba mbuka owa kwatelela mo emanguluko iyokulundulula elon
af:duisburg ar:دويسبورغ bg:дуйсбург br:duisburg ca:duisburg cs:duisburg cy:duisburg da:duisburg de:duisburg el:ντούισμπουργκ en:duisburg eo:duisburg e
# uuthemba mbuka inau dhimbulukiwa uuna omuntu ta pewa egeelo kaali na sha neyono iyopapolitika nenge keyon'o iyaa na sha nelalakano nomakankameno gii
1000 ▁omuntu ▁kehe ▁oku ▁na ▁uuthemba ▁wemanguluko ▁iyo madhiladhilo ▁neiuvo ▁osho ▁wo ▁elongelokalunga ; ▁uuthemba ▁mbuka ▁owa ▁kwatelela ▁mo ▁emanguluko ▁iyo kul un d ulu la ▁e lo n
▁a f : duisburg ▁ar : دو ي سبور غ ▁bg : дуйсбург ▁b r : duisburg ▁ca : duisburg ▁cs : duisburg ▁cy : duisburg ▁da : duisburg ▁de : duisburg ▁el : ντ ού ισμ πο υρ γκ ▁en : duisburg ▁eo : duisburg ▁e
▁# ▁uuthemba ▁mbuka ▁ina u ▁dhi mbulu ki wa ▁uuna ▁omuntu ▁ta ▁pewa ▁egeelo ▁kaa li ▁na ▁sha ▁ne yono ▁iyo pa politika ▁nenge ▁ke yo n ' o ▁iyaa ▁na ▁sha ▁ne lalakano ▁no ma kanka men o ▁g ii
3000 ▁omuntu ▁kehe ▁oku ▁na ▁uuthemba ▁wemanguluko ▁iyomadhiladhilo ▁neiuvo ▁osho ▁wo ▁elongelokalunga ; ▁uuthemba ▁mbuka ▁owa ▁kwatelela ▁mo ▁emanguluko ▁iyokulundulula ▁e lon
▁af : duisburg ▁ar : دويسبورغ ▁bg : дуйсбург ▁br : duisburg ▁ca : duisburg ▁cs : duisburg ▁cy : duisburg ▁da : duisburg ▁de : duisburg ▁el : ντούισμπουργκ ▁en : duisburg ▁eo : duisburg ▁e
▁# ▁uuthemba ▁mbuka ▁inau ▁dhimbulukiwa ▁uuna ▁omuntu ▁ta ▁pewa ▁egeelo ▁kaali ▁na ▁sha ▁neyono ▁iyopapolitika ▁nenge ▁keyon ' o ▁iyaa ▁na ▁sha ▁nelalakano ▁nomakankameno ▁g ii
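
The encoded samples above can be compared with a few lines of code. This sketch assumes the usual SentencePiece-style convention that `▁` marks the start of a word and that pieces are separated by single spaces. The helper name `pieces_to_text` is hypothetical. It takes the opening of the first sample as encoded with the 1000 and 3000 vocabularies, checks that both decode to the same text, and counts the pieces.

```python
def pieces_to_text(encoded):
    """Join space-separated subword pieces back into plain text.

    Assumes "▁" marks a word boundary (SentencePiece-style convention).
    """
    pieces = encoded.split(" ")
    return "".join(pieces).replace("▁", " ").strip()


# Opening of the first sample line, copied from the table above.
vs1000 = "▁omuntu ▁kehe ▁oku ▁na ▁uuthemba ▁wemanguluko ▁iyo madhiladhilo ▁neiuvo"
vs3000 = "▁omuntu ▁kehe ▁oku ▁na ▁uuthemba ▁wemanguluko ▁iyomadhiladhilo ▁neiuvo"

for name, encoded in (("1000", vs1000), ("3000", vs3000)):
    n_pieces = len(encoded.split(" "))
    print(name, n_pieces, pieces_to_text(encoded))
```

In this excerpt the larger vocabulary keeps `iyomadhiladhilo` as one piece, while the smaller one splits it into `▁iyo` and `madhiladhilo`, so the same text needs fewer pieces at vocabulary size 3000.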