Swati (ss) subword embeddings

Vocab size  vocab model  25 dim                 50 dim                 100 dim                200 dim                300 dim
1000        vocab model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                         bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
3000        vocab model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                         bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
5000        vocab model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                         bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
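
The txt downloads above can be read without any special tooling. The following is a minimal sketch, assuming the txt files use the common word2vec text layout (a "count dim" header line, then one "token v1 ... vdim" line per subword); that layout is an assumption, not something this page states. The three toy entries and their values are made up for illustration and are not taken from any of the models listed here.

```python
import io
import math


def load_w2v_text(handle):
    """Parse word2vec-style text: a 'count dim' header, then 'token v1 ... vdim' lines."""
    header = handle.readline().split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in handle:
        parts = line.split()
        if not parts:
            continue  # tolerate blank lines
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values, got {len(parts) - 1}")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != count:
        raise ValueError(f"header promised {count} entries, found {len(vectors)}")
    return dim, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Hypothetical toy data standing in for a downloaded txt file.
toy_file = io.StringIO(
    "3 4\n"
    "▁ku 0.1 0.2 0.3 0.4\n"
    "▁in 0.4 0.3 0.2 0.1\n"
    "olo -0.1 0.0 0.1 0.0\n"
)
dim, vectors = load_w2v_text(toy_file)
similarity = cosine(vectors["▁ku"], vectors["▁in"])
print(dim, len(vectors), round(similarity, 4))
```

For a real download, open the file with encoding="utf-8" and pass the handle to load_w2v_text in place of the in-memory toy string.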

Embedding matrix plots

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size  sswiki sample
original    * iriphabliki. * inhlokodolobha: monrovia. * linani lebantfu emonrovia: 0.000.000 (0000). * mengameli: ellen johnson sirleaf. * 000,000 km0. * linani
            inkantolo yemtsetfo sisekelo yase ningizimu afrika iyinkantolo leyasungulwa ngumtsetfo sisekelo wase ningizimu afrika. lena bekuyinkantolo yekugcina y
            kusebentisa indlela yekulawula kubeleka emaveni lasatfutfuka yehlise inombolo nekushona ngalesikhatsi ukhulelwe ngemaphesenti langu 00% (kuvikelwa kuf
1000        ▁* ▁iriphabliki . ▁* ▁inhlokodolobha : ▁m on r ov i a . ▁* ▁linani ▁lebantfu ▁em on r ov i a : ▁0.000.000 ▁(0000). ▁* ▁mengameli : ▁ ell en ▁j o h n s on ▁si r le af . ▁* ▁000 ,000 ▁km 0. ▁* ▁linani
            ▁in k ant olo ▁yem tse tfo ▁sis ek elo ▁ya se ▁n ingizimu ▁afrika ▁i yin k ant olo ▁le y as ung ul wa ▁ngum tse tfo ▁sis ek elo ▁wa se ▁n ingizimu ▁afrika . ▁len a ▁b eku yin k ant olo ▁yeku gc ina ▁y
            ▁ku sebenti sa ▁in dlela ▁y ek ula w ula ▁kub eleka ▁ema veni ▁l asa tfu tfu ka ▁ye hl i se ▁in om b olo ▁neku sh ona ▁ngal esikhatsi ▁u khul el we ▁ngem a ph es enti ▁lang u ▁00 % ▁( k u v ik elwa ▁ku f
3000        ▁* ▁iriphabliki . ▁* ▁inhlokodolobha : ▁mon r ovi a . ▁* ▁linani ▁lebantfu ▁em on r ovi a : ▁0.000.000 ▁(0000). ▁* ▁mengameli : ▁ ell en ▁john s on ▁si r le af . ▁* ▁000,000 ▁km 0. ▁* ▁linani
            ▁in kantolo ▁yem tsetfo ▁sisekelo ▁yase ▁ningizimu ▁afrika ▁i yin kantolo ▁ley as ungulwa ▁ngum tsetfo ▁sisekelo ▁wase ▁ningizimu ▁afrika . ▁lena ▁beku yin kantolo ▁yeku gcina ▁y
            ▁ku sebentisa ▁indlela ▁y ekula wula ▁kubeleka ▁emaveni ▁l asa tfutfu ka ▁ye hl ise ▁in ombolo ▁neku shona ▁ngalesikhatsi ▁u khulelwe ▁ngema phesenti ▁langu ▁00% ▁( ku vik elwa ▁kuf
5000        ▁* ▁iriphabliki . ▁* ▁inhlokodolobha : ▁mon r ovi a . ▁* ▁linani ▁lebantfu ▁em on r ovi a : ▁0.000.000 ▁(0000). ▁* ▁mengameli : ▁ ell en ▁john son ▁si r le af . ▁* ▁000,000 ▁km 0. ▁* ▁linani
            ▁inkantolo ▁yem tsetfo ▁sisekelo ▁yase ▁ningizimu ▁afrika ▁i yin kantolo ▁ley as ungulwa ▁ngum tsetfo ▁sisekelo ▁wase ▁ningizimu ▁afrika . ▁lena ▁beku yin kantolo ▁yeku gcina ▁y
            ▁kusebentisa ▁indlela ▁y ekulawula ▁kubeleka ▁emaveni ▁lasa tfutfuka ▁ye hl ise ▁inombolo ▁neku shona ▁ngalesikhatsi ▁u khulelwe ▁ngema phesenti ▁langu ▁00% ▁( ku vik elwa ▁kuf
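
The effect of vocabulary size on the samples above can be checked with a few lines of code. This sketch copies the first five words of the third sample at each vocabulary size, joins the pieces back into text, and counts pieces per word. The helper names (detokenize, pieces_per_word) are illustrative only, and reading "▁" as a word-start marker is an assumption based on the samples themselves.

```python
WORD_START = "\u2581"  # the "▁" character that prefixes word-initial pieces


def detokenize(pieces):
    """Join subword pieces and turn each word-start marker back into a space."""
    return "".join(pieces).replace(WORD_START, " ").strip()


def pieces_per_word(pieces):
    """Average number of subword pieces used per whitespace-separated word."""
    words = detokenize(pieces).split()
    return len(pieces) / len(words)


# First five words of the third sample, copied from the rows above.
samples = {
    1000: "▁ku sebenti sa ▁in dlela ▁y ek ula w ula ▁kub eleka ▁ema veni",
    3000: "▁ku sebentisa ▁indlela ▁y ekula wula ▁kubeleka ▁emaveni",
    5000: "▁kusebentisa ▁indlela ▁y ekulawula ▁kubeleka ▁emaveni",
}

results = {}
for vocab_size, encoded in samples.items():
    pieces = encoded.split(" ")
    results[vocab_size] = (detokenize(pieces), len(pieces), pieces_per_word(pieces))
    print(vocab_size, results[vocab_size])
```

All three encodings decode to the same text; only the number of pieces changes, with larger vocabularies keeping more whole words intact.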