Bislama (bi) subword embeddings

Vocab size vocab model 25 dim 50 dim 100 dim 200 dim 300 dim
1000 vocab model txt | bin
bokeh | umap | matrix
txt | bin
bokeh | umap | matrix
txt | bin
bokeh | umap | matrix
txt | bin
bokeh | umap | matrix
txt | bin
bokeh | umap | matrix
3000 vocab model txt | bin
bokeh | umap | matrix
txt | bin
bokeh | umap | matrix
txt | bin
bokeh | umap | matrix
txt | bin
bokeh | umap | matrix
txt | bin
bokeh | umap | matrix
5000 vocab model txt | bin
bokeh | umap | matrix
txt | bin
bokeh | umap | matrix
txt | bin
bokeh | umap | matrix
txt | bin
bokeh | umap | matrix
txt | bin
bokeh | umap | matrix

Embedding matrix plots

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab sizebiwiki sample
original {| border=0 align=right cellpadding=0 cellspacing=0 width=000 style="margin: 0 0 0em 0em; background: #f0f0f0; border: 0px #aaaaaa solid; border-colla
{| border=0 align=right cellpadding=0 cellspacing=0 width=000 style="margin: 0 0 0em 0em; background: #f0f0f0; border: 0px #aaaaaa solid; border-colla
jakob emanuel handmann emi bin mekem pitja ia long 0000. leonhard euler (00 epril 0000 – 00 septemba 0000) i bin wan man blong matematikis, fisikis, a
1000 ▁{| ▁border =0 ▁align = right ▁cellpadding =0 ▁cellspacing =0 ▁width =000 ▁style =" margin : ▁0 ▁0 ▁0 em ▁0 em ; ▁background : ▁# f 0 f 0 f 0; ▁border : ▁0 px ▁# aaaaaa ▁solid ; ▁border - col la
▁{| ▁border =0 ▁align = right ▁cellpadding =0 ▁cellspacing =0 ▁width =000 ▁style =" margin : ▁0 ▁0 ▁0 em ▁0 em ; ▁background : ▁# f 0 f 0 f 0; ▁border : ▁0 px ▁# aaaaaa ▁solid ; ▁border - col la
▁ja ko b ▁em anu el ▁h and man n ▁emi ▁bin ▁mekem ▁p it ja ▁ia ▁long ▁0000. ▁le on h ard ▁e ul er ▁(00 ▁epril ▁0000 ▁– ▁00 ▁se p tem ba ▁0000) ▁i ▁bin ▁wan ▁man ▁blong ▁mat em at ikis , ▁fis ikis , ▁a
3000 ▁{| ▁border =0 ▁align = right ▁cellpadding =0 ▁cellspacing =0 ▁width =000 ▁style =" margin : ▁0 ▁0 ▁0 em ▁0 em ; ▁background : ▁# f 0 f 0 f 0; ▁border : ▁0 px ▁# aaaaaa ▁solid ; ▁border - col la
▁{| ▁border =0 ▁align = right ▁cellpadding =0 ▁cellspacing =0 ▁width =000 ▁style =" margin : ▁0 ▁0 ▁0 em ▁0 em ; ▁background : ▁# f 0 f 0 f 0; ▁border : ▁0 px ▁# aaaaaa ▁solid ; ▁border - col la
▁ja ko b ▁em anu el ▁hand mann ▁emi ▁bin ▁mekem ▁pitja ▁ia ▁long ▁0000. ▁leon hard ▁e ul er ▁(00 ▁epril ▁0000 ▁– ▁00 ▁septemba ▁0000) ▁i ▁bin ▁wan ▁man ▁blong ▁matematikis , ▁fisikis , ▁a
5000 ▁{| ▁border =0 ▁align = right ▁cellpadding =0 ▁cellspacing =0 ▁width =000 ▁style =" margin : ▁0 ▁0 ▁0 em ▁0 em ; ▁background : ▁# f 0 f 0 f 0; ▁border : ▁0 px ▁# aaaaaa ▁solid ; ▁border - col la
▁{| ▁border =0 ▁align = right ▁cellpadding =0 ▁cellspacing =0 ▁width =000 ▁style =" margin : ▁0 ▁0 ▁0 em ▁0 em ; ▁background : ▁# f 0 f 0 f 0; ▁border : ▁0 px ▁# aaaaaa ▁solid ; ▁border - col la
▁ja ko b ▁em anu el ▁hand mann ▁emi ▁bin ▁mekem ▁pitja ▁ia ▁long ▁0000. ▁leon hard ▁euler ▁(00 ▁epril ▁0000 ▁– ▁00 ▁septemba ▁0000) ▁i ▁bin ▁wan ▁man ▁blong ▁matematikis , ▁fisikis , ▁a