Fijian (fj) subword embeddings

Vocab size  vocab  model  25 dim                 50 dim                 100 dim                200 dim                300 dim
1000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
3000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
5000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
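
The txt downloads listed above can be read without special tooling if they follow the common word2vec text layout: a header line with the vector count and dimension, then one subword and its values per line. That layout is an assumption here, not something this page states, and the function name `read_word2vec_text` and the demo values below are made up for illustration. A minimal sketch under that assumption:

```python
import io
from typing import Dict, List, TextIO


def read_word2vec_text(handle: TextIO) -> Dict[str, List[float]]:
    """Parse word2vec-style text: a "count dim" header, then one token per line.

    Assumption: the txt files use this layout; check a downloaded file first.
    """
    header = handle.readline().split()
    count, dim = int(header[0]), int(header[1])
    vectors: Dict[str, List[float]] = {}
    for line in handle:
        parts = line.split()
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != count:
        raise ValueError(f"header promised {count} vectors, found {len(vectors)}")
    return vectors


# Made-up three-dimensional data, for illustration only (not real embeddings).
demo = io.StringIO("2 3\n▁na 0.1 0.2 0.3\n▁vosa -0.4 0.5 0.6\n")
vectors = read_word2vec_text(demo)
print(len(vectors), len(vectors["▁vosa"]))  # prints: 2 3
```

To read a real file, pass an open file handle (opened with `encoding="utf-8"`) instead of the `io.StringIO` demo.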

Embedding matrix plots

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size  fjwiki sample
original    {| border=0 align=right cellpadding=0 cellspacing=0 width=000 style="margin: 0 0 0em 0em; background: #f0f0f0; border: 0px #aaaaaa solid; border-colla
            imagesize = width:000 height:000 plotarea = left:00 right:00 top:00 bottom:00 timeaxis = orientation:vertical alignbars = late colors =
            # e na i vakatekivu e a sa bula tu kina na vosa, ka rau a tiko vata kei na kalou na vosa, ka sa kalou na vosa. # e rau a tiko vata sara ga na vosa kei
1000        ▁{| ▁border =0 ▁align = right ▁cellpadding =0 ▁cellspacing =0 ▁width =000 ▁style =" margin : ▁0 ▁0 ▁0 em ▁0 em ; ▁background : ▁# f 0 f 0 f 0; ▁border : ▁0 px ▁# aaaaaa ▁solid ; ▁border - c ol la
            ▁image size ▁= ▁width :000 ▁hei ght :000 ▁plo tarea ▁= ▁left :00 ▁right :00 ▁top :00 ▁bot tom :00 ▁ti meaxis ▁= ▁orienta tion : ver tical ▁align bars ▁= ▁late ▁c olors ▁=
            ▁# ▁e ▁na ▁i ▁vaka tekivu ▁e ▁a ▁sa ▁bula ▁tu ▁kina ▁na ▁vosa , ▁ka ▁rau ▁a ▁tiko ▁vata ▁kei ▁na ▁kalou ▁na ▁vosa , ▁ka ▁sa ▁kalou ▁na ▁vosa . ▁# ▁e ▁rau ▁a ▁tiko ▁vata ▁sara ▁ga ▁na ▁vosa ▁kei
3000        ▁{| ▁border =0 ▁align = right ▁cellpadding =0 ▁cellspacing =0 ▁width =000 ▁style =" margin : ▁0 ▁0 ▁0 em ▁0 em ; ▁background : ▁# f 0 f 0 f 0; ▁border : ▁0 px ▁# aaaaaa ▁solid ; ▁border - col la
            ▁imagesize ▁= ▁width :000 ▁height :000 ▁plotarea ▁= ▁left :00 ▁right :00 ▁top :00 ▁bottom :00 ▁timeaxis ▁= ▁orientation : vertical ▁alignbars ▁= ▁late ▁colors ▁=
            ▁# ▁e ▁na ▁i ▁vakatekivu ▁e ▁a ▁sa ▁bula ▁tu ▁kina ▁na ▁vosa , ▁ka ▁rau ▁a ▁tiko ▁vata ▁kei ▁na ▁kalou ▁na ▁vosa , ▁ka ▁sa ▁kalou ▁na ▁vosa . ▁# ▁e ▁rau ▁a ▁tiko ▁vata ▁sara ▁ga ▁na ▁vosa ▁kei
5000        ▁{| ▁border =0 ▁align = right ▁cellpadding =0 ▁cellspacing =0 ▁width =000 ▁style =" margin : ▁0 ▁0 ▁0 em ▁0 em ; ▁background : ▁# f 0 f 0 f 0; ▁border : ▁0 px ▁# aaaaaa ▁solid ; ▁border - col la
            ▁imagesize ▁= ▁width :000 ▁height :000 ▁plotarea ▁= ▁left :00 ▁right :00 ▁top :00 ▁bottom :00 ▁timeaxis ▁= ▁orientation : vertical ▁alignbars ▁= ▁late ▁colors ▁=
            ▁# ▁e ▁na ▁i ▁vakatekivu ▁e ▁a ▁sa ▁bula ▁tu ▁kina ▁na ▁vosa , ▁ka ▁rau ▁a ▁tiko ▁vata ▁kei ▁na ▁kalou ▁na ▁vosa , ▁ka ▁sa ▁kalou ▁na ▁vosa . ▁# ▁e ▁rau ▁a ▁tiko ▁vata ▁sara ▁ga ▁na ▁vosa ▁kei
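
In the sample above, the encoded rows appear to use "▁" to mark a piece that begins a new word, with pieces separated by single spaces. Assuming that convention holds (it matches the rows shown, but is not stated on this page), the original text can be recovered by joining the pieces and turning each "▁" back into a space. The sketch below applies this to two fragments copied from the 1000 and 3000 rows; the helper name `decode_pieces` is made up for illustration.

```python
def decode_pieces(encoded: str) -> str:
    """Join space-separated subword pieces and restore word boundaries.

    Assumption: "▁" marks the start of a word, as the sample rows suggest.
    """
    pieces = encoded.split()
    return "".join(pieces).replace("▁", " ").strip()


# Fragments copied from the 1000 and 3000 vocabulary rows of the sample.
row_1000 = "▁image size ▁= ▁width :000 ▁hei ght :000"
row_3000 = "▁imagesize ▁= ▁width :000 ▁height :000"

for label, row in (("1000", row_1000), ("3000", row_3000)):
    # Print the vocabulary size, the number of pieces, and the decoded text.
    print(label, len(row.split()), decode_pieces(row))
# prints:
# 1000 8 imagesize = width:000 height:000
# 3000 6 imagesize = width:000 height:000
```

Both fragments decode to the same text, while the larger vocabulary needs fewer pieces because "imagesize" and "height" are each stored as a single subword.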