Malagasy (mg) subword embeddings

| Vocab size | Vocab | Model | 25 dim | 50 dim | 100 dim | 200 dim | 300 dim |
|---|---|---|---|---|---|---|---|
| 1000 | vocab | model | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix |
| 3000 | vocab | model | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix |
| 5000 | vocab | model | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix |
| 10000 | vocab | model | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix |
| 25000 | vocab | model | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix | txt / bin; bokeh / umap / matrix |
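
The txt downloads can be read with the standard library alone if they follow the common word2vec text layout: a header line with the number of vectors and the dimension, then one line per subword with its values separated by spaces. That layout is an assumption here, not something this page states, and the function name `read_w2v_text` and the three-entry sample below are made up for illustration. A minimal sketch:

```python
import io


def read_w2v_text(lines):
    """Parse word2vec-style text: 'count dim' header, then 'token v1 ... vdim'.

    Assumes the txt files use this layout; adjust if the real files differ.
    """
    it = iter(lines)
    header = next(it).split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in it:
        line = line.rstrip()
        if not line:
            continue
        # Split from the right so exactly `dim` values are peeled off the line.
        parts = line.rsplit(" ", dim)
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values, got line: {line!r}")
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    if len(vectors) != count:
        raise ValueError(f"header promised {count} vectors, found {len(vectors)}")
    return vectors, dim


# Hypothetical miniature file with 3 subwords of dimension 2.
sample = io.StringIO("3 2\n▁ity 0.1 0.2\n▁dia -0.3 0.4\nolo 0.5 -0.6\n")
vectors, dim = read_w2v_text(sample)
print(dim, sorted(vectors))
```

For a real file, pass an open file handle, for example `open(path, encoding="utf-8")`, instead of the `io.StringIO` sample.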

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size: mgwiki sample

original:
* ity lahatsoratra manontolo ity dia nodikaina tamin'ny teny malagasy. azonao jerena eto ny pejy loharanon'ity lahatsoratra ity :
* ity lahatsoratra manontolo ity dia nodikaina tamin'ny teny malagasy. azonao jerena eto ny pejy loharanon'ity lahatsoratra ity :
{| border=0 align=right cellpadding=0 cellspacing=0 width=000 style="margin: 0 0 0em 0em; background: #f0f0f0; border: 0px #aaaaaa solid; border-colla

1000:
▁* ▁ity ▁lahatsoratra ▁man ont olo ▁ity ▁dia ▁no d ika ina ▁tamin ' ny ▁teny ▁malagasy . ▁a z ona o ▁j er ena ▁e to ▁ny ▁p e jy ▁loharan on ' ity ▁lahatsoratra ▁ity ▁:
▁* ▁ity ▁lahatsoratra ▁man ont olo ▁ity ▁dia ▁no d ika ina ▁tamin ' ny ▁teny ▁malagasy . ▁a z ona o ▁j er ena ▁e to ▁ny ▁p e jy ▁loharan on ' ity ▁lahatsoratra ▁ity ▁:
▁ { | ▁bor d er = 0 ▁al ig n = ri g h t ▁c ell pa d din g = 0 ▁c ell s pa c ing = 0 ▁w id th = 000 ▁s ty le = " mar g in : ▁0 ▁0 ▁0 e m ▁0 e m ; ▁ba ck g r ou n d : ▁ # f 0 f 0 f 0 ; ▁bor d er : ▁0 p x ▁ # a a a a a a ▁so l id ; ▁bor d er - c ol la

3000:
▁* ▁ity ▁lahatsoratra ▁manontolo ▁ity ▁dia ▁nodikaina ▁tamin ' ny ▁teny ▁malagasy . ▁azonao ▁jerena ▁eto ▁ny ▁pejy ▁loharanon ' ity ▁lahatsoratra ▁ity ▁:
▁* ▁ity ▁lahatsoratra ▁manontolo ▁ity ▁dia ▁nodikaina ▁tamin ' ny ▁teny ▁malagasy . ▁azonao ▁jerena ▁eto ▁ny ▁pejy ▁loharanon ' ity ▁lahatsoratra ▁ity ▁:
▁ { | ▁bor der = 0 ▁al ign = ri gh t ▁c ell pa d din g = 0 ▁c ell s pa c ing = 0 ▁w id th = 000 ▁s ty le = " mar gin : ▁0 ▁0 ▁0 em ▁0 em ; ▁ba ck g rou nd : ▁# f 0 f 0 f 0 ; ▁bor der : ▁0 px ▁# a a a a a a ▁sol id ; ▁bor der - col la

5000:
▁* ▁ity ▁lahatsoratra ▁manontolo ▁ity ▁dia ▁nodikaina ▁tamin ' ny ▁teny ▁malagasy . ▁azonao ▁jerena ▁eto ▁ny ▁pejy ▁loharanon ' ity ▁lahatsoratra ▁ity ▁:
▁* ▁ity ▁lahatsoratra ▁manontolo ▁ity ▁dia ▁nodikaina ▁tamin ' ny ▁teny ▁malagasy . ▁azonao ▁jerena ▁eto ▁ny ▁pejy ▁loharanon ' ity ▁lahatsoratra ▁ity ▁:
▁ { | ▁border = 0 ▁al ign = right ▁cell pa d ding = 0 ▁cell s pa c ing = 0 ▁w id th = 000 ▁s ty le = " mar gin : ▁0 ▁0 ▁0 em ▁0 em ; ▁ba ck g rou nd : ▁# f 0 f 0 f 0 ; ▁border : ▁0 px ▁# aa aa aa ▁sol id ; ▁border - col la

10000:
▁* ▁ity ▁lahatsoratra ▁manontolo ▁ity ▁dia ▁nodikaina ▁tamin ' ny ▁teny ▁malagasy . ▁azonao ▁jerena ▁eto ▁ny ▁pejy ▁loharanon ' ity ▁lahatsoratra ▁ity ▁:
▁* ▁ity ▁lahatsoratra ▁manontolo ▁ity ▁dia ▁nodikaina ▁tamin ' ny ▁teny ▁malagasy . ▁azonao ▁jerena ▁eto ▁ny ▁pejy ▁loharanon ' ity ▁lahatsoratra ▁ity ▁:
▁ { | ▁border = 0 ▁align = right ▁cellpadding = 0 ▁cellspacing = 0 ▁width = 000 ▁style = " margin : ▁0 ▁0 ▁0 em ▁0 em ; ▁background : ▁# f 0 f 0 f 0; ▁border : ▁0 px ▁# aaaaaa ▁solid ; ▁border - col la

25000:
▁* ▁ity ▁lahatsoratra ▁manontolo ▁ity ▁dia ▁nodikaina ▁tamin ' ny ▁teny ▁malagasy . ▁azonao ▁jerena ▁eto ▁ny ▁pejy ▁loharanon ' ity ▁lahatsoratra ▁ity ▁:
▁* ▁ity ▁lahatsoratra ▁manontolo ▁ity ▁dia ▁nodikaina ▁tamin ' ny ▁teny ▁malagasy . ▁azonao ▁jerena ▁eto ▁ny ▁pejy ▁loharanon ' ity ▁lahatsoratra ▁ity ▁:
▁ { | ▁border = 0 ▁align = right ▁cellpadding = 0 ▁cellspacing = 0 ▁width = 000 ▁style = " margin : ▁0 ▁0 ▁0 em ▁0 em ; ▁background : ▁# f 0 f 0 f 0; ▁border : ▁0 px ▁# aaaaaa ▁solid ; ▁border - col la
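
The encoded samples can be turned back into plain text by concatenating the pieces and treating the `▁` marker as a word boundary. The sketch below does this for the opening words of the 1000 and 3000 encodings shown above and counts the pieces in each. The helper name `decode_pieces` is illustrative only, and reading `▁` as "a space precedes this piece" is an assumption based on how the samples line up with the original text.

```python
def decode_pieces(pieces):
    """Join BPE pieces and map the word-boundary marker back to spaces."""
    return "".join(pieces).replace("\u2581", " ").strip()


# Opening words of the sample, copied from the 1000 and 3000 encodings.
encoded = {
    1000: "▁* ▁ity ▁lahatsoratra ▁man ont olo ▁ity ▁dia ▁no d ika ina",
    3000: "▁* ▁ity ▁lahatsoratra ▁manontolo ▁ity ▁dia ▁nodikaina",
}

decoded = {}
piece_counts = {}
for vocab_size, line in encoded.items():
    # str.split() splits on whitespace only, so the marker stays attached.
    pieces = line.split()
    piece_counts[vocab_size] = len(pieces)
    decoded[vocab_size] = decode_pieces(pieces)
    print(vocab_size, len(pieces), decoded[vocab_size])
```

Both encodings decode to the same text, but the 1000-entry vocabulary needs 12 pieces for this fragment where the 3000-entry vocabulary needs 7, which is the trade-off the sample table illustrates: smaller vocabularies split words such as "manontolo" into more pieces.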