Oromo (om) subword embeddings

Vocab size  Vocab / model  25 dim    50 dim    100 dim   200 dim   300 dim
1000        vocab, model   txt, bin  txt, bin  txt, bin  txt, bin  txt, bin
3000        vocab, model   txt, bin  txt, bin  txt, bin  txt, bin  txt, bin
5000        vocab, model   txt, bin  txt, bin  txt, bin  txt, bin  txt, bin
10000       vocab, model   txt, bin  txt, bin  txt, bin  txt, bin  txt, bin

Each embedding cell provides the vectors as txt and bin files, together with links labelled bokeh, umap and matrix.
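The txt downloads can be read without extra dependencies. The sketch below assumes they follow the word2vec text format (a header line with the piece count and dimensionality, then one piece and its vector per line). That format, the helper names `load_w2v_txt`, `cosine` and `nearest`, and the toy vectors are assumptions for illustration and are not taken from this page. It loads the vectors and ranks pieces by cosine similarity.

```python
# Minimal sketch of reading a "txt" download, ASSUMING the word2vec text format:
# a "<count> <dim>" header, then "<piece> <v1> ... <vdim>" on each line.
# All helper names here are hypothetical.
import io
import math
from typing import Dict, List, TextIO, Tuple

Vectors = Dict[str, List[float]]


def load_w2v_txt(stream: TextIO) -> Tuple[Vectors, int]:
    """Read word2vec-style text embeddings into a dict mapping piece -> vector."""
    _count, dim = map(int, stream.readline().split())
    vectors: Vectors = {}
    for line in stream:
        fields = line.split()
        if len(fields) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[fields[0]] = [float(x) for x in fields[1:]]
    return vectors, dim


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vectors: Vectors, query: str, k: int = 3) -> List[Tuple[str, float]]:
    """Return the k pieces most similar to `query`, best first."""
    q = vectors[query]
    scored = [(piece, cosine(q, v)) for piece, v in vectors.items() if piece != query]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional vectors for illustration only; NOT values from the real files.
TOY = """4 3
▁kuni 0.1 0.2 0.3
▁kan 0.1 0.2 0.31
▁oromoo -0.5 0.4 0.0
isa 0.0 -0.1 0.9
"""

if __name__ == "__main__":
    vectors, dim = load_w2v_txt(io.StringIO(TOY))
    print(dim, nearest(vectors, "▁kuni", k=2))
```

For a real download, open the file with `encoding="utf-8"` and pass the handle to `load_w2v_txt`. For the larger vocabularies, gensim's `KeyedVectors.load_word2vec_format` is a more practical loader for word2vec-format files.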

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size  omwiki sample
original
kuni fakki planeta ballina hanga neptune ta’u qabdu kan urjii gliese 000 jedhamtu kan marsittu agarsisa
barri gadaa biifolee bara mootiin habashaa lola warra islaamaa waliin qabu moo'atee humna isaa mara gara oromoo qofaatti xiyyeefate ture. mootichi hab
indaakteriin indaaktaansii isaatin ibsama, kunis reeshiyoo voolteejii fi saffisa jijjiirama karantiiti, safartuun isaas henriis (h) dha. indaakteroonn

1000
▁kuni ▁fakk i ▁pl an e ta ▁ball ina ▁hanga ▁n e p t un e ▁ta ’ u ▁qab du ▁kan ▁ ur jii ▁g l i es e ▁000 ▁jedh am tu ▁kan ▁mar s it tu ▁agar s isa
▁bar ri ▁gadaa ▁b iif olee ▁bara ▁moot iin ▁h aba sh aa ▁lol a ▁warra ▁isl aam aa ▁waliin ▁qabu ▁moo ' atee ▁humna ▁isaa ▁mar a ▁gara ▁oromoo ▁qo f aatti ▁x iyy eef ate ▁ture . ▁moo ti chi ▁h ab
▁in daa k ter iin ▁in daa k t aan sii ▁isaa tin ▁ib s ama , ▁kunis ▁r ee sh iy oo ▁v ool tee jii ▁fi ▁sa ff isa ▁jijjiir ama ▁kar an tii ti , ▁s af ar t uun ▁is aas ▁h en rii s ▁( h ) ▁dha . ▁in daa k ter oon n

3000
▁kuni ▁fakk i ▁pl ane ta ▁ballina ▁hanga ▁ne pt une ▁ta ’ u ▁qabdu ▁kan ▁ur jii ▁g l ies e ▁000 ▁jedhamtu ▁kan ▁mar s it tu ▁agar s isa
▁bar ri ▁gadaa ▁b iif olee ▁bara ▁moot iin ▁haba shaa ▁lola ▁warra ▁islaamaa ▁waliin ▁qabu ▁moo ' atee ▁humna ▁isaa ▁mar a ▁gara ▁oromoo ▁qof aatti ▁xiyy eef ate ▁ture . ▁mooti chi ▁h ab
▁indaak ter iin ▁indaak taan sii ▁isaa tin ▁ibs ama , ▁kunis ▁ree sh iyoo ▁voolteejii ▁fi ▁saffisa ▁jijjiir ama ▁karantii ti , ▁saf ar tuun ▁isaas ▁h en rii s ▁( h ) ▁dha . ▁indaak ter oon n

5000
▁kuni ▁fakki ▁plane ta ▁ballina ▁hanga ▁ne pt une ▁ta ’ u ▁qabdu ▁kan ▁urjii ▁g l ies e ▁000 ▁jedhamtu ▁kan ▁mar s it tu ▁agar s isa
▁barri ▁gadaa ▁b iif olee ▁bara ▁moot iin ▁habashaa ▁lola ▁warra ▁islaamaa ▁waliin ▁qabu ▁moo ' atee ▁humna ▁isaa ▁mar a ▁gara ▁oromoo ▁qof aatti ▁xiyy eef ate ▁ture . ▁mooti chi ▁hab
▁indaak ter iin ▁indaak taan sii ▁isaatin ▁ibs ama , ▁kunis ▁ree sh iyoo ▁voolteejii ▁fi ▁saffisa ▁jijjiirama ▁karantii ti , ▁saf ar tuun ▁isaas ▁hen rii s ▁( h ) ▁dha . ▁indaak ter oon n

10000
▁kuni ▁fakki ▁planeta ▁ballina ▁hanga ▁ne pt une ▁ta ’ u ▁qabdu ▁kan ▁urjii ▁gl ies e ▁000 ▁jedhamtu ▁kan ▁mars it tu ▁agar s isa
▁barri ▁gadaa ▁biif olee ▁bara ▁moot iin ▁habashaa ▁lola ▁warra ▁islaamaa ▁waliin ▁qabu ▁moo ' atee ▁humna ▁isaa ▁mara ▁gara ▁oromoo ▁qof aatti ▁xiyy eef ate ▁ture . ▁mootichi ▁hab
▁indaak teriin ▁indaak taan sii ▁isaatin ▁ibs ama , ▁kunis ▁ree sh iyoo ▁voolteejii ▁fi ▁saffisa ▁jijjiirama ▁karantii ti , ▁safar tuun ▁isaas ▁henrii s ▁( h ) ▁dha . ▁indaak ter oon n
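The segmented rows can be decoded back to the original text. The sketch below assumes the SentencePiece convention that ▁ (U+2581) marks the start of a word. Under that convention, concatenating the pieces and turning each ▁ into a space recovers the input. It also counts the pieces per vocabulary size for the first sample sentence. The helper names `pieces` and `decode` are hypothetical.

```python
# Hypothetical helpers relating the segmented rows to the original sample text.
# Assumes the SentencePiece convention: "▁" (U+2581) marks the start of a word.
from typing import Dict, List

WORD_START = "\u2581"

ORIGINAL = ("kuni fakki planeta ballina hanga neptune ta’u qabdu kan urjii "
            "gliese 000 jedhamtu kan marsittu agarsisa")

# First sample sentence as segmented at each vocabulary size (copied from the table above).
SEGMENTED: Dict[int, str] = {
    1000: "▁kuni ▁fakk i ▁pl an e ta ▁ball ina ▁hanga ▁n e p t un e ▁ta ’ u ▁qab du "
          "▁kan ▁ ur jii ▁g l i es e ▁000 ▁jedh am tu ▁kan ▁mar s it tu ▁agar s isa",
    3000: "▁kuni ▁fakk i ▁pl ane ta ▁ballina ▁hanga ▁ne pt une ▁ta ’ u ▁qabdu ▁kan "
          "▁ur jii ▁g l ies e ▁000 ▁jedhamtu ▁kan ▁mar s it tu ▁agar s isa",
    5000: "▁kuni ▁fakki ▁plane ta ▁ballina ▁hanga ▁ne pt une ▁ta ’ u ▁qabdu ▁kan "
          "▁urjii ▁g l ies e ▁000 ▁jedhamtu ▁kan ▁mar s it tu ▁agar s isa",
    10000: "▁kuni ▁fakki ▁planeta ▁ballina ▁hanga ▁ne pt une ▁ta ’ u ▁qabdu ▁kan "
           "▁urjii ▁gl ies e ▁000 ▁jedhamtu ▁kan ▁mars it tu ▁agar s isa",
}


def pieces(row: str) -> List[str]:
    """Split a space-separated segmentation row into its subword pieces."""
    return row.split()


def decode(tokens: List[str]) -> str:
    """Concatenate the pieces and turn each word-start marker back into a space."""
    return "".join(tokens).replace(WORD_START, " ").strip()


if __name__ == "__main__":
    for vocab_size, row in SEGMENTED.items():
        tokens = pieces(row)
        assert decode(tokens) == ORIGINAL  # every segmentation is lossless
        print(f"vocab size {vocab_size:>5}: {len(tokens)} pieces")
```

For this sentence, the piece count falls from 42 at vocabulary size 1000 to 26 at 10000. The larger vocabularies keep whole words such as ballina, qabdu and jedhamtu as single pieces.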