Shona (sn) subword embeddings

Vocab size  vocab  model  25 dim                 50 dim                 100 dim                200 dim                300 dim
1000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
3000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
5000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
10000       vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
25000       vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
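
The txt downloads above can be read without extra dependencies if they follow the plain word2vec text layout (a header line with the entry count and dimension, then one subword and its vector per line). That layout is an assumption here, not something this page states, and the helper name `read_w2v_text` is hypothetical. The three vectors in the usage snippet are placeholder values for illustration only, not real Shona embeddings.

```python
import io


def read_w2v_text(handle):
    """Parse word2vec-style text: 'count dim' header, then 'token v1 ... v_dim' per line."""
    count, dim = (int(x) for x in handle.readline().split())
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        token, values = parts[0], [float(x) for x in parts[1:]]
        if len(values) != dim:
            raise ValueError(f"expected {dim} values for {token!r}, got {len(values)}")
        vectors[token] = values
    if len(vectors) != count:
        raise ValueError(f"header announced {count} entries, found {len(vectors)}")
    return dim, vectors


# Placeholder data standing in for a downloaded txt file (assumed layout).
toy_file = io.StringIO("3 2\n▁ku 0.1 0.2\nti -0.3 0.4\n▁kuti 0.5 -0.6\n")
dim, vectors = read_w2v_text(toy_file)
print(dim, sorted(vectors))
```

For a real file, the same function could be called on `open(path, encoding="utf-8")`, where `path` points at an unpacked txt download.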

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size  snwiki sample
original *digi (ideo of stepping into a place quietly). **dimi (ideo of speaking figuratively). kudimika (speak figuratively - in metaphor). kana uchiti, "ndir
shoko rokuti kurungama (to be right, set right, upright, rectitude, straight, fair) rinotaura kuti chiro chakamira nemutowo wachakafanirwa kumira nach
izwi rokuti rusengedzano rinotodzana nerokuti kusenga rinoreva kutakura kuendesa kune imwe nzvimbo - to transport.
1000 ▁* di gi ▁( ide o ▁of ▁s te p p ing ▁in to ▁a ▁pla ce ▁ qu i e t ly ). ▁* * di mi ▁( ide o ▁of ▁s pe a king ▁fi gu ra ti ve ly ). ▁ku di mi ka ▁( s pe a k ▁fi gu ra ti ve ly ▁- ▁in ▁me ta p ho r ). ▁kana ▁uchi ti , ▁" n di r
▁shoko ▁rokuti ▁ku r unga ma ▁( to ▁be ▁ri ght , ▁se t ▁ri ght , ▁u p right , ▁re c ti tu de , ▁st ra i ght , ▁fa ir ) ▁rino taura ▁kuti ▁chi ro ▁chaka mira ▁nemu to wo ▁wa cha ka f ani rwa ▁ku mira ▁na ch
▁izwi ▁rokuti ▁ru s enge dz ano ▁rino to dzana ▁ne ro kuti ▁ku s enga ▁rinoreva ▁kuta kura ▁ku ende sa ▁kune ▁imwe ▁nzvimbo ▁- ▁to ▁t ran s p or t .
3000 ▁* di gi ▁( ideo ▁of ▁ste pp ing ▁into ▁a ▁place ▁qui et ly ). ▁** di mi ▁( ideo ▁of ▁spe a king ▁fi gura tive ly ). ▁kudi mi ka ▁( spe ak ▁fi gura tive ly ▁- ▁in ▁meta p ho r ). ▁kana ▁uchi ti , ▁" n di r
▁shoko ▁rokuti ▁kur unga ma ▁( to ▁be ▁right , ▁se t ▁right , ▁up right , ▁re c titude , ▁st ra ight , ▁fa ir ) ▁rinotaura ▁kuti ▁chiro ▁chaka mira ▁nemu towo ▁wa cha ka fanirwa ▁ku mira ▁na ch
▁izwi ▁rokuti ▁ru s enge dz ano ▁rino todzana ▁ne ro kuti ▁kus enga ▁rinoreva ▁kutakura ▁ku ende sa ▁kune ▁imwe ▁nzvimbo ▁- ▁to ▁t ran sp ort .
5000 ▁* di gi ▁( ideo ▁of ▁ste pping ▁into ▁a ▁place ▁qui et ly ). ▁** dimi ▁( ideo ▁of ▁speaking ▁fi gura tive ly ). ▁kudi mi ka ▁( spe ak ▁fi gura tive ly ▁- ▁in ▁meta p ho r ). ▁kana ▁uchi ti , ▁" n di r
▁shoko ▁rokuti ▁kur unga ma ▁( to ▁be ▁right , ▁set ▁right , ▁up right , ▁re c titude , ▁stra ight , ▁fa ir ) ▁rinotaura ▁kuti ▁chiro ▁chaka mira ▁nemu towo ▁wa cha ka fanirwa ▁kumira ▁na ch
▁izwi ▁rokuti ▁ru s enge dz ano ▁rinotodzana ▁ne ro kuti ▁kus enga ▁rinoreva ▁kutakura ▁ku ende sa ▁kune ▁imwe ▁nzvimbo ▁- ▁to ▁t ran sp ort .
10000 ▁* di gi ▁( ideo ▁of ▁ste pping ▁into ▁a ▁place ▁quiet ly ). ▁** dimi ▁( ideo ▁of ▁speaking ▁fi gura tively ). ▁kudi mika ▁( spe ak ▁fi gura tively ▁- ▁in ▁meta p hor ). ▁kana ▁uchi ti , ▁" ndi r
▁shoko ▁rokuti ▁kur unga ma ▁( to ▁be ▁right , ▁set ▁right , ▁up right , ▁re c titude , ▁straight , ▁fa ir ) ▁rinotaura ▁kuti ▁chiro ▁chaka mira ▁nemutowo ▁wa chaka fanirwa ▁kumira ▁na ch
▁izwi ▁rokuti ▁rus enge dzano ▁rinotodzana ▁nero kuti ▁kus enga ▁rinoreva ▁kutakura ▁ku endesa ▁kune ▁imwe ▁nzvimbo ▁- ▁to ▁tran sp ort .
25000 ▁* di gi ▁( ideo ▁of ▁ste pping ▁into ▁a ▁place ▁quietly ). ▁** dimi ▁( ideo ▁of ▁speaking ▁figuratively ). ▁kudi mika ▁( speak ▁figuratively ▁- ▁in ▁meta phor ). ▁kana ▁uchiti , ▁" ndi r
▁shoko ▁rokuti ▁kurunga ma ▁( to ▁be ▁right , ▁set ▁right , ▁upright , ▁re c titude , ▁straight , ▁fa ir ) ▁rinotaura ▁kuti ▁chiro ▁chakamira ▁nemutowo ▁wa chaka fanirwa ▁kumira ▁na ch
▁izwi ▁rokuti ▁rus engedzano ▁rinotodzana ▁nerokuti ▁kus enga ▁rinoreva ▁kutakura ▁ku endesa ▁kune ▁imwe ▁nzvimbo ▁- ▁to ▁transport .
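
The samples above can be inspected programmatically. The sketch below takes the third sample sentence as segmented with the 1000 and 25000 vocabularies, counts the pieces in each, and rejoins them into the original text. It assumes the usual SentencePiece-style convention that `▁` marks a preceding space; the helper name `detokenize` is introduced here for illustration.

```python
def detokenize(pieces):
    """Rejoin subword pieces, treating '▁' as the marker for a preceding space."""
    return "".join(pieces).replace("▁", " ").strip()


original = (
    "izwi rokuti rusengedzano rinotodzana nerokuti kusenga rinoreva "
    "kutakura kuendesa kune imwe nzvimbo - to transport."
)

# Third sample sentence, copied from the 1000 and 25000 rows of the table.
pieces_1000 = (
    "▁izwi ▁rokuti ▁ru s enge dz ano ▁rino to dzana ▁ne ro kuti ▁ku s enga "
    "▁rinoreva ▁kuta kura ▁ku ende sa ▁kune ▁imwe ▁nzvimbo ▁- ▁to ▁t ran s p or t ."
).split()
pieces_25000 = (
    "▁izwi ▁rokuti ▁rus engedzano ▁rinotodzana ▁nerokuti ▁kus enga ▁rinoreva "
    "▁kutakura ▁ku endesa ▁kune ▁imwe ▁nzvimbo ▁- ▁to ▁transport ."
).split()

# A larger vocabulary covers the same text with fewer, longer pieces.
print(len(pieces_1000), len(pieces_25000))

# Both segmentations decode back to the same sentence.
print(detokenize(pieces_1000) == original)
print(detokenize(pieces_25000) == original)
```

The piece count is a rough proxy for sequence length seen by a downstream model: smaller vocabularies split Shona words into many short fragments, while larger ones keep whole words such as `▁rinotodzana` intact.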