Xhosa (xh) subword embeddings

Vocab size  vocab  model  25 dim                 50 dim                 100 dim                200 dim                300 dim
1000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
3000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
5000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
10000       vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size  xhwiki sample
original zingaphezulu kwesigidi ii-toni zee-herring ezibanjwa qho ngonyaka kumntla we-pacific, nakumntla we-atlantic, kwaye phantse zisibhozo kwezilishumi iint
atistshe nokuba ngunyaka omnye, bagqiba ekubeni atitshe. ithuba lokutitsha lazivelela kwisikolo samabanga aphezulu, ibantu ekroonstad kwiphondo lase f
pećanac waba langoku njengelungu lepalamente ngomhla assassination ka - gurmukhi peasant iqela (hss) inkokeli stjepan radić kwaye hss deputies pavle r
1000 ▁zinga phezulu ▁kwe si gidi ▁ii - t oni ▁ze e - h er ri ng ▁ezi ban j wa ▁ qh o ▁ng onyaka ▁kum nt la ▁we - pa ci fi c , ▁na kum nt la ▁we - at l anti c , ▁kwaye ▁ph ant se ▁zi si bho zo ▁kwe zil ish umi ▁iint
▁a t is tsh e ▁n okuba ▁ng un yaka ▁omnye , ▁ba gqi ba ▁eku beni ▁a ti tsh e . ▁i thu ba ▁loku ti tsha ▁la zi v elela ▁kwisi kolo ▁sa ma b anga ▁a phezulu , ▁i bantu ▁e k ro on s ta d ▁kwi ph ondo ▁la se ▁f
▁pe ćanac ▁waba ▁l ang oku ▁njeng el u ngu ▁le p al am ente ▁ngomhla ▁as sa s si na tion ▁ka ▁- ▁g ur m u khi ▁pe as ant ▁iqela ▁( h ss ) ▁in ko k eli ▁st je p an ▁ra di ć ▁kwaye ▁h ss ▁de p u ti es ▁pa v le ▁r
3000 ▁zinga phezulu ▁kwesi gidi ▁ii - t oni ▁zee - her ring ▁ezi banjwa ▁ qho ▁ngonyaka ▁kum ntla ▁we - pa ci fi c , ▁na kum ntla ▁we - at l anti c , ▁kwaye ▁phantse ▁zi si bhozo ▁kwe zil ishumi ▁iint
▁at is tshe ▁nokuba ▁ng un yaka ▁omnye , ▁ba gqiba ▁ekubeni ▁a ti tshe . ▁i thuba ▁loku ti tsha ▁la zi v elela ▁kwisi kolo ▁sama banga ▁a phezulu , ▁i bantu ▁e k ro on sta d ▁kwiphondo ▁lase ▁f
▁pećanac ▁waba ▁l ang oku ▁njeng elu ngu ▁le pal amente ▁ngomhla ▁as sa ssi na tion ▁ka ▁- ▁g ur mu khi ▁pe as ant ▁iqela ▁( h ss ) ▁in ko keli ▁st je pan ▁ra di ć ▁kwaye ▁h ss ▁de pu ties ▁pa v le ▁r
5000 ▁zinga phezulu ▁kwesi gidi ▁ii - t oni ▁zee - her ring ▁ezi banjwa ▁qho ▁ngonyaka ▁kumntla ▁we - pa ci fic , ▁nakum ntla ▁we - at lantic , ▁kwaye ▁phantse ▁zisi bhozo ▁kwe zil ishumi ▁iint
▁at is tshe ▁nokuba ▁ngun yaka ▁omnye , ▁ba gqiba ▁ekubeni ▁a ti tshe . ▁i thuba ▁loku ti tsha ▁la zi v elela ▁kwisikolo ▁sama banga ▁aphezulu , ▁i bantu ▁e kro on stad ▁kwiphondo ▁lase ▁f
▁pećanac ▁waba ▁lang oku ▁njeng elu ngu ▁le palamente ▁ngomhla ▁as sa ssi na tion ▁ka ▁- ▁g ur mu khi ▁pe as ant ▁iqela ▁( h ss ) ▁inko keli ▁st je pan ▁ra di ć ▁kwaye ▁h ss ▁de pu ties ▁pa v le ▁r
10000 ▁zinga phezulu ▁kwesi gidi ▁ii - toni ▁zee - her ring ▁ezi banjwa ▁qho ▁ngonyaka ▁kumntla ▁we - pacific , ▁nakum ntla ▁we - atlantic , ▁kwaye ▁phantse ▁zisi bhozo ▁kwe zil ishumi ▁iint
▁at is tshe ▁nokuba ▁ngun yaka ▁omnye , ▁ba gqiba ▁ekubeni ▁a ti tshe . ▁ithuba ▁loku ti tsha ▁la zi v elela ▁kwisikolo ▁sama banga ▁aphezulu , ▁i bantu ▁e kro onstad ▁kwiphondo ▁lase ▁f
▁pećanac ▁waba ▁langoku ▁njeng elungu ▁le palamente ▁ngomhla ▁as sa ssi na tion ▁ka ▁- ▁g ur mukhi ▁pe as ant ▁iqela ▁( h ss ) ▁inkokeli ▁st je pan ▁ra dić ▁kwaye ▁h ss ▁de pu ties ▁pa v le ▁r
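
The rows above can be read back into plain text by joining the pieces and turning each "▁" (the SentencePiece word-boundary marker) into a space. The following is a minimal sketch of that, using only the first few words of the first sample; the names SAMPLES, decode_pieces and count_pieces are illustrative and not part of any library. It also counts pieces per vocabulary size, which shows how larger vocabularies split the same text into fewer pieces.

```python
# Minimal sketch: decode SentencePiece-style BPE pieces and compare piece counts.
# The piece strings are copied from the start of the first sample row in the
# table above; all names here are hypothetical helpers, not a library API.

WORD_BOUNDARY = "\u2581"  # the "▁" marker that precedes each word-initial piece

ORIGINAL = "zingaphezulu kwesigidi ii-toni zee-herring ezibanjwa qho ngonyaka"

SAMPLES = {
    1000: "▁zinga phezulu ▁kwe si gidi ▁ii - t oni ▁ze e - h er ri ng ▁ezi ban j wa ▁ qh o ▁ng onyaka",
    3000: "▁zinga phezulu ▁kwesi gidi ▁ii - t oni ▁zee - her ring ▁ezi banjwa ▁ qho ▁ngonyaka",
    5000: "▁zinga phezulu ▁kwesi gidi ▁ii - t oni ▁zee - her ring ▁ezi banjwa ▁qho ▁ngonyaka",
    10000: "▁zinga phezulu ▁kwesi gidi ▁ii - toni ▁zee - her ring ▁ezi banjwa ▁qho ▁ngonyaka",
}


def decode_pieces(encoded: str) -> str:
    """Join space-separated pieces and restore spaces at word boundaries."""
    pieces = encoded.split(" ")
    return "".join(pieces).replace(WORD_BOUNDARY, " ").strip()


def count_pieces(encoded: str) -> int:
    """Number of subword pieces in one encoded sample."""
    return len(encoded.split(" "))


if __name__ == "__main__":
    for vocab_size, encoded in SAMPLES.items():
        # Every vocabulary size should decode to the same original text.
        assert decode_pieces(encoded) == ORIGINAL
        print(vocab_size, count_pieces(encoded))
```

For this fragment the piece count drops from 25 at vocabulary size 1000 to 15 at 10000, since frequent sequences such as "kwesi", "banjwa" and "ngonyaka" become single pieces.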