Kabiyè (kbp) subword embeddings

Vocab size  vocab  model  25 dim                 50 dim                 100 dim                200 dim                300 dim
1000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
3000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
5000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
10000       vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
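
The txt downloads can presumably be read with a few lines of plain Python. The sketch below assumes they follow the word2vec text layout (an optional "<count> <dim>" header line, then one subword piece per line followed by its vector components); this page does not confirm that layout. The function name load_text_embeddings and the three-dimensional sample vectors are placeholders made up for illustration, not real Kabiyè embeddings.

```python
import io


def load_text_embeddings(handle):
    """Read subword vectors from a word2vec-style text stream.

    Assumed layout (not confirmed by this page): an optional header
    line "<count> <dim>", then one line per subword piece holding the
    piece followed by its whitespace-separated vector components.
    """
    vectors = {}
    for line_no, line in enumerate(handle):
        fields = line.split()
        # Skip the optional "<count> <dim>" header on the first line.
        if line_no == 0 and len(fields) == 2 and all(f.isdigit() for f in fields):
            continue
        if len(fields) < 2:
            continue  # ignore blank or malformed lines
        piece, values = fields[0], fields[1:]
        vectors[piece] = [float(v) for v in values]
    return vectors


# Placeholder data: made-up 3-dimensional vectors, for illustration only.
sample = io.StringIO(
    "3 3\n"
    "▁taa 0.1 0.2 0.3\n"
    "▁wiye -0.5 0.0 0.5\n"
    "laɣ 1.0 -1.0 0.25\n"
)
embeddings = load_text_embeddings(sample)
print(len(embeddings), len(embeddings["▁taa"]))
```

For a real file, the same function could be given a handle opened with encoding="utf-8", since the pieces contain non-ASCII characters.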

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size    kbpwiki sample
original lɛlaɣ fenaɣ kɩyakʋ 00 wiye, pɩnaɣ 0000 n̄ɩŋga taa palʋla galilée florence tɛtʋ taa, itaalii ɛjaɖɛ taa. kɔlaɣ fenaɣ kɩyakʋ 0 n̄ɩŋgʋ wiye, pɩnaɣ 0000 ta
senegaalɩ wɛ afrika ajɛya kɩkpɛndɩɣ ŋgbɛyɛ nɛ tɔsʊʊ hʊʊ nɛ liidiya kɩkpɛndɩɣ ŋgbɛyɛ nɖɩ payaɣ se cedeao yɔ ɖɩ-taa. ɛjaɖɛ ɖɩnɛ ɖɩwɛ ɖɔɖɔ afrika ajɛya k
victori a walɩ ɛ-ɖɛtʊ ɛlɛ kɛ wiyaʊ pɩɣa paya-ɩ se albert saxe-cobourg-gotha tʊ pɩnaɣ 0000. pe-piyaa nakʊ nzɩ sɩkpaɣ sɩ-halaa kewiyaɣ ɖɩɣa taa peeɖe ye
1000 ▁lɛ laɣ ▁fenaɣ ▁kɩyakʋ ▁00 ▁wiye , ▁pɩnaɣ ▁0000 ▁n ̄ ɩŋga ▁taa ▁palʋla ▁g a li l é e ▁f l or en ce ▁tɛtʋ ▁taa , ▁i taa lii ▁ɛjaɖɛ ▁taa . ▁kɔ laɣ ▁fenaɣ ▁kɩyakʋ ▁0 ▁n ̄ ɩŋgʋ ▁wiye , ▁pɩnaɣ ▁0000 ▁ta
▁se ne g aa lɩ ▁wɛ ▁afrika ▁ajɛya ▁kɩ kpɛnd ɩɣ ▁ŋgbɛyɛ ▁nɛ ▁tɔsʊʊ ▁h ʊʊ ▁nɛ ▁liid iya ▁kɩ kpɛnd ɩɣ ▁ŋgbɛyɛ ▁nɖɩ ▁payaɣ ▁se ▁ce de a o ▁yɔ ▁ɖɩ - taa . ▁ɛjaɖɛ ▁ɖɩnɛ ▁ɖɩwɛ ▁ɖɔɖɔ ▁afrika ▁ajɛya ▁k
▁ vi c to ri ▁a ▁wa lɩ ▁ɛ - ɖɛ tʊ ▁ɛlɛ ▁kɛ ▁wiyaʊ ▁pɩɣa ▁paya - ɩ ▁se ▁al ber t ▁sa x e - co b our g - go t ha ▁tʊ ▁pɩnaɣ ▁0000. ▁pe - p iyaa ▁nak ʊ ▁nzɩ ▁sɩ kpaɣ ▁sɩ - h alaa ▁kewiyaɣ ▁ɖɩ ɣa ▁taa ▁peeɖe ▁ye
3000 ▁lɛlaɣ ▁fenaɣ ▁kɩyakʋ ▁00 ▁wiye , ▁pɩnaɣ ▁0000 ▁n ̄ ɩŋga ▁taa ▁palʋla ▁ga li lé e ▁f lor en ce ▁tɛtʋ ▁taa , ▁itaalii ▁ɛjaɖɛ ▁taa . ▁kɔlaɣ ▁fenaɣ ▁kɩyakʋ ▁0 ▁n ̄ ɩŋgʋ ▁wiye , ▁pɩnaɣ ▁0000 ▁ta
▁sene gaalɩ ▁wɛ ▁afrika ▁ajɛya ▁kɩkpɛndɩɣ ▁ŋgbɛyɛ ▁nɛ ▁tɔsʊʊ ▁hʊʊ ▁nɛ ▁liid iya ▁kɩkpɛndɩɣ ▁ŋgbɛyɛ ▁nɖɩ ▁payaɣ ▁se ▁cedeao ▁yɔ ▁ɖɩ - taa . ▁ɛjaɖɛ ▁ɖɩnɛ ▁ɖɩwɛ ▁ɖɔɖɔ ▁afrika ▁ajɛya ▁k
▁vic to ri ▁a ▁wa lɩ ▁ɛ - ɖɛ tʊ ▁ɛlɛ ▁kɛ ▁wiyaʊ ▁pɩɣa ▁paya - ɩ ▁se ▁albert ▁sa xe - co bourg - go tha ▁tʊ ▁pɩnaɣ ▁0000. ▁pe - p iyaa ▁nakʊ ▁nzɩ ▁sɩ kpaɣ ▁sɩ - h alaa ▁kewiyaɣ ▁ɖɩɣa ▁taa ▁peeɖe ▁ye
5000 ▁lɛlaɣ ▁fenaɣ ▁kɩyakʋ ▁00 ▁wiye , ▁pɩnaɣ ▁0000 ▁n ̄ ɩŋga ▁taa ▁palʋla ▁ga li lé e ▁flor ence ▁tɛtʋ ▁taa , ▁itaalii ▁ɛjaɖɛ ▁taa . ▁kɔlaɣ ▁fenaɣ ▁kɩyakʋ ▁0 ▁n ̄ ɩŋgʋ ▁wiye , ▁pɩnaɣ ▁0000 ▁ta
▁senegaalɩ ▁wɛ ▁afrika ▁ajɛya ▁kɩkpɛndɩɣ ▁ŋgbɛyɛ ▁nɛ ▁tɔsʊʊ ▁hʊʊ ▁nɛ ▁liidiya ▁kɩkpɛndɩɣ ▁ŋgbɛyɛ ▁nɖɩ ▁payaɣ ▁se ▁cedeao ▁yɔ ▁ɖɩ - taa . ▁ɛjaɖɛ ▁ɖɩnɛ ▁ɖɩwɛ ▁ɖɔɖɔ ▁afrika ▁ajɛya ▁k
▁vic to ri ▁a ▁wa lɩ ▁ɛ - ɖɛ tʊ ▁ɛlɛ ▁kɛ ▁wiyaʊ ▁pɩɣa ▁paya - ɩ ▁se ▁albert ▁sa xe - co bourg - go tha ▁tʊ ▁pɩnaɣ ▁0000. ▁pe - piyaa ▁nakʊ ▁nzɩ ▁sɩ kpaɣ ▁sɩ - h alaa ▁kewiyaɣ ▁ɖɩɣa ▁taa ▁peeɖe ▁ye
10000 ▁lɛlaɣ ▁fenaɣ ▁kɩyakʋ ▁00 ▁wiye , ▁pɩnaɣ ▁0000 ▁n ̄ ɩŋga ▁taa ▁palʋla ▁gali lée ▁florence ▁tɛtʋ ▁taa , ▁itaalii ▁ɛjaɖɛ ▁taa . ▁kɔlaɣ ▁fenaɣ ▁kɩyakʋ ▁0 ▁n ̄ ɩŋgʋ ▁wiye , ▁pɩnaɣ ▁0000 ▁ta
▁senegaalɩ ▁wɛ ▁afrika ▁ajɛya ▁kɩkpɛndɩɣ ▁ŋgbɛyɛ ▁nɛ ▁tɔsʊʊ ▁hʊʊ ▁nɛ ▁liidiya ▁kɩkpɛndɩɣ ▁ŋgbɛyɛ ▁nɖɩ ▁payaɣ ▁se ▁cedeao ▁yɔ ▁ɖɩ - taa . ▁ɛjaɖɛ ▁ɖɩnɛ ▁ɖɩwɛ ▁ɖɔɖɔ ▁afrika ▁ajɛya ▁k
▁vic tori ▁a ▁walɩ ▁ɛ - ɖɛtʊ ▁ɛlɛ ▁kɛ ▁wiyaʊ ▁pɩɣa ▁paya - ɩ ▁se ▁albert ▁saxe - co bourg - go tha ▁tʊ ▁pɩnaɣ ▁0000. ▁pe - piyaa ▁nakʊ ▁nzɩ ▁sɩ kpaɣ ▁sɩ - h alaa ▁kewiyaɣ ▁ɖɩɣa ▁taa ▁peeɖe ▁ye
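
The effect of vocabulary size in the table above can be quantified with a short script. This is a minimal sketch that assumes "▁" marks the start of a word, as it appears to in the samples. The helper names detokenize and pieces_per_word are illustrative and not part of any distributed tool. The input strings are the opening of the second sample as segmented in the 1000, 3000 and 5000 rows.

```python
WORD_START = "▁"  # marker that begins a new word in the samples above


def detokenize(pieces):
    """Rebuild plain text by joining pieces and turning markers into spaces."""
    return "".join(pieces).replace(WORD_START, " ").strip()


def pieces_per_word(pieces):
    """Average number of pieces per word (assumes at least one word marker)."""
    words = sum(1 for piece in pieces if piece.startswith(WORD_START))
    return len(pieces) / words


# Opening of the second sample, copied from the rows above.
segmentations = {
    1000: "▁se ne g aa lɩ ▁wɛ ▁afrika ▁ajɛya".split(),
    3000: "▁sene gaalɩ ▁wɛ ▁afrika ▁ajɛya".split(),
    5000: "▁senegaalɩ ▁wɛ ▁afrika ▁ajɛya".split(),
}

for vocab_size, pieces in segmentations.items():
    print(vocab_size, len(pieces), pieces_per_word(pieces), detokenize(pieces))
```

All three segmentations detokenize to the same text. The piece count for this excerpt falls from 8 at vocabulary size 1000 to 5 at 3000 and 4 at 5000, where every word is a single piece.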