Yoruba (yo) subword embeddings

Vocab size  vocab  model  25 dim                 50 dim                 100 dim                200 dim                300 dim
1000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
3000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
5000        vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
10000       vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix
25000       vocab  model  txt | bin              txt | bin              txt | bin              txt | bin              txt | bin
                          bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix  bokeh | umap | matrix

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size  yowiki sample
original rick james (born james ambrose johnson, jr.; february 0, 0000august 0, 0000) was an american singer, songwriter, musician and record producer, best kn
indiaunited states "for his theoretical studies of the physical processes of importance to the structure and evolution of the stars"
(c) (i) àì - + jẹun àìjẹun (ii) àì - + jẹun àìjẹun a ṣe àkíyèsí pé mọ́fíìmù ìṣẹ̀dá àì- náà ni ya máa n lọ láti fi yi ọ̀rọ̀-ìṣe sódì.
1000 ▁r ic k ▁j am es ▁( b or n ▁j am es ▁am b ro se ▁jo h n s on , ▁j r . ; ▁fe b ru ary ▁0, ▁0000 a ug ust ▁0, ▁0000) ▁was ▁an ▁americ an ▁s ing er , ▁s ong w ri ter , ▁m us ic i an ▁and ▁re c or d ▁pro du c er , ▁b est ▁k n
▁india un ited ▁st ates ▁" for ▁his ▁the ore tic al ▁st u di es ▁of ▁the ▁p h y s ical ▁pro ces s es ▁of ▁i mp ort an ce ▁to ▁the ▁st ru ct ure ▁and ▁e v ol u tion ▁of ▁the ▁st ar s "
▁( c ) ▁( i ) ▁àì ▁- ▁ + ▁jẹ un ▁àì jẹ un ▁( i i ) ▁àì ▁- ▁ + ▁jẹ un ▁àì jẹ un ▁a ▁ṣe ▁à k í yè sí ▁pé ▁mọ ́ f í ì m ù ▁ì ṣẹ ̀ dá ▁àì - ▁náà ▁ni ▁y a ▁máa ▁n  ▁lọ ▁láti ▁fi ▁yi ▁ọ ̀ rọ ̀ - ì ṣe ▁s ó dì .
3000 ▁r ick ▁james ▁( born ▁james ▁am b ro se ▁john son , ▁j r . ; ▁february ▁0, ▁0000 a ugust ▁0, ▁0000) ▁was ▁an ▁american ▁sing er , ▁s ong w ri ter , ▁music ian ▁and ▁rec ord ▁produ cer , ▁b est ▁kn
▁india un ited ▁states ▁" for ▁his ▁the ore tical ▁studi es ▁of ▁the ▁ph ys ical ▁pro cess es ▁of ▁imp ort ance ▁to ▁the ▁struct ure ▁and ▁ev olution ▁of ▁the ▁st ar s "
▁( c ) ▁( i ) ▁àì ▁- ▁+ ▁jẹ un ▁àì jẹ un ▁( ii ) ▁àì ▁- ▁+ ▁jẹ un ▁àì jẹ un ▁a ▁ṣe ▁àkíyèsí ▁pé ▁mọ ́ fí ìmù ▁ìṣẹ ̀ dá ▁àì - ▁náà ▁ni ▁ya ▁máa ▁n  ▁lọ ▁láti ▁fi ▁yi ▁ọ ̀ rọ ̀ - ì ṣe ▁só dì .
5000 ▁r ick ▁james ▁( born ▁james ▁am bro se ▁johnson , ▁jr . ; ▁february ▁0, ▁0000 a ugust ▁0, ▁0000) ▁was ▁an ▁american ▁singer , ▁song w ri ter , ▁music ian ▁and ▁record ▁produ cer , ▁best ▁kn
▁india un ited ▁states ▁" for ▁his ▁the ore tical ▁studies ▁of ▁the ▁phys ical ▁process es ▁of ▁import ance ▁to ▁the ▁structure ▁and ▁ev olution ▁of ▁the ▁st ars "
▁( c ) ▁( i ) ▁àì ▁- ▁+ ▁jẹ un ▁àì jẹ un ▁( ii ) ▁àì ▁- ▁+ ▁jẹ un ▁àì jẹ un ▁a ▁ṣe ▁àkíyèsí ▁pé ▁mọ ́ fí ìmù ▁ìṣẹ ̀ dá ▁àì - ▁náà ▁ni ▁ya ▁máa ▁n  ▁lọ ▁láti ▁fi ▁yi ▁ọ ̀ rọ ̀ - ì ṣe ▁só dì .
10000 ▁r ick ▁james ▁( born ▁james ▁am bro se ▁johnson , ▁jr . ; ▁february ▁0, ▁0000 august ▁0, ▁0000) ▁was ▁an ▁american ▁singer , ▁songwriter , ▁musician ▁and ▁record ▁produ cer , ▁best ▁kn
▁india united ▁states ▁" for ▁his ▁theoretical ▁studies ▁of ▁the ▁phys ical ▁processes ▁of ▁import ance ▁to ▁the ▁structure ▁and ▁evolution ▁of ▁the ▁stars "
▁( c ) ▁( i ) ▁àì ▁- ▁+ ▁jẹ un ▁àì jẹ un ▁( ii ) ▁àì ▁- ▁+ ▁jẹ un ▁àì jẹ un ▁a ▁ṣe ▁àkíyèsí ▁pé ▁mọ ́ fí ìmù ▁ìṣẹ ̀ dá ▁àì - ▁náà ▁ni ▁ya ▁máa ▁n  ▁lọ ▁láti ▁fi ▁yi ▁ọ ̀ rọ ̀ - ìṣe ▁sódì .
25000 ▁r ick ▁james ▁( born ▁james ▁am bro se ▁johnson , ▁jr .; ▁february ▁0, ▁0000 august ▁0, ▁0000) ▁was ▁an ▁american ▁singer , ▁songwriter , ▁musician ▁and ▁record ▁producer , ▁best ▁kn
▁india united ▁states ▁" for ▁his ▁theoretical ▁studies ▁of ▁the ▁physical ▁processes ▁of ▁importance ▁to ▁the ▁structure ▁and ▁evolution ▁of ▁the ▁stars "
▁( c ) ▁( i ) ▁àì ▁- ▁+ ▁jẹun ▁àì jẹun ▁( ii ) ▁àì ▁- ▁+ ▁jẹun ▁àì jẹun ▁a ▁ṣe ▁àkíyèsí ▁pé ▁mọ ́ fíìmù ▁ìṣẹ ̀ dá ▁àì - ▁náà ▁ni ▁ya ▁máa ▁n  ▁lọ ▁láti ▁fi ▁yi ▁ọ ̀ rọ ̀ - ìṣe ▁sódì .
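The effect seen in the samples above, where a larger vocabulary yields fewer and longer subword units, can be reproduced with a toy byte-pair encoder. The following is a minimal sketch, not the tool used to build these models: the function names (pre_tokenize, learn_merges, encode, decode), the tiny corpus, and the merge counts are all assumptions made for illustration. It lowercases the text, marks word starts with "▁" as in the samples, and treats the number of learned merges as a stand-in for vocabulary size.

```python
from collections import Counter

def pre_tokenize(text):
    # Lowercase and mark each word start with "▁" (assumed convention,
    # mirroring the marker visible in the samples above).
    return ["▁" + w for w in text.lower().split()]

def apply_merge(symbols, pair):
    # Replace every adjacent occurrence of `pair` with the merged symbol.
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_merges(corpus, num_merges):
    # Start from single characters and repeatedly merge the most
    # frequent adjacent pair; more merges means a larger vocabulary.
    words = Counter(tuple(w) for line in corpus for w in pre_tokenize(line))
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, freq in words.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Ties are broken by the pair itself so the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        merged = Counter()
        for symbols, freq in words.items():
            merged[apply_merge(symbols, best)] += freq
        words = merged
    return merges

def encode(text, merges):
    # Apply the learned merges, in the order they were learned, to each word.
    tokens = []
    for word in pre_tokenize(text):
        symbols = tuple(word)
        for pair in merges:
            symbols = apply_merge(symbols, pair)
        tokens.extend(symbols)
    return tokens

def decode(tokens):
    # Undo the segmentation: join the pieces and turn "▁" back into spaces.
    return "".join(tokens).replace("▁", " ").strip()

# Hypothetical toy corpus, loosely based on the sample sentences above.
corpus = [
    "rick james was an american singer songwriter musician and record producer",
    "james johnson was a singer and a songwriter",
    "a ṣe àkíyèsí pé ìṣẹ̀dá náà ni ya máa n lọ láti fi yi ọ̀rọ̀-ìṣe sódì",
]
merges = learn_merges(corpus, 200)

# Using only the first k merges simulates a smaller vocabulary.
for k in (0, 20, 200):
    print(k, " ".join(encode("american singer songwriter", merges[:k])))
```

Because this sketch splits words into Unicode code points, combining diacritics start out as separate symbols and are only attached once a merge covers them, which is consistent with the isolated combining marks visible in the smaller-vocabulary Yoruba rows above.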