Fiji Hindi (hif) subword embeddings

Vocab size | Vocab | Model | 25 dim | 50 dim | 100 dim | 200 dim | 300 dim
1000 | vocab | model | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix
3000 | vocab | model | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix
5000 | vocab | model | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix
10000 | vocab | model | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix | txt, bin; bokeh, umap, matrix

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size | hifwiki sample
original kariv karmasa: 00%, 00%, aur 00% bhubhag gher rakha hai vich pahari kshetrame jilaka 00% se adhik jansankhye evn csti hai.
* swami brahmaanand * swami yogaanand * swami premaanand * swami niranjanaanand * swami shivaanand * swami shraddhaanand * swami raamakrishnaanand * s
bābil (بابل) al-karbalā' (كربلاء) an-najaf (النجف) al-anbar (الأنبار) nīnawā (نينوى) dahūk (دهوك) arbīl (أربيل) kirkuk (كركوك) as-sulaymāniyyah (السلي
1000 ▁kar iv ▁kar m asa : ▁00% , ▁00% , ▁aur ▁00% ▁b h ub ha g ▁g her ▁rak ha ▁hai ▁v ich ▁pa ha ri ▁k s he t ra me ▁j il aka ▁00% ▁se ▁ad hi k ▁jan s an k h ye ▁e v n ▁c sti ▁hai .
▁* ▁sw am i ▁b ra h ma an and ▁* ▁sw am i ▁y og a an and ▁* ▁sw am i ▁pre ma an and ▁* ▁sw am i ▁n ir an j ana an and ▁* ▁sw am i ▁s hi va an and ▁* ▁sw am i ▁sh ra d d ha an and ▁* ▁sw am i ▁ra am ak r ish na an and ▁* ▁s
▁b ā b il ▁( ب ا ب ل ) ▁al - kar b al ā ' ▁( ك ر ب ل ا ء ) ▁an - na ja f ▁( ا ل ن ج ف ) ▁al - an b ar ▁( ا ل أ ن ب ا ر ) ▁n ī na w ā ▁( ن ي ن و ى ) ▁da h ū k ▁( د ه و ك ) ▁ar b ī l ▁( أ ر ب ي ل ) ▁ki r k uk ▁( ك ر ك و ك ) ▁as - s ula y m ā n i y ya h ▁( ا ل س ل ي
3000 ▁kar iv ▁kar m asa : ▁00% , ▁00% , ▁aur ▁00% ▁bh ub ha g ▁g her ▁rak ha ▁hai ▁v ich ▁paha ri ▁kshetra me ▁j il aka ▁00% ▁se ▁ad hik ▁jan s an kh ye ▁e v n ▁c sti ▁hai .
▁* ▁swami ▁b rah ma anand ▁* ▁swami ▁yog a anand ▁* ▁swami ▁pre ma anand ▁* ▁swami ▁n ir an j ana anand ▁* ▁swami ▁shi va anand ▁* ▁swami ▁sh rad d haan and ▁* ▁swami ▁ra am ak rishna anand ▁* ▁s
▁b āb il ▁( ب ا ب ل ) ▁al - kar bal ā ' ▁( ك ر ب ل ا ء ) ▁an - na ja f ▁( ا ل ن ج ف ) ▁al - an bar ▁( ا ل أ ن ب ا ر ) ▁n ī na w ā ▁( ن ي ن و ى ) ▁da h ū k ▁( د ه و ك ) ▁ar b ī l ▁( أ ر ب ي ل ) ▁ki r k uk ▁( ك ر ك و ك ) ▁as - s ula y m ā ni y ya h ▁( ا ل س ل ي
5000 ▁kar iv ▁kar m asa : ▁00% , ▁00% , ▁aur ▁00% ▁bh ub ha g ▁g her ▁rakha ▁hai ▁v ich ▁paha ri ▁kshetra me ▁jil aka ▁00% ▁se ▁ad hik ▁jan san kh ye ▁ev n ▁c sti ▁hai .
▁* ▁swami ▁brah ma anand ▁* ▁swami ▁yog a anand ▁* ▁swami ▁pre ma anand ▁* ▁swami ▁nir an jana anand ▁* ▁swami ▁shi va anand ▁* ▁swami ▁sh rad d haan and ▁* ▁swami ▁ra amak rishna anand ▁* ▁s
▁b āb il ▁( ب ا ب ل ) ▁al - kar bal ā ' ▁( ك ر ب ل ا ء ) ▁an - na ja f ▁( ال ن ج ف ) ▁al - an bar ▁( ال أ ن ب ا ر ) ▁n ī na w ā ▁( ن ي ن و ى ) ▁da h ū k ▁( د ه و ك ) ▁ar b ī l ▁( أ ر ب ي ل ) ▁kir k uk ▁( ك ر ك و ك ) ▁as - s ula y m ā ni yya h ▁( ال س ل ي
10000 ▁kar iv ▁kar m asa : ▁00% , ▁00% , ▁aur ▁00% ▁bh ub ha g ▁g her ▁rakha ▁hai ▁v ich ▁pahari ▁kshetra me ▁jil aka ▁00% ▁se ▁adhik ▁jan san kh ye ▁ev n ▁c sti ▁hai .
▁* ▁swami ▁brahma anand ▁* ▁swami ▁yog a anand ▁* ▁swami ▁pre ma anand ▁* ▁swami ▁nir an jana anand ▁* ▁swami ▁shi va anand ▁* ▁swami ▁shrad dhaan and ▁* ▁swami ▁raamak rishna anand ▁* ▁s
▁b āb il ▁( ب اب ل ) ▁al - kar bal ā ' ▁( ك ر ب ل ا ء ) ▁an - na ja f ▁( ال ن ج ف ) ▁al - an bar ▁( ال أ ن ب ا ر ) ▁n ī na w ā ▁( ن ي ن و ى ) ▁da h ū k ▁( د ه و ك ) ▁ar b ī l ▁( أ ر ب ي ل ) ▁kir k uk ▁( ك ر ك و ك ) ▁as - s ula y mā ni yya h ▁( ال س ل ي
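
The effect of vocabulary size on segmentation can be checked directly from the table. The following is a minimal sketch, not part of the original page: it takes the four encodings of the first hifwiki sample exactly as listed above, counts the subword pieces for each vocabulary size, and restores the original text by joining the pieces and turning the "▁" word-boundary marker back into a space. The names ORIGINAL, ENCODED, pieces, decode and piece_counts are hypothetical helpers introduced for illustration only.

```python
# Sketch: compare the segmentations of the first hifwiki sample in the table.
# All strings are copied from the table; the helper names are illustrative.

ORIGINAL = (
    "kariv karmasa: 00%, 00%, aur 00% bhubhag gher rakha hai vich pahari "
    "kshetrame jilaka 00% se adhik jansankhye evn csti hai."
)

ENCODED = {
    1000: "▁kar iv ▁kar m asa : ▁00% , ▁00% , ▁aur ▁00% ▁b h ub ha g ▁g her "
          "▁rak ha ▁hai ▁v ich ▁pa ha ri ▁k s he t ra me ▁j il aka ▁00% ▁se "
          "▁ad hi k ▁jan s an k h ye ▁e v n ▁c sti ▁hai .",
    3000: "▁kar iv ▁kar m asa : ▁00% , ▁00% , ▁aur ▁00% ▁bh ub ha g ▁g her "
          "▁rak ha ▁hai ▁v ich ▁paha ri ▁kshetra me ▁j il aka ▁00% ▁se "
          "▁ad hik ▁jan s an kh ye ▁e v n ▁c sti ▁hai .",
    5000: "▁kar iv ▁kar m asa : ▁00% , ▁00% , ▁aur ▁00% ▁bh ub ha g ▁g her "
          "▁rakha ▁hai ▁v ich ▁paha ri ▁kshetra me ▁jil aka ▁00% ▁se "
          "▁ad hik ▁jan san kh ye ▁ev n ▁c sti ▁hai .",
    10000: "▁kar iv ▁kar m asa : ▁00% , ▁00% , ▁aur ▁00% ▁bh ub ha g ▁g her "
           "▁rakha ▁hai ▁v ich ▁pahari ▁kshetra me ▁jil aka ▁00% ▁se "
           "▁adhik ▁jan san kh ye ▁ev n ▁c sti ▁hai .",
}


def pieces(encoded: str) -> list[str]:
    """Split a space-separated encoding into its subword pieces."""
    return encoded.split()


def decode(encoded: str) -> str:
    """Join the pieces and map the word-boundary marker back to a space."""
    return "".join(pieces(encoded)).replace("▁", " ").strip()


def piece_counts(encoded_by_vocab: dict[int, str]) -> dict[int, int]:
    """Number of subword pieces per vocabulary size."""
    return {vs: len(pieces(text)) for vs, text in encoded_by_vocab.items()}


if __name__ == "__main__":
    for vocab_size, count in sorted(piece_counts(ENCODED).items()):
        print(f"vocab size {vocab_size:>5}: {count} pieces")
```

For this sample the piece count falls as the vocabulary grows, because frequent character sequences such as "rakha" or "pahari" become single pieces, and every encoding decodes back to the same original sentence.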