Kölsch (ksh) subword embeddings

Vocab size   Files          25 dim / 50 dim / 100 dim / 200 dim / 300 dim
1000         vocab, model   each: txt | bin, bokeh | umap | matrix
3000         vocab, model   each: txt | bin, bokeh | umap | matrix
5000         vocab, model   each: txt | bin, bokeh | umap | matrix
10000        vocab, model   each: txt | bin, bokeh | umap | matrix
25000        vocab, model   each: txt | bin, bokeh | umap | matrix

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size   kshwiki sample
original 0000 wood d kerch wärm jesperrd. sö vonge wärm a, an d kerch z vrassele, su dat am 00. september 0000 wärm ön mess jeläse wäde kutt.
for documentation messages (on mediawiki.org), we saw the initiative affecting 00 languages. the average progress in translations across all languages
en dat boch jeehd öt öm d vorbesserung va serviceleistunge. d aahaldende steijerung va serviceleistunge un va dörö qualität helpd, dä erfolch em jesch
1000 ▁0000 ▁wood ▁d ▁ker ch ▁wärm ▁jes p er r d . ▁sö ▁von ge ▁wärm ▁a , ▁an ▁d ▁ker ch ▁z ▁v ra ss ele , ▁su ▁dat ▁am ▁00. ▁se p tem ber ▁0000 ▁wärm ▁ön ▁m ess ▁jel ä se ▁wäde ▁ku tt .
▁for ▁do c um ent ation ▁m ess ag es ▁( on ▁mediawiki . org ), ▁we ▁s a w ▁the ▁in it i at ive ▁a ffe ct ing ▁00 ▁lang u ag es . ▁the ▁a ver age ▁pro g r ess ▁in ▁transl ation s ▁a c ro ss ▁all ▁lang u ag es
▁en ▁dat ▁b och ▁je e hd ▁öt ▁öm ▁d ▁vor b ess er ung ▁va ▁s er v ic ele ist unge . ▁d ▁a ah ald ende ▁ste ij er ung ▁va ▁s er v ic ele ist unge ▁un ▁va ▁dör ö ▁ qu al it ät ▁hel p d , ▁dä ▁er f ol ch ▁em ▁jesch
3000 ▁0000 ▁wood ▁d ▁kerch ▁wärm ▁jes per r d . ▁sö ▁vonge ▁wärm ▁a , ▁an ▁d ▁kerch ▁z ▁v ra ss ele , ▁su ▁dat ▁am ▁00. ▁september ▁0000 ▁wärm ▁ön ▁mess ▁jel ä se ▁wäde ▁kutt .
▁for ▁do c ument ation ▁mess ages ▁( on ▁mediawiki . org ), ▁we ▁sa w ▁the ▁in it i at ive ▁affect ing ▁00 ▁langu ages . ▁the ▁a ver age ▁pro gr ess ▁in ▁translations ▁a c ro ss ▁all ▁langu ages
▁en ▁dat ▁boch ▁je ehd ▁öt ▁öm ▁d ▁vorb ess er ung ▁va ▁ser v ic ele ist unge . ▁d ▁a ah ald ende ▁ste ijer ung ▁va ▁ser v ic ele ist unge ▁un ▁va ▁dör ö ▁qu al ität ▁help d , ▁dä ▁er fol ch ▁em ▁jesch
5000 ▁0000 ▁wood ▁d ▁kerch ▁wärm ▁jes per r d . ▁sö ▁vonge ▁wärm ▁a , ▁an ▁d ▁kerch ▁z ▁v ra ss ele , ▁su ▁dat ▁am ▁00. ▁september ▁0000 ▁wärm ▁ön ▁mess ▁jelä se ▁wäde ▁kutt .
▁for ▁do c ument ation ▁mess ages ▁( on ▁mediawiki . org ), ▁we ▁sa w ▁the ▁in it i ative ▁affect ing ▁00 ▁languages . ▁the ▁a ver age ▁pro gr ess ▁in ▁translations ▁ac ro ss ▁all ▁languages
▁en ▁dat ▁boch ▁je ehd ▁öt ▁öm ▁d ▁vorb esser ung ▁va ▁serv ic ele ist unge . ▁d ▁a ah ald ende ▁ste ijer ung ▁va ▁serv ic ele ist unge ▁un ▁va ▁dör ö ▁qu al ität ▁help d , ▁dä ▁er fol ch ▁em ▁jesch
10000 ▁0000 ▁wood ▁d ▁kerch ▁wärm ▁jes per r d . ▁sö ▁vonge ▁wärm ▁a , ▁an ▁d ▁kerch ▁z ▁vrassele , ▁su ▁dat ▁am ▁00. ▁september ▁0000 ▁wärm ▁ön ▁mess ▁jelä se ▁wäde ▁kutt .
▁for ▁doc umentation ▁messages ▁( on ▁mediawiki . org ), ▁we ▁sa w ▁the ▁in iti ative ▁affect ing ▁00 ▁languages . ▁the ▁aver age ▁pro gr ess ▁in ▁translations ▁ac ross ▁all ▁languages
▁en ▁dat ▁boch ▁jeehd ▁öt ▁öm ▁d ▁vorb esser ung ▁va ▁serv ic ele istunge . ▁d ▁a ah ald ende ▁ste ijer ung ▁va ▁serv ic ele istunge ▁un ▁va ▁dör ö ▁qu al ität ▁help d , ▁dä ▁erfol ch ▁em ▁jesch
25000 ▁0000 ▁wood ▁d ▁kerch ▁wärm ▁jes perrd . ▁sö ▁vonge ▁wärm ▁a , ▁an ▁d ▁kerch ▁z ▁vrassele , ▁su ▁dat ▁am ▁00. ▁september ▁0000 ▁wärm ▁ön ▁mess ▁jeläse ▁wäde ▁kutt .
▁for ▁documentation ▁messages ▁( on ▁mediawiki . org ), ▁we ▁saw ▁the ▁initiative ▁affect ing ▁00 ▁languages . ▁the ▁aver age ▁progress ▁in ▁translations ▁across ▁all ▁languages
▁en ▁dat ▁boch ▁jeehd ▁öt ▁öm ▁d ▁vorb esser ung ▁va ▁servicele istunge . ▁d ▁aah ald ende ▁ste ijer ung ▁va ▁servicele istunge ▁un ▁va ▁dörö ▁qualität ▁helpd , ▁dä ▁erfolch ▁em ▁jesch
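
The sample rows above can be inspected programmatically. The following is a minimal sketch, not the tooling used to build this page: it assumes the ▁ character marks the start of a word (as in SentencePiece-style output) and that pieces are separated by whitespace. The names SAMPLES, split_pieces, and detokenize are hypothetical helpers introduced for illustration; the strings are copied from the first sentence of the 1000, 3000, and 25000 rows.

```python
from typing import List

# First sentence of the sample, copied from the table rows above.
SAMPLES = {
    1000: "▁0000 ▁wood ▁d ▁ker ch ▁wärm ▁jes p er r d .",
    3000: "▁0000 ▁wood ▁d ▁kerch ▁wärm ▁jes per r d .",
    25000: "▁0000 ▁wood ▁d ▁kerch ▁wärm ▁jes perrd .",
}


def split_pieces(encoded: str) -> List[str]:
    """Split an encoded line into its subword pieces (whitespace-separated)."""
    return encoded.split()


def detokenize(pieces: List[str]) -> str:
    """Rebuild the text, assuming the word-start marker stands for a space."""
    return "".join(pieces).replace("▁", " ").strip()


if __name__ == "__main__":
    for vocab_size, encoded in SAMPLES.items():
        pieces = split_pieces(encoded)
        # Larger vocabularies merge more characters into each piece,
        # so the same sentence needs fewer pieces.
        print(vocab_size, len(pieces), detokenize(pieces))
```

All three encodings detokenize to the same text as the "original" row, while the piece count drops as the vocabulary grows, which is the trade-off the table illustrates.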