Samoan (sm) subword embeddings

Vocab size   vocab   model   25 dim                  50 dim                  100 dim                 200 dim                 300 dim
1000         vocab   model   txt | bin               txt | bin               txt | bin               txt | bin               txt | bin
                             bokeh | umap | matrix   bokeh | umap | matrix   bokeh | umap | matrix   bokeh | umap | matrix   bokeh | umap | matrix
3000         vocab   model   txt | bin               txt | bin               txt | bin               txt | bin               txt | bin
                             bokeh | umap | matrix   bokeh | umap | matrix   bokeh | umap | matrix   bokeh | umap | matrix   bokeh | umap | matrix
5000         vocab   model   txt | bin               txt | bin               txt | bin               txt | bin               txt | bin
                             bokeh | umap | matrix   bokeh | umap | matrix   bokeh | umap | matrix   bokeh | umap | matrix   bokeh | umap | matrix
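
The page does not say how the txt downloads are laid out. The sketch below assumes they use the common word2vec text layout (a header line with the token count and dimension, then one token and its vector per line). That layout, the function name `load_word2vec_text`, and the toy vectors are all assumptions for illustration, not values taken from the actual files.

```python
import io


def load_word2vec_text(stream):
    """Parse word2vec-style text: a 'count dim' header, then 'token v1 ... vdim' lines.

    Assumption: the txt files follow this layout; the page does not confirm it.
    """
    header = stream.readline().split()
    count, dim = int(header[0]), int(header[1])
    vectors = {}
    for line in stream:
        if not line.strip():
            continue  # ignore blank lines
        parts = line.rstrip("\n").split(" ")
        if len(parts) != dim + 1:
            raise ValueError(f"expected {dim} values for token {parts[0]!r}")
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    if len(vectors) != count:
        raise ValueError(f"header promised {count} tokens, found {len(vectors)}")
    return vectors, dim


# Toy stand-in for a real file: three subword pieces with made-up 2-dim vectors.
toy_file = "3 2\n\u2581fa 0.1 -0.2\n' 0.0 0.5\npani 0.3 0.3\n"
vectors, dim = load_word2vec_text(io.StringIO(toy_file))
print(dim, sorted(vectors))
```

With a real download, the `io.StringIO` stand-in would be replaced by `open(path, encoding="utf-8")`, where `path` is wherever the txt file was saved.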

Embedding matrix plots

Training corpus sample, encoded with different BPE vocabulary sizes

Vocab size   smwiki sample
original image:downtownaucklandnight.jpg image:auckland-skyline.jpg image:new zealand 0 000.jpg image:auckland harbour bridge protest 00.jpg image:eastern subu
fa'aiapani. 'o iapani 'o se atunu'u 'ua fa'ateteleina lona mālosi i 'upufai o le kelope po'o le lalolagi. 'ua alu fo'i i luma lona lē fa'alogo i le fa
eleni (fa'aeleni: ελληνική δημοκρατία) se atanu'u totonu o europa. e tusa ai ma le tusigaigoa 0000, faitau aofai o tagata o le eleni o loo siomia 00.0
1000 ▁i ma ge : do wn to w nau ck land ni ght . jpg ▁i ma ge : au ck land - s k y li ne . jpg ▁i ma ge : ne w ▁ z ea land ▁0 ▁000 . jpg ▁i ma ge : au ck land ▁har b o ur ▁b ri d ge ▁p ro te st ▁00. jpg ▁i ma ge : ea s ter n ▁su b u
▁fa ' a ia pani . ▁' o ▁ia pani ▁' o ▁se ▁atunu ' u ▁' ua ▁fa ' a te tele ina ▁lona ▁mā lo si ▁i ▁' upu fai ▁o ▁le ▁k e lo pe ▁po ' o ▁le ▁lalolagi . ▁' ua ▁alu ▁fo ' i ▁i ▁luma ▁lona ▁lē ▁fa ' alo go ▁i ▁le ▁fa
▁ele ni ▁( fa ' a ele ni : ▁ ε λ λ η ν ι κ ή ▁ δ η μ ο κ ρ α τ ί α ) ▁se ▁atanu ' u ▁totonu ▁o ▁europa . ▁e ▁tusa ▁ai ▁ma ▁le ▁tusi ga i goa ▁0000, ▁faitau ▁aofai ▁o ▁tagata ▁o ▁le ▁ele ni ▁o ▁loo ▁siomia ▁00. 0
3000 ▁image : down to w nau ckland night . jpg ▁image : auckland - s ky line . jpg ▁image : new ▁zealand ▁0 ▁000 . jpg ▁image : auckland ▁har bour ▁b ridge ▁pro te st ▁00. jpg ▁image : ea s tern ▁su bu
▁fa ' aia pani . ▁' o ▁iapani ▁' o ▁se ▁atunu ' u ▁' ua ▁fa ' ate teleina ▁lona ▁mā losi ▁i ▁' upu fai ▁o ▁le ▁ke lo pe ▁po ' o ▁le ▁lalolagi . ▁' ua ▁alu ▁fo ' i ▁i ▁luma ▁lona ▁lē ▁fa ' alogo ▁i ▁le ▁fa
▁eleni ▁( fa ' a eleni : ▁ ε λ λ η ν ι κ ή ▁ δ η μ ο κ ρ α τ ί α ) ▁se ▁atanu ' u ▁totonu ▁o ▁europa . ▁e ▁tusa ▁ai ▁ma ▁le ▁tusigaigoa ▁0000, ▁faitau ▁aofai ▁o ▁tagata ▁o ▁le ▁eleni ▁o ▁loo ▁siomia ▁00.0
5000 ▁image : down to w nau ckland night . jpg ▁image : auckland - s ky line . jpg ▁image : new ▁zealand ▁0 ▁000. jpg ▁image : auckland ▁har bour ▁b ridge ▁pro test ▁00. jpg ▁image : ea stern ▁su bu
▁fa ' aiapani . ▁' o ▁iapani ▁' o ▁se ▁atunu ' u ▁' ua ▁fa ' ate teleina ▁lona ▁mālosi ▁i ▁' upu fai ▁o ▁le ▁kelope ▁po ' o ▁le ▁lalolagi . ▁' ua ▁alu ▁fo ' i ▁i ▁luma ▁lona ▁lē ▁fa ' alogo ▁i ▁le ▁fa
▁eleni ▁( fa ' aeleni : ▁ ελληνικ ή ▁δημοκρατία ) ▁se ▁atanu ' u ▁totonu ▁o ▁europa . ▁e ▁tusa ▁ai ▁ma ▁le ▁tusigaigoa ▁0000, ▁faitau ▁aofai ▁o ▁tagata ▁o ▁le ▁eleni ▁o ▁loo ▁siomia ▁00.0
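
To make the effect of vocabulary size concrete, the sketch below counts the pieces for the first sentence of the second sample ("fa'aiapani.") at each vocabulary size and joins them back into text. The piece strings are copied from the table above. The decoding rule (treating "▁" as a word-start marker that stands for a preceding space) is inferred from the samples, and the helper names are hypothetical.

```python
# Encoded pieces for "fa'aiapani." copied from the sample table, keyed by vocab size.
encoded = {
    1000: "\u2581fa ' a ia pani .",
    3000: "\u2581fa ' aia pani .",
    5000: "\u2581fa ' aiapani .",
}

WORD_START = "\u2581"  # the "▁" marker on word-initial pieces (assumed convention)


def decode(pieces):
    """Join subword pieces and turn word-start markers back into spaces."""
    return "".join(pieces).replace(WORD_START, " ").strip()


def summarize(encoded_by_vocab):
    """Map each vocab size to (number of pieces, decoded text)."""
    summary = {}
    for vocab_size, text in encoded_by_vocab.items():
        pieces = text.split()
        summary[vocab_size] = (len(pieces), decode(pieces))
    return summary


summary = summarize(encoded)
for vocab_size in sorted(summary):
    piece_count, text = summary[vocab_size]
    print(vocab_size, piece_count, text)
```

For this sentence, the larger vocabularies cover the same text with fewer, longer pieces, and all three encodings decode to the same string.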