this post was submitted on 11 Jan 2025
221 points (97.8% liked)

Data is Beautiful

Cross posted from: Latin@lemm.ee

lingua latina pater linguarum dimidium est ("Latin is the father of half of all languages") 😎

I hope it's okay for me to crosspost here.

Hackworth@lemmy.world 5 points 3 months ago (2 children)

I wonder if something like semantic tokenization would benefit from using etymological data like this, particularly for a multilingual LLM.
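
A toy sketch of what that could look like, assuming a tiny hand-made etymology table. The `ROOTS` mapping, the `<root:...>` / `<lang:...>` token format, and the `etym_tokenize` name are all made up for illustration; this is not how any real tokenizer works, just one way cognates could end up sharing a token across languages.

```python
# Hypothetical etymology table: surface word -> (shared root token, language tag).
# Entries are illustrative only; a real system would need a proper etymological dataset.
ROOTS = {
    "pater": ("<root:pater>", "la"),  # Latin
    "padre": ("<root:pater>", "es"),  # Spanish, from Latin pater
    "père": ("<root:pater>", "fr"),   # French, from Latin pater
}

def etym_tokenize(text):
    """Whitespace-split the text; known words become a root token plus a language token."""
    tokens = []
    for word in text.lower().split():
        if word in ROOTS:
            root, lang = ROOTS[word]
            tokens.extend([root, f"<lang:{lang}>"])
        else:
            tokens.append(word)  # unknown words pass through unchanged
    return tokens

print(etym_tokenize("mi padre"))
```

With this setup, "padre" and "pater" map to the same root token and differ only in the language token that follows, which is roughly the kind of sharing a multilingual model might benefit from.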

gandalf_der_12te@discuss.tchncs.de 3 points 3 months ago (last edited 3 months ago)

i know that my NN internally uses a semantic tokenization method.

i literally often seek the word roots when talking to somebody. it helps me focus.