this post was submitted on 11 Jan 2025
Data is Beautiful
I wonder if something like the semantic tokenization method would benefit from using etymological data like this, particularly for a multilingual LLM.
I know that my NN internally uses a semantic tokenization method.
I literally often look for word roots when talking to somebody. It helps me focus.