this post was submitted on 10 Sep 2025
942 points (99.1% liked)

Fuck AI

"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

founded 2 years ago
MODERATORS
[–] brucethemoose@lemmy.world 2 points 2 weeks ago* (last edited 2 weeks ago)

Pretty much. And more.

"The end."

Might be a mere 3 tokens total:

‘"The ‘ ‘end."’ ‘/n/n’

I don’t know about ClosedAI, but the Chinese models in particular (like Qwen, GLM and Deepseek) went crazy optimizing their tokenizers for English, Chinese, or code, with huge vocabs for common words/phrases and even common groupings of words + punctuation/spacing as single tokens. It makes the models more efficient, as the same text counts as far fewer tokens.
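If you want to check this yourself, a minimal sketch like the one below will do it, assuming you have Hugging Face's transformers installed and can pull Qwen's public tokenizer. The model name and the exact split are just illustrative; other tokenizers will slice the text differently.

```python
# A minimal sketch, assuming the Hugging Face `transformers` library is
# installed and Qwen's public tokenizer can be downloaded. The model name
# and the resulting split are illustrative; other tokenizers will differ.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

text = '"The end."\n\n'
ids = tok.encode(text, add_special_tokens=False)

print(len(ids))                        # how many tokens the short reply costs
print(tok.convert_ids_to_tokens(ids))  # the actual pieces the text was split into
```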

“About 1 token per word” is a decent estimate for a block of text, even including spaces and punctuation.
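A rough sanity check is just counting words against tokens, with the same caveats as above: the tokenizer and the sample text are only examples, and the ratio shifts with model and language.

```python
# Rough check of the "~1 token per word" rule of thumb. Same assumptions as
# the sketch above: `transformers` installed, Qwen tokenizer reachable. The
# sample text and the exact ratio are illustrative and vary by model/language.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

sample = (
    "Tokenizers with large vocabularies fold common words, punctuation, "
    "and even whitespace into single tokens, so ordinary English prose "
    "often lands near one token per word."
)

n_words = len(sample.split())
n_tokens = len(tok.encode(sample, add_special_tokens=False))
print(f"{n_words} words -> {n_tokens} tokens ({n_tokens / n_words:.2f} tokens/word)")
```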