tumblr

Welcome to /c/tumblr, a place for all your tumblr screenshots and news.

Our Rules:

Keep it civil. We're all people here. Be respectful to one another.
No sexism, racism, homophobia, transphobia or any other flavor of bigotry. I should not need to explain this one.
Must be tumblr related. This one is kind of a given.
Try not to repost anything posted within the past month. Beyond that, go for it. Not everyone is on every site all the time.
No unnecessary negativity. Just because you don't like a thing doesn't mean that you need to spend the entire comment section complaining about said thing. Just downvote and move on.

Sister Communities:

/c/TenForward@lemmy.world - Star Trek chat, memes and shitposts
/c/Memes@lemmy.world - General memes

founded 2 years ago

MODERATORS

gedaliyah@lemmy.world

Are you losst? (startrek.website)

submitted 2 years ago* (last edited 2 years ago) by Stamets@startrek.website to c/tumblr@lemmy.world

sarmale@lemmy.zip 8 points 2 years ago

How many unicode characters could you add to the standard until it becomes unreliable?

Kerb@discuss.tchncs.de 28 points 2 years ago* (last edited 2 years ago)

Apparently Unicode has room for about 1.1 million code points, and we ~~currently only use 96,382 as of version 4.0~~

EDIT: I just read that Unicode 4.0 is very outdated; the current version is Unicode 15.1, with 149,878 characters.
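The "about 1.1 million" figure is the size of the codespace (every possible code point), not the number of assigned characters. A quick sanity check in Python, assuming CPython 3.3 or later (where `sys.maxunicode` is always U+10FFFF), might look like this:

```python
import sys

# The Unicode codespace is 17 planes of 2**16 code points each: U+0000..U+10FFFF.
PLANES = 17
CODE_POINTS_PER_PLANE = 2 ** 16
codespace = PLANES * CODE_POINTS_PER_PLANE
print(f"{codespace:,} code points")  # 1,114,112 code points

# CPython 3.3+ exposes the highest code point it accepts as sys.maxunicode.
print(hex(sys.maxunicode))  # 0x10ffff

# chr() accepts any code point up to U+10FFFF and rejects anything beyond it.
assert len(chr(0x10FFFF)) == 1  # one code point, one str character
try:
    chr(0x110000)
except ValueError as err:
    print("rejected:", err)
```

Only around 150,000 of those 1,114,112 slots were assigned as of Unicode 15.1, so there is plenty of headroom left.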

funkless_eck@sh.itjust.works 6 points 2 years ago

I am developing a language that consists solely of communicating via different versions of zip-archive bombs

dual_sport_dork@lemmy.world 1 points 2 years ago

A Unicode character can be up to 4 bytes, so 2^32, or 4,294,967,296, potential unique characters. And it'd be easy enough to adjust the standard to allow for an extra byte (or more) if necessary; it's been done before.

Turun@feddit.de 4 points 2 years ago* (last edited 2 years ago)

This is incorrect. While a character (actually a code point) requires 4 bytes in UTF-32 and up to 4 bytes in UTF-8, the Unicode standard is limited to 17 × 2^16 (1,114,112) code points. (Edit: apparently because that is the limit of UTF-16. 4-byte UTF-8 can encode 2^21 code points, but the original UTF-8 design was not limited to four bytes; with up to six bytes it can encode 2^31 code points in total.)
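Why 17 × 2^16? UTF-16 stores the first 65,536 code points (the Basic Multilingual Plane) in one 16-bit unit and everything else as a surrogate pair carrying 20 bits, so it tops out at 2^16 + 2^20 = 17 × 2^16. The minimal Python sketch below uses a hypothetical helper, `utf16_surrogates` (not a standard-library function), to check that arithmetic, compare the hand-rolled split against Python's built-in `utf-16-be` codec, and count the payload bits in 4-byte and 6-byte UTF-8 sequences:

```python
def utf16_surrogates(cp):
    """Split a supplementary code point (U+10000..U+10FFFF) into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF needs a surrogate pair")
    offset = cp - 0x10000             # a 20-bit value, 0..0xFFFFF
    high = 0xD800 + (offset >> 10)    # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits -> low (trail) surrogate
    return high, low

# One 16-bit unit covers 2**16 values; a surrogate pair adds 2**20 more.
assert 2**16 + 2**20 == 17 * 2**16 == 0x110000

# Cross-check the hand-rolled split against Python's UTF-16 codec.
cp = 0x1F600  # an emoji outside the Basic Multilingual Plane
high, low = utf16_surrogates(cp)
assert chr(cp).encode("utf-16-be") == high.to_bytes(2, "big") + low.to_bytes(2, "big")
print(hex(high), hex(low))  # 0xd83d 0xde00

# UTF-8 payload bits: lead-byte bits plus 6 bits per continuation byte.
four_byte_bits = 3 + 3 * 6   # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
six_byte_bits = 1 + 5 * 6    # 1111110x followed by five 10xxxxxx bytes
print(four_byte_bits, six_byte_bits)  # 21 31

# Modern UTF-8 stops at U+10FFFF, which still fits in four bytes.
assert len(chr(0x10FFFF).encode("utf-8")) == 4
```

The highest pair, for U+10FFFF, is 0xDBFF 0xDFFF: every surrogate bit is used, which is why UTF-16 cannot reach past 17 planes.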

Unicode is the standard that says "the thing we call capital A is character number 65", literally defining a mapping from numbers to concepts.
UTF-8 and UTF-32 are ways to encode a list of those numbers as bytes, in a more (UTF-8) or less (UTF-32) efficient way.
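To make the split between code points and encodings concrete, here is a small Python sketch using only built-ins (`ord` and `str.encode`); the sample characters are arbitrary picks. It uses the `utf-32-le` codec because plain `utf-32` in Python prepends a 4-byte byte-order mark.

```python
# Unicode assigns numbers (code points); an encoding turns those numbers into bytes.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]  # A, e-acute, euro sign, grinning face

print(ord("A"))  # 65

for ch in samples:
    cp = ord(ch)                    # the Unicode code point
    utf8 = ch.encode("utf-8")       # 1 to 4 bytes, depending on the code point
    utf32 = ch.encode("utf-32-le")  # always 4 bytes (no BOM with the -le codec)
    print(f"U+{cp:04X}  utf-8: {len(utf8)} bytes  utf-32: {len(utf32)} bytes")
```

UTF-8 spends a single byte on ASCII and never more than four, while UTF-32 always spends four, which is the "more or less efficient" trade-off described above.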