submitted 4 months ago by Maven@lemmy.zip to c/programmerhumor@lemmy.ml

NeatNit@discuss.tchncs.de 3 points 4 months ago (last edited 4 months ago)

I'm assuming Unicode anyway, and UTF-8 is by far the most natural encoding because most files will be mostly ASCII, and ASCII text is already valid UTF-8 byte for byte. A "normal form" (see link above), which you can think of as a canonical form, gives you a way to check whether two strings are equivalent even if they encode the same text with different code points. Like the example mentioned on Wikipedia:

For example, the distinct Unicode strings "U+212B" (the angstrom sign "Å") and "U+00C5" (the Swedish letter "Å") are both expanded by NFD (or NFKD) into the sequence "U+0041 U+030A" (Latin letter "A" and combining ring above "°") which is then reduced by NFC (or NFKC) to "U+00C5" (the Swedish letter "Å").
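If you want to see that in action, here's a rough sketch using Python's standard `unicodedata` module. The helper name `equivalent` and the variable names are just made up for illustration, and comparing under NFC is one reasonable choice, not the only one:

```python
import unicodedata

angstrom_sign = "\u212B"  # ANGSTROM SIGN
a_with_ring = "\u00C5"    # LATIN CAPITAL LETTER A WITH RING ABOVE


def equivalent(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The raw strings hold different code points, so a plain comparison fails.
raw_equal = angstrom_sign == a_with_ring

# NFD decomposes the angstrom sign into "A" plus a combining ring above.
nfd = unicodedata.normalize("NFD", angstrom_sign)

# NFC recomposes that sequence into the single precomposed letter.
nfc = unicodedata.normalize("NFC", nfd)

print(raw_equal)
print([f"U+{ord(c):04X}" for c in nfd])
print([f"U+{ord(c):04X}" for c in nfc])
print(equivalent(angstrom_sign, a_with_ring))
```

So if you normalize both sides before comparing (or before hashing, or before using them as dictionary keys), the two spellings of "Å" come out the same.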

this post was submitted on 16 Jul 2024

592 points (97.4% liked)
