Arc vs String, is Arc really faster? (blocklisted.github.io)

submitted 10 months ago by snaggen@programming.dev to c/rust@programming.dev

[-] sugar_in_your_tea@sh.itjust.works 14 points 10 months ago* (last edited 10 months ago)

strings as ids

The main use case is "uuid," but those can be represented as 128-bit integers instead of character sequences if cloning is a concern. This reinforces your point: there's usually a more obvious, good-enough fix for a performance issue than moving a string into an Arc.
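To make the 128-bit idea concrete, here is a minimal std-only sketch. The helper names `uuid_to_u128` and `u128_to_uuid` are made up for illustration, and it only handles the canonical hyphenated 8-4-4-4-12 form; in real code a UUID crate would likely be the better choice.

```rust
/// Parse a canonical hyphenated UUID (8-4-4-4-12 hex digits) into a u128.
/// Returns None for anything that is not exactly in that form.
fn uuid_to_u128(s: &str) -> Option<u128> {
    let bytes = s.as_bytes();
    if bytes.len() != 36 {
        return None;
    }
    for (i, &b) in bytes.iter().enumerate() {
        // Hyphens must sit at these four positions; everything else must be hex.
        let ok = if matches!(i, 8 | 13 | 18 | 23) {
            b == b'-'
        } else {
            b.is_ascii_hexdigit()
        };
        if !ok {
            return None;
        }
    }
    let hex: String = s.chars().filter(|c| *c != '-').collect();
    u128::from_str_radix(&hex, 16).ok()
}

/// Format a u128 back into the canonical lowercase hyphenated form.
fn u128_to_uuid(v: u128) -> String {
    let h = format!("{:032x}", v);
    format!(
        "{}-{}-{}-{}-{}",
        &h[0..8],
        &h[8..12],
        &h[12..16],
        &h[16..20],
        &h[20..32]
    )
}

fn main() {
    let id = uuid_to_u128("550e8400-e29b-41d4-a716-446655440000").expect("valid UUID");
    // u128 is Copy: "cloning" the id is a 16-byte copy, with no allocation
    // and no reference counting.
    let copy = id;
    println!("{}", u128_to_uuid(copy));
}
```

The string form is then only needed at the edges (parsing input, printing output), and everything in between passes a plain integer around.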

It is a neat trick, and something I love reading about, but I would very much expect a lengthy comment in the code if someone does this in practice because it is not obvious at all.

[-] crispy_kilt@feddit.de 3 points 10 months ago* (last edited 10 months ago)

The main use case is “uuid,” but those can be represented as 128-bit integers

Oh man. I've seen so much software that treats UUIDs as strings internally. I've also seen things like IPv4 addresses being used in dotted notation as strings, and then the developers asking themselves why calculating "is this addr in this subnet" is so complicated.
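For illustration, this is roughly what the subnet check looks like once the address is a 32-bit integer instead of a dotted string. It is a sketch using only `std::net::Ipv4Addr`; the function name `in_subnet` is made up for this example.

```rust
use std::net::Ipv4Addr;

/// Returns true if `addr` falls inside `network`/`prefix_len`.
/// Panics if `prefix_len` is greater than 32.
fn in_subnet(addr: Ipv4Addr, network: Ipv4Addr, prefix_len: u32) -> bool {
    assert!(prefix_len <= 32, "an IPv4 prefix length is at most 32");
    // Build a mask with the top `prefix_len` bits set. A /0 is handled
    // separately because shifting a u32 by 32 bits would overflow.
    let mask = if prefix_len == 0 {
        0
    } else {
        u32::MAX << (32 - prefix_len)
    };
    (u32::from(addr) & mask) == (u32::from(network) & mask)
}

fn main() {
    let net = Ipv4Addr::new(192, 168, 1, 0);
    println!("{}", in_subnet(Ipv4Addr::new(192, 168, 1, 42), net, 24)); // true
    println!("{}", in_subnet(Ipv4Addr::new(192, 168, 2, 1), net, 24)); // false
}
```

With the integer representation the whole question reduces to one mask and one comparison; with strings you end up parsing before you can do anything at all.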

I blame it on many people only learning high level scripting languages.

Also, you might be interested in ULID: https://github.com/ulid/spec

[-] sugar_in_your_tea@sh.itjust.works 2 points 10 months ago* (last edited 10 months ago)

IPv4 addresses

Yes! This bothers me as well!

The last team I worked with did a lot with addresses and we used Go, and way too many of my coworkers were confused when the standard library used raw bytes (the net.IP byte slice) to represent IP addresses instead of a string. I've even had to read IP addresses in hex (e.g. in raw packet dumps), so I'm used to counting octet offsets.

ULID

Ew, that violates much of the point of an ID, which should be entirely opaque. One of the best parts about a UUID is that it can be exposed to the user without fear of providing any data to an attacker (we use UUIDv4 for that reason).

Maybe it's useful for distributed systems where you trust the machines generating the IDs to have accurate timestamps, but then why not just use a tuple of a random ID and a timestamp? That way you can choose between lexicographical sorting and random distribution by swapping the parts of the tuple, and you can use standard 64-bit timestamps. And why milliseconds? Every time system I've used works in nanoseconds or seconds; only JS does the silly ms thing (maybe Windows does too?). I guess compatibility with UUID is kinda useful, but I honestly don't see a ton of value here (it saves you a column in the DB and an import, I guess).
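A rough sketch of the tuple idea, relying on the fact that Rust tuples compare lexicographically (first element, then second). The `Id` struct, its field names, and the hard-coded values are all hypothetical; a real system would fill `random` from an RNG and `timestamp_ns` from a clock.

```rust
/// Hypothetical ID: a random part plus a 64-bit nanosecond timestamp.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Id {
    random: u64,
    timestamp_ns: u64,
}

impl Id {
    /// Sort key that orders IDs chronologically (ties broken by the random part).
    fn time_first(self) -> (u64, u64) {
        (self.timestamp_ns, self.random)
    }

    /// Sort key that orders IDs by the random part, scattering them
    /// regardless of creation time.
    fn random_first(self) -> (u64, u64) {
        (self.random, self.timestamp_ns)
    }
}

fn main() {
    // Fixed values stand in for real randomness and real clock readings.
    let mut ids = vec![
        Id { random: 0xff, timestamp_ns: 1_000 },
        Id { random: 0x11, timestamp_ns: 3_000 },
        Id { random: 0x80, timestamp_ns: 2_000 },
    ];

    ids.sort_by_key(|id| id.time_first());
    println!("chronological: {:?}", ids);

    ids.sort_by_key(|id| id.random_first());
    println!("scattered:     {:?}", ids);
}
```

Which key to use is then a per-index or per-query decision rather than something baked into the ID format.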

So to me, this seems like it's going to be misused a ton. I'm really scratching my head over using base32, because that's only useful for preventing transcription errors, which means it's intended to be seen and used.

I hope I'm missing something because this seems like an obvious "don't do this" situation.

[-] crispy_kilt@feddit.de 1 points 10 months ago

Great points about ULID. To be honest, I haven't looked into it in detail; I just saw it mentioned somewhere and glanced over their self-description. The main reason I wouldn't use it is that Postgres already has great built-in support for UUIDs.

this post was submitted on 01 Jan 2024