this post was submitted on 04 Jul 2025
185 points (98.4% liked)

Programmer Humor


Welcome to Programmer Humor!

This is a place where you can post jokes, memes, humor, etc. related to programming!

For sharing awful code there's also Programming Horror.

[–] nathan@piefed.alphapuggle.dev 100 points 2 days ago (1 children)

This isn't YAML, this is just sparkling JSON

[–] ZoteTheMighty@lemmy.zip 29 points 2 days ago (2 children)

All yaml is just sparkling JSON.

[–] DataElemental@programming.dev 4 points 1 day ago (1 children)
[–] TomasEkeli@programming.dev 13 points 1 day ago (1 children)

Valid JSON is valid YAML.

So valid YAML can contain JSON
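A quick way to see the superset relationship (a sketch assuming PyYAML is available; the document contents are invented):

```python
# YAML 1.2 is a superset of JSON, so a YAML parser should accept
# a JSON document unchanged and produce the same structure.
import json
import yaml  # PyYAML, a third-party package

doc = '{"name": "eye-tracking", "samples": [1.5, 2.5]}'

from_json = json.loads(doc)
from_yaml = yaml.safe_load(doc)

assert from_json == from_yaml  # same structure either way
```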

[–] DataElemental@programming.dev 0 points 1 day ago (1 children)

That's a long way from all or most YAML being JSON-compatible. I wonder if more YAML files in the wild parse as Markdown than JSON.

[–] TomasEkeli@programming.dev 7 points 1 day ago (1 children)

That JSON is valid YAML does not imply that YAML is valid JSON.

[–] DataElemental@programming.dev 0 points 16 hours ago* (last edited 16 hours ago)

That was my point, yes, but I meant to reply to @ZoteTheMighty@midwest.social.

[–] olafurp@lemmy.world 9 points 2 days ago

Always has been

[–] deegeese@sopuli.xyz 31 points 2 days ago (1 children)

If you’re using a library to handle deserialization, the ugliness of the serial format doesn’t matter that much.

Just call yaml.load() and forget about it.

[–] BodilessGaze@sh.itjust.works 6 points 2 days ago (1 children)

That works until you realize your calculations are all wrong due to floating point inaccuracies. YAML doesn't require any level of precision for floats, so different parsers may give you different results for the same document.

[–] deegeese@sopuli.xyz 13 points 2 days ago (2 children)

What text based serialization formats do enforce numeric precision?

AFAIK it’s always left up to the writer (serializer)

[–] squaresinger@lemmy.world 1 points 10 hours ago

Technically, JSON pins down a specific numeric precision in practice, since parsers typically store numbers as JS-compatible floating point numbers with their associated precision.

Other than that, the best way to go if you want to have a specific precision is to cast to string before serialisation.
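The cast-to-string idea in a nutshell (a Python sketch using the stdlib `decimal` and `json` modules; the field name is invented):

```python
# Serialize the exact decimal text, then parse it back with Decimal,
# so no binary-float rounding ever touches the value.
import json
from decimal import Decimal

price = Decimal("0.1") + Decimal("0.2")      # exactly 0.3, no float drift
payload = json.dumps({"total": str(price)})  # stored as the string "0.3"

restored = Decimal(json.loads(payload)["total"])
assert restored == Decimal("0.3")
```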

[–] BodilessGaze@sh.itjust.works 6 points 2 days ago* (last edited 2 days ago) (1 children)

Cuelang: https://cuelang.org/docs/reference/spec/#numeric-values

Implementation restriction: although numeric values have arbitrary precision in the language, implementations may implement them using an internal representation with limited precision. That said, every implementation must:

  • Represent integer values with at least 256 bits.
  • Represent floating-point values with a mantissa of at least 256 bits and a signed binary exponent of at least 16 bits.
  • Give an error if unable to represent an integer value precisely.
  • Give an error if unable to represent a floating-point value due to overflow.
  • Round to the nearest representable value if unable to represent a floating-point value due to limits on precision.

These requirements apply to the result of any expression except for builtin functions, for which an unusual loss of precision must be explicitly documented.
[–] deegeese@sopuli.xyz 3 points 21 hours ago* (last edited 19 hours ago) (1 children)

Thanks for teaching me something, but the obscurity of your answer just illustrates how rare that requirement is in human readable formats, and mostly limited to data formats designed for numeric precision, like HDF5, FITS or protobuf.

[–] BodilessGaze@sh.itjust.works 1 points 19 hours ago* (last edited 19 hours ago)

I don't think having well-defined precision is a rare requirement, it's more that most devs don't understand (and/or care) about the pitfalls of inaccuracy, because they usually aren't obvious. Also, languages like JavaScript/PHP make it hard to do things the right way. When I was working on an old PHP codebase, I ran into a popular currency library (Zend_Currency) that used floats for handling money, which I'm sure works fine up until the point the accountants call you up asking why they can't balance the books. The "right way" was to use the bcmath extension, which was a huge pain.
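A minimal illustration of the pitfall and the fix, with Python's stdlib `decimal` standing in for PHP's bcmath (the ledger values are invented):

```python
# The classic mismatch: binary floats can't represent 0.1 exactly,
# so money sums drift; decimal arithmetic stays exact.
from decimal import Decimal

assert 0.1 + 0.2 != 0.3
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")

# Summing 1000 ten-cent ledger entries: float drift vs exact decimals.
entries = ["0.10"] * 1000
assert sum(float(e) for e in entries) != 100.0           # off by ~1e-13
assert sum(Decimal(e) for e in entries) == Decimal("100.00")
```

That tiny drift is exactly the kind of thing the accountants eventually notice.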

[–] fibojoly@sh.itjust.works 30 points 2 days ago (4 children)

I'm amazed at developers who don't grasp that you don't need to have absolutely everything under the sun in a human readable file format. This is such a textbook case...

[–] chaospatterns@lemmy.world 13 points 1 day ago (1 children)

Yeah, this isn't really human-readable even when it's in YAML. What am I going to do? Read the floats and understand that the person looked left?

[–] squaresinger@lemmy.world 1 points 10 hours ago (1 children)

It's human-readable enough for debugging. You might not be able to read whether a person looked left, but you can read which field is null or missing or wildly out of range. You can also see if a value is duplicated when it shouldn't be.

Human-readable is primarily about the structure and less about the data being human readable.

[–] vivendi@programming.dev 0 points 8 hours ago* (last edited 8 hours ago) (1 children)

You could also not be an idiot and write a debug script that checks those values or at least provides an interface

But I guess they don't teach that kind of thing in the javascript and python school of dogshit programming

[–] squaresinger@lemmy.world 1 points 7 hours ago* (last edited 6 hours ago)

Who pissed in your coffee?

Sure you can write some script to interpret the data, but then you need to write an extra script that you need to run any time you step through the code, or whenever you want to look at the data when it's stored or transferred.

But I guess you have never worked on an actually big project, so how would you know?

I guess you aren't entirely wrong here. If nobody other than you ever uses your program and nobody other than you ever looks at the code, readability really doesn't matter, and you can micro-optimize everything into illegibility. But don't extrapolate from your hobby coding to actual projects.

[–] marcos@lemmy.world 12 points 2 days ago (1 children)

Even if you want it to be human readable, you don't need to include the name into every field and use balanced separators.

Any CSV variant would be an improvement already.

[–] fibojoly@sh.itjust.works 5 points 1 day ago

Even using C#'s decimal type (128 bits) would be an improvement! I count 22 characters per number here. So a minimum of 176 bits.

[–] FuckBigTech347@lemmygrad.ml 3 points 1 day ago

Exactly. All modern CPUs are so standardized that there is little reason to store all the data in ASCII text. It's so much faster and less complicated to just keep the raw binary on disk.

[–] Dultas@lemmy.world 4 points 2 days ago

That's it everyone, back to copybooks.

[–] MonkderVierte@lemmy.zip 45 points 2 days ago (1 children)

Maybe use a real database for that? I'm a fan of simple tools (e.g. plaintext) for simple usecases but please use appropriate tools.

[–] nous@programming.dev 14 points 2 days ago (6 children)

What is wrong with a file for this? Sounds more like a local log or debug output that a single thread in a single process would be creating. A file is fine for high volume append only data like this. The only big issue is the format of that data.

What benefit would a database bring here?

[–] NeatNit@discuss.tchncs.de 25 points 2 days ago (2 children)

I think SQLite is a great middle ground. It saves the database as a single .db file, and can do everything an SQL database can do. Querying for data is a lot more flexible and a lot faster. The tools for manipulating the data in any way you want are very good and very robust.

However, I'm not sure how it would affect file size. It might be smaller because JSON/YAML wastes a lot of characters on redundant information (field names) and storing numbers as text, which the database would store as binary data in a defined structure. On the other hand, extra space is used to make common SQL operations happen much faster using fancy data structures. I don't know which effect is greater so file size could be bigger or smaller.
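A minimal sketch of the SQLite route using Python's stdlib `sqlite3` (table and column names are made up for illustration):

```python
# One .db file, typed columns, floats stored as 8-byte binary
# instead of ~21 characters of ASCII per value.
import sqlite3

conn = sqlite3.connect(":memory:")  # use a real path like "gaze.db" on disk
conn.execute("CREATE TABLE gaze (ts REAL, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO gaze VALUES (?, ?, ?)",
    [(0.016, 512.3, 384.7), (0.033, 518.9, 380.1)],
)
conn.commit()

# Querying is where this beats grepping a YAML dump:
rows = conn.execute("SELECT ts, x FROM gaze WHERE x > 515").fetchall()
assert rows == [(0.033, 518.9)]
```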

[–] Scrath@lemmy.dbzer0.com 8 points 2 days ago (1 children)

I didn't look too much at the data, but I think CSV might actually be an appropriate format for this?

Nice simple plaintext and very easy to parse into a data structure for analysing/using it in Python or similar

[–] nous@programming.dev 3 points 1 day ago

CSV would be fine. The big problem with the data as presented is that it is a YAML list, so the whole file needs to be read into memory and decoded before you get any values out of it. Any line-based encoding would be vastly better and allow line-based processing. CSV, JSON objects encoded one per line, some other streaming binary format. It does not make much difference overall as long as it is line-based or at least streamable.
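A sketch of the line-based idea using one JSON object per line (NDJSON); the sample records are invented:

```python
# Each line is a complete JSON document, so the file can be processed
# record by record and memory use stays flat no matter how long it gets.
import io
import json

stream = io.StringIO(
    '{"ts": 0.016, "x": 512.3}\n'
    '{"ts": 0.033, "x": 518.9}\n'
)

# A streaming aggregate: parse one line, use it, move on.
max_x = max(json.loads(line)["x"] for line in stream)
assert max_x == 518.9
```

The same file also works with `grep`, `head`, and friends, which a YAML list does not.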

[–] GenderNeutralBro@lemmy.sdf.org 4 points 2 days ago (1 children)

SQLite would definitely be smaller, faster, and require less memory.

Thing is, it's 2025, roughly 20 years since anybody's given half a shit about storage efficiency, memory efficiency, or even CPU efficiency for anything so small. Presumably this is not something they need to query dynamically.

[–] NeatNit@discuss.tchncs.de 3 points 2 days ago (1 children)

True (in most contexts, probably including this one), but I think that only makes the case for SQLite stronger. What people do still care about is a good flexible, usable and reliable interface. I'm not sure how to get that with YAML.

[–] nous@programming.dev 3 points 1 day ago

YAML is not a good format for this. But any line-based or streamable format would be good enough for log data like this. Really easy to parse with any language or even directly with shell scripts. No need to even know SQL, any text processing would work fine.

[–] towerful@programming.dev 13 points 2 days ago (2 children)

Smaller file size, lower data rate, less computational overhead, no conversion loss.

A 64 bit float requires 64 bits to store.
ASCII representation of a 64 bit float (in the example above) is 21 characters or 168 bits.
Also, if every record is the same then there is a huge overhead for storing the name of each value. Plus the extra spaces, commas and braces.
So, you are at least doubling the file size and data throughput. And there is precision loss when converting float-string-float. Plus the computational overhead of doing those conversions.
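A quick back-of-envelope check of the size claim (a Python sketch using the stdlib `struct` module):

```python
# A binary double is always 8 bytes; its shortest round-tripping
# ASCII form is typically 17+ characters, before any field names.
import math
import struct

value = math.pi
binary = struct.pack("<d", value)  # little-endian IEEE 754 double
text = repr(value)                 # shortest decimal string that round-trips

assert len(binary) == 8
assert len(text) == 17                            # "3.141592653589793"
assert struct.unpack("<d", binary)[0] == value    # binary round-trips exactly
```

So the text form is already over twice the size, and that's before YAML repeats every field name on every record.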

Something like sqlite is lightweight, fast and will store the native data types.
It is widely supported, and allows for easy querying of the data.
Also makes it easy for 3rd party programs to interact with the data.

If you are ever thinking of implementing some sort of data storage in files, consider sqlite first.

[–] qaz@lemmy.world 5 points 2 days ago* (last edited 2 days ago) (1 children)

It's used to export tracking data to analyze later on. Something like SQLite seems like a much better choice to me.

[–] expr@programming.dev 2 points 1 day ago (1 children)

If it's an export that will be consumed by a separate, unrelated program later, I think a CSV is most appropriate. Databases are persistence tools, not transport.
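A minimal sketch of CSV as a transport format, using the stdlib `csv` module (field names are invented):

```python
# Header row once, then one compact row per sample; any downstream
# consumer (pandas, spreadsheets, awk) can read it directly.
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow(["ts", "x", "y"])        # field names appear exactly once
writer.writerow([0.016, 512.3, 384.7])
writer.writerow([0.033, 518.9, 380.1])

lines = buf.getvalue().splitlines()
assert lines[0] == "ts,x,y"
assert len(lines) == 3
```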

[–] qaz@lemmy.world 1 points 1 day ago (1 children)

It's only intended to be used by the program itself. It's purely storage.

[–] expr@programming.dev 2 points 1 day ago

Ah so it's not really an export, it's just the backing store used by some other (locally-running) program that you're trying to reverse engineer?

In that case yeah an sqlite database is probably most appropriate, though I can see a CSV still being desirable to remove a potential sqlite dependency.

[–] slackness@lemmy.ml 29 points 2 days ago (1 children)

Fuck yaml. I'm not parsing data structured with spaces and newlines with my eyes. Use visible characters.

Does your viewer/editor not show space chars or indent levels?

[–] raman_klogius@ani.social 15 points 2 days ago* (last edited 1 day ago) (2 children)

Why you shouldn't use YAML

[–] Damage@feddit.it 21 points 2 days ago

The best approach would be to never use yaml for anything

[–] BodilessGaze@sh.itjust.works 6 points 2 days ago

YAML doesn't require any level of accuracy for floating point numbers, and that doc appears to have numbers large enough to run into problems for single-precision floats (maybe double too). That means different parsers could give you different results.

[–] lime@feddit.nu 19 points 2 days ago* (last edited 2 days ago)

i mean, json is valid yaml

[–] BestBouclettes@jlai.lu 13 points 2 days ago

I really like YAML, but way too many people use it beyond its purpose... I work with GitLab CI, and seeing complex bash scripts inlined in YAML files makes me want to hurt people.

[–] wise_pancake@lemmy.ca 5 points 2 days ago (1 children)

I’d probably just use line delimited JSON or CSV for this use case. It plays nicely with cat and other standard tools and basically all the yaml is doing is wrapping raw json and adding extra parse time/complexity.

In the end consider converting this to parquet for analysis, you probably won’t get much from compression or row-group clustering, but you will get benefits from the column store format when reading the data.

[–] qaz@lemmy.world 5 points 2 days ago* (last edited 2 days ago) (1 children)

Thanks for the advice, but this is just the format of some eye-tracking software I had to use, not something I develop myself

[–] wise_pancake@lemmy.ca 5 points 2 days ago

Ah, well, such is software dependencies.

[–] disco@lemdro.id 6 points 2 days ago

This is nasty to look at

[–] Supercrunchy@programming.dev 5 points 2 days ago

Also let's represent all numbers in scientific notation, I'm sure that's going to make it easier to read...
