this post was submitted on 13 Apr 2025
TechTakes
lol, yeah
"perverse incentives rule everything around me" is a big (and observable) thing in the "startup"[0] world because everything[1] is about speed/iteration. for example: why bother spending a few weeks working out a way to generate better training data for a niche kind of puzzle test when you can just code in "personality" and make the autoplag casinobot go "hah, I saw a puzzle almost like this just last week, let's see if the same solution works...."
i.e. when faced with a choice of hard vs quick, cynically I'll guess the latter in almost all cases. there are occasional exceptions, but none of the promptfondlers and modelfarmers are in that set imo
[0] - look, we may wish to argue about what having billions in vc funding categorizes a business as. but apparently "immature shitderpery" is still squarely "startup"
[1] - in the bayfucker playbook. I disagree.
I think they worked specifically on cheating the benchmarks, though. As well as on popular puzzles like pre-existing variants of the river crossing puzzle. It's a very large and very popular puzzle category; if the river crossing puzzle isn't on the list, I don't know what would be.
Keep in mind that they are true believers, too: they think that if they cram enough little pieces of logical reasoning, taken from puzzles, into the AI, they'll get a robot god that actually starts coming up with new shit.
I very much doubt that there's some general improvement in reasoning performance that results in these older puzzle variants getting solved while new ones that aren't particularly more difficult fail.