TechTakes

2062 readers

79 users here now

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

founded 2 years ago

MODERATORS

dgerard@awful.systems

Gemini seem to have "solved" my duck river crossing, lol. (awful.systems)

submitted 3 months ago by diz@awful.systems to c/techtakes@awful.systems

19 comments fedilink hide all child comments

Tried my duck river crossing thing a few times recently, it usually solves it now, albeit with a bias to make unnecessary trips half of the time.

Of course, anything new fails:

There's 2 people and 1 boat on the left side of the river, and 3 boats on the right side of the river. Each boat can accommodate up to 6 people. How do they get all the boats to the left side of the river?

Did they seriously change something just to deal with my duck puzzle? How odd.

It's Google so it is not out of the question that they might do some analysis on the share links and referring pages, or even use their search engine to find discussions of a problem they're asked. I need to test that theory and simultaneously feed some garbage to their plagiarism machine...

Sample of the new botshit:

L->R: 2P take B_L. L{}, R{2P, 4B}. R->L: P1 takes B_R1. L{P1, B_R1}, R{P2, 3B}. R->L: P2 takes B_R2. L{2P, B_R1, B_R2}, R{2B}. L->R: P1 takes B_R1 back. L{P2, B_R2}, R{P1, 3B}. R->L: P1 takes B_R3. L{P1, P2, B_R2, B_R3}, R{2B}. L->R: P2 takes B_R2 back. L{P1, B_R3}, R{P2, 3B}.

And again and again, like a buggy attempt at brute forcing the problem.

you are viewing a single comment's thread
view the rest of the comments

[–] YourNetworkIsHaunted@awful.systems 8 points 3 months ago (8 children)

The fact that it appears to be trying to create a symbolic representation of the problem is interesting, since that's the closest I've ever seen this come to actually trying to model something rather than just spewing raw text, but the model itself looks nonsensical, especially for such a simple problem.

Did you use any of that kind of notation in the prompt? Or did some poor squadron of task workers write out a few thousand examples of this notation for river crossing problems in an attempt to give it an internal structure?

[–] diz@awful.systems 7 points 3 months ago* (last edited 3 months ago) (1 children)

Did you use any of that kind of notation in the prompt? Or did some poor squadron of task workers write out a few thousand examples of this notation for river crossing problems in an attempt to give it an internal structure?

I didn't use any notation in the prompt, but gemini 2.5 pro seem to always represent state of the problem after every step in some way. When asked if it does anything with it says it is "very important", so it may be that there's some huge invisible prompt that says its very important to do this.

It also mentioned N cannibals and M missionaries.

My theory is that they wrote a bunch of little scripts that generate puzzles and solutions in that format. Since river crossing is one of the top most popular puzzles, it would be on the list (and N cannibals M missionaries is easy to generate variants of), although their main focus would have been the puzzles in the benchmarks that they are trying to cheat.

edit: here's one of the logs:

https://pastebin.com/GKy8BTYD

Basically it keeps on trying to brute force the problem. It gets first 2 moves correct, but in a stopped clock style manner - if there's 2 people and 1 boat they both take the boat, if there's 2 people and >=2 boats, then each of them takes a boat.

It keeps doing the same shit until eventually its state tracking fails, or its reading of the state fails, and then it outputs the failure as a solution. Sometimes it deems it impossible:

https://pastebin.com/Li9quqqd

All tests done with gemini 2.5 pro, I can post links if you need them but links don't include their "thinking" log and I also suspect that if >N people come through a link they just look at it. Nobody really shares botshit unless its funny or stupid. A lot of people independently asking the same problem, that would often happen if there's a new homework question so they can't use that as a signal so easily.

[–] YourNetworkIsHaunted@awful.systems 1 points 3 months ago

I'm not familiar with the cannibal/missionary framed puzzle, but reading through it the increasingly simplified notation reads almost like a comp sci textbook trying to find or outline an algorithm for something, but with an incredibly simple problem. We also see it once again explicitly acknowledge then implicitly discard part of the problem; in this case it opens by acknowledging that each boat can carry up to 6 people and that each boat needs at least one person, but somehow gets stuck on the pattern that we need to alternate trips left and right and each trip can only consist of one boat. It's still pattern matching rather than reasoning, even if the matching gets more sophisticated.

load more comments (6 replies)