this post was submitted on 17 Jun 2025
57 points (100.0% liked)
TechTakes
1967 readers
222 users here now
Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.
This is not debate club. Unless it’s amusing debate.
For actually-good tech, you want our NotAwfulTech community
founded 2 years ago
So the "show thinking" button is essentially just for when you want to read even more untrue text?
It’s just more llm output, in the style of “imagine you can reason about the question you’ve just been asked. Explain how you might have come about your answer.” It has no resemblance to how a neural network functions, nor to the output filters the service providers use.
It’s how the ai doomers get themselves into a flap over “deceptive” models… “omg it lied about its train of thought!” Because of course it didn’t lie; it just emitted a stream of tokens that were statistically similar to something classified as reasoning during training.
I was hoping, until seeing this post, that the reasoning text was actually related to how the answer is generated. Especially regarding features such as using external tools, generating and executing code and so on.
I get how LLMs work (roughly, I didn't take too many ML courses at uni, and GANs were still all the rage then), which is why I specifically didn't call it lying. But the part I'm always unsure about is how much external structure is imposed on LLM-based chat bots through traditional programming that fills the gaps between rounds of token generation.
Apparently I was too optimistic :-)
It is related, inasmuch as it’s all generated from the same prompt and the “answer” will be statistically likely to follow from the “reasoning” text. But it is only likely to follow, which is why you can sometimes see a lot of unrelated or incorrect guff in “reasoning” steps that’s misinterpreted as deliberate lying by ai doomers.
I will confess that I don’t know what shapes the multiple “let me just check” or correction steps you sometimes see. It might just be a response stream that is shaped like self-checking. It is also possible that the response stream is fed through a separate llm session which then pushes its own responses into the context window before the response is finished and sent back to the questioner, but that would boil down to “neural networks pattern matching on each other’s outputs and generating plausible response token streams” rather than any sort of meaningful introspection.
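That second possibility can be sketched in a few lines. Everything below is hypothetical — the `llm()` stub and the pipeline shape are assumptions for illustration, not anything the providers have documented:

```python
# Hypothetical sketch of a "critique pass": a second model call checks
# a first call's draft, and both streams get spliced into one context
# window. llm() is a stub standing in for a real model call.

def llm(prompt: str) -> str:
    # Stub: returns canned text so the sketch runs without a model.
    if "Check the following" in prompt:
        return "Let me just check that... 3 * 4 is indeed 12."
    return "The answer is 12."

def answer_with_self_check(question: str) -> str:
    context = question
    draft = llm(context)                                 # first pass: draft answer
    context += "\n" + draft
    critique = llm("Check the following:\n" + context)   # second pass: "check" it
    context += "\n" + critique
    # The "checking" is just more pattern-matched text appended to the
    # context, not introspection into how the draft was produced.
    return context

print(answer_with_self_check("What is 3 * 4?"))
```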
I would expect the actual systems used by the likes of openai to be far more full of hacks and bodges and work-arounds and let’s-pretend prompts than either you or I could imagine.
Note that the train of thought thing originated from users as a prompt "hack": you'd ask the bot to "go through the task step by step, checking your work and explaining what you are doing along the way" to supposedly get better results. There's no more to it than pure LLM vomit.
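The prompt-level “hack” really is that simple: chain-of-thought is just extra instruction text stitched onto the user's question. The wording here is illustrative, not any provider's actual template:

```python
# Sketch of the chain-of-thought prompt "hack": the step-by-step
# instruction is ordinary text appended to the question before it is
# sent to the model. Illustrative wording only.

def with_chain_of_thought(question: str) -> str:
    return (
        question
        + "\nGo through the task step by step, checking your work"
        + " and explaining what you are doing along the way."
    )

prompt = with_chain_of_thought("How many r's are in 'strawberry'?")
```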
(I believe it does have the potential to help somewhat, in that it's more or less equivalent to running the query several times and taking the most common answer, so you get a result that's more in line with the typical output distribution. Certainly nothing to do with thought.)
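That “run it several times and average” idea resembles what the literature calls self-consistency decoding: sample several completions and keep the most common final answer. A toy sketch, with `sample_answer()` as a stub standing in for repeated model calls and made-up numbers:

```python
# Toy sketch of self-consistency-style voting: no reasoning is
# inspected, only the final answers from several samples are tallied.
from collections import Counter

def sample_answer(seed: int) -> str:
    # Stub: pretend the model usually says "12" but sometimes wanders.
    return "12" if seed % 4 != 0 else "14"

def majority_vote(n_samples: int = 8) -> str:
    answers = [sample_answer(i) for i in range(n_samples)]
    # "Averaging" here is just a mode over sampled final answers.
    return Counter(answers).most_common(1)[0][0]

print(majority_vote())  # "12" wins 6 votes to 2
```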
Always_has_been.jpeg