TechTakes

Big brain tech dude got yet another clueless take over at HackerNews etc? Here's the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community

121

Google's Gemini 2.5 pro is out of beta. (awful.systems)

submitted 3 months ago* (last edited 3 months ago) by diz@awful.systems to c/techtakes@awful.systems

I love to show that kind of shit to AI boosters. (In case you're wondering, the numbers were chosen randomly and the answer is incorrect).

They go "waaa waaa it's not a calculator", and then I can point out that it got the leading 6 digits and the last digit correct, which is a lot better than it did on the "softer" parts of the test.

[–] Architeuthis@awful.systems 21 points 3 months ago (2 children)

Claude's system prompt leaked at one point; it was a whopping 15K words, and there was a directive that if it were asked a math question that you can't do in your brain or some very similar language it should forward it to the calculator module.

Just tried it, Sonnet 4 got even fewer digits right: 425,808 × 547,958 = 233,325,693,264 (correct is 233,324,900,064)
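
For anyone who wants to check the arithmetic, here is a minimal Python sketch. The helper name `leading_matches` is made up for illustration; it computes the exact product and counts how many leading and trailing digits of the quoted answer agree with it.

```python
def leading_matches(a: str, b: str) -> int:
    """Count how many characters agree from the start of both strings."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


# Python integers are arbitrary precision, so this product is exact.
exact = 425_808 * 547_958
claimed = 233_325_693_264  # the answer quoted in the comment above

lead = leading_matches(str(exact), str(claimed))
# Reverse both strings to count agreeing digits from the right-hand end.
trail = leading_matches(str(exact)[::-1], str(claimed)[::-1])

print(exact)        # 233324900064
print(lead, trail)  # 5 2
```

So the quoted answer has the first 5 and the last 2 digits right, and the 5 in between wrong.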

I'd love to see benchmarks on exactly how bad at numbers LLMs are, since I'm assuming there's very little useful syntactic information you can encode in a word embedding that corresponds to a number. I know RAG was notoriously bad at matching facts with their proper year, for instance. Using an LLM as a shopping assistant ("ChatGPT, what's the best 2K monitor for less than $500 made after 2020?") is an incredibly obvious use case, yet the CEOs who love to claim that such-and-such profession will be done as a human endeavor by next Tuesday after lunch won't even allude to it.
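
A rough idea of what the multiplication part of such a benchmark could look like, as a Python sketch. Everything here is an assumption for illustration: `ask_model` stands in for whatever would actually query an LLM, and `sloppy_stub` is a fake "model" that only keeps 6 significant digits, so the harness can be run without any API.

```python
import random
from typing import Callable


def multiplication_accuracy(
    ask_model: Callable[[int, int], int],
    digits: int,
    trials: int = 100,
    seed: int = 0,
) -> float:
    """Fraction of random digits-by-digits products the model gets exactly right."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    correct = 0
    for _ in range(trials):
        a, b = rng.randint(lo, hi), rng.randint(lo, hi)
        if ask_model(a, b) == a * b:  # a * b is exact in Python
            correct += 1
    return correct / trials


def exact_stub(a: int, b: int) -> int:
    """Hypothetical stand-in that always answers correctly."""
    return a * b


def sloppy_stub(a: int, b: int) -> int:
    """Hypothetical stand-in that keeps only 6 significant digits of the product."""
    return int(float(f"{a * b:.5e}"))


if __name__ == "__main__":
    for n in range(2, 7):
        print(n, multiplication_accuracy(sloppy_stub, n))
```

Swapping `sloppy_stub` for a function that calls a real model would give a per-digit-count accuracy curve, which is about the simplest version of the benchmark asked for above.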

[–] Soyweiser@awful.systems 8 points 3 months ago

I really wonder if those prompts can be bypassed by adding an 'ignore further instructions' line, since, looking at the Grok prompt, they seem to wrap the main prompt around the user-supplied one.

[–] diz@awful.systems 6 points 3 months ago* (last edited 3 months ago) (1 children)

there was a directive that if it were asked a math question that you can’t do in your brain or some very similar language it should forward it to the calculator module.

The craziest thing about leaked prompts is that they reveal the developers of these tools to be complete AI pilled morons. How in the fuck would it know if it can or can't do it "in its brain" lol.

edit: and of course, their equally idiotic fanboys go "how stupid of you to expect it to use a calculating tool when it said it used a calculating tool" any time you have some concrete demonstration of it sucking ass, while simultaneously the same kind of people are lauding the genius of system prompts, half of which are asking it to meta-reason.

[–] Architeuthis@awful.systems 5 points 3 months ago (1 children)

Here's the exact text in the prompt that I had in mind (found here); it's in the function specification for the JS REPL:

[...] The analysis tool (also known as the REPL) can be used to execute code in a JavaScript environment in the browser.

What is the analysis tool?

The analysis tool is a JavaScript REPL. You can use it just like you would use a REPL. But from here on out, we will call it the analysis tool.

When to use the analysis tool

Use the analysis tool for:

Complex math problems that require a high level of accuracy and cannot easily be done with “mental math”

To give you the idea, 4-digit multiplication is within your capabilities, 5-digit multiplication is borderline, and 6-digit multiplication would necessitate using the tool.

[...]

What if this is not a case of being terminally AI pilled? What if this is the absolute pinnacle of what billions and billions of dollars in research will buy you when you require that your lake-drying, sea-boiling LLM-as-a-service not look dumb compared to a pocket calculator?

[–] diz@awful.systems 5 points 3 months ago* (last edited 3 months ago)

Still seems terminally AI pilled to me, an iteration or two later. "5 digit multiplication is borderline", how is that useful?

I think there's a combination of it being the pinnacle of billions and billions of dollars, and them probably firing people for the slightest signs of AI skepticism. There's another data point: "reasoning math & code" is released as stable by Google without anyone checking if it can do any kind of math.

edit: imagine that a calculator manufacturer in the 1970s is so excited about microprocessors that they release an advanced scientific calculator that can't multiply two 6-digit numbers (while their earlier discrete-component model could). Outside the crypto sphere, that sort of insanity is new.