this post was submitted on 25 Aug 2025
169 points (98.3% liked)

Fuck AI


"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.


cross-posted from: https://programming.dev/post/36289727


Our findings reveal a robustness gap for LLMs in medical reasoning, demonstrating that evaluating these systems requires looking beyond standard accuracy metrics to assess their true reasoning capabilities. When forced to reason beyond familiar answer patterns, all models demonstrate declines in accuracy, challenging claims of artificial intelligence’s readiness for autonomous clinical deployment.

A system dropping from 80% to 42% accuracy when confronted with a pattern disruption would be unreliable in clinical settings, where novel presentations are common. The results indicate that these systems are more brittle than their benchmark scores suggest.
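The evaluation protocol the excerpt describes can be sketched in miniature. Everything below is hypothetical (a made-up question bank and a pattern-matching stub standing in for an LLM; the function names are invented for illustration), but it shows how accuracy measured on familiar answer patterns can collapse once the familiar correct option is disrupted:

```python
def make_questions():
    # Tiny hypothetical question bank: each item has a stem, three
    # options, and the index of the correct option.
    return [
        {"stem": f"case {i}",
         "options": [f"opt{i}a", f"opt{i}b", f"opt{i}c"],
         "correct": i % 3}
        for i in range(10)
    ]

def pattern_matcher(memory):
    # Stub "model": recalls the correct option *text* it memorized,
    # standing in for an LLM matching familiar answer patterns.
    # Falls back to the first option when the memorized text is absent.
    def model(q):
        seen = memory[q["stem"]]
        return q["options"].index(seen) if seen in q["options"] else 0
    return model

def disrupt(q):
    # Pattern disruption (hedged paraphrase of the study's idea):
    # the familiar correct option is replaced by a novel phrasing,
    # which is now the correct choice at the same index.
    opts = list(q["options"])
    opts[q["correct"]] = "None of the other answers"
    return {**q, "options": opts}

def accuracy(model, questions):
    return sum(model(q) == q["correct"] for q in questions) / len(questions)

bank = make_questions()
memory = {q["stem"]: q["options"][q["correct"]] for q in bank}
model = pattern_matcher(memory)

print(accuracy(model, bank))                        # 1.0 on familiar patterns
print(accuracy(model, [disrupt(q) for q in bank]))  # 0.4 once disrupted
```

The stub is deliberately dumb, but the shape of the result (high headline accuracy, sharp drop under disruption) is exactly what a benchmark-only evaluation would miss.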

top 27 comments
[–] zeropointone@lemmy.world 75 points 23 hours ago

I wish people would stop misusing LLMs for anything related to intelligence, skills and knowledge. That is not what LLMs were designed for: they only output something that resembles human language based on probability. That's pretty much the opposite of intelligence.
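The commenter's point, that generation is sampling from learned word-transition probabilities rather than reasoning, can be made concrete with a toy bigram generator. This is a drastic simplification of an LLM, and the corpus below is made up, but the underlying principle (pick the statistically likely next word, with no model of meaning) is the same:

```python
import random
from collections import defaultdict

def train_bigram(corpus):
    # Count how often each word follows each other word.
    counts = defaultdict(lambda: defaultdict(int))
    words = corpus.split()
    for a, b in zip(words, words[1:]):
        counts[a][b] += 1
    return counts

def generate(counts, start, length, rng):
    # Repeatedly sample the next word in proportion to how often it
    # followed the current word in training. No facts, no reasoning,
    # just conditional probability over observed sequences.
    out = [start]
    for _ in range(length):
        followers = counts[out[-1]]
        if not followers:
            break
        words, weights = zip(*followers.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)

# Hypothetical miniature "training corpus".
corpus = "the patient has a fever the patient needs rest the fever has passed"
counts = train_bigram(corpus)
print(generate(counts, "the", 8, random.Random(0)))
```

The output is always locally plausible (every word pair occurred in training), which is precisely why fluent text can be mistaken for understanding.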

[–] null_dot@lemmy.dbzer0.com 3 points 11 hours ago

I think we mostly agree, but saying "anything related to intelligence, skills, and knowledge" is too broad.

It's not a popular idea here on lemmy, but IMO gen AI can save some time in some circumstances.

Problems arise when one relies on them for reasoning, and it can be very difficult to know when you're doing that.

For example, I'm a consultant and work in a specialised area of law, heavily regulated. I've been doing this for 20 years.

Gen AI can convert my verbal notes into a pretty impressive written file note ready for presentation to a client or third party.

However, it's critically important to know when it omits something, which is often.

In summary, in situations like this gen AI can save me several hours of drafting a complex document.

However, a layperson could explain a problem and gen AI could produce a convincing analysis, but the layperson wouldn't know what has been omitted or overlooked.

If I don't know anything about nutrition and ask for a meal plan tailored to my needs, I would have no way to evaluate whether something has been overlooked.

[–] njm1314@lemmy.world 5 points 12 hours ago

Well, it doesn't matter what they were designed for; that is what they are being marketed for. That's what you have to test.

[–] zeropointone@lemmy.world -1 points 12 hours ago

I do both. The only fun I ever had with generative AI is when I misused it on purpose, like forcing it to hallucinate wildly to create something so warped it inspires me. You can get some really good nightmare fuel out of text-to-image generators if you push the weightings hard enough. Or interesting textures.

[–] njm1314@lemmy.world 3 points 11 hours ago

Clearly you don't do both. Because in the previous comment you were complaining about people judging them based upon what they are marketed as. You can't have it both ways.

[–] zeropointone@lemmy.world 1 points 2 hours ago

Zero reading comprehension. Just like an LLM.

[–] wetbeardhairs@lemmy.dbzer0.com 28 points 22 hours ago

Finally someone who gets it. They're just best-next-word guessers that drank a literature smoothie and burp up something resembling a correct answer. They can only interpolate between word chunks that are pre-existing in the dataset. If the dataset is grounded in things that have already been said, could it possibly interpolate an answer to a question that was never answered? No. But it could spill some kind of convincing yet nonsense answer that passes the CEO's sniff test.

[–] hades@feddit.uk 8 points 21 hours ago

flipping a coin fails spectacularly at making any decisions other than what to have for dinner

[–] EnsignWashout@startrek.website 8 points 19 hours ago

You've summarized the value of current generation AI well.

It excels exactly when the result doesn't matter in the slightest.

[–] Harbinger01173430@lemmy.world 0 points 10 hours ago

Why are you applying a test designed for sapient beings who can reason to a machine that has no life, reason, intelligence or anything of the sort?

What the fuck did you expect the result to be?

[–] MTK@lemmy.world 5 points 10 hours ago

I think that is their point: LLMs are being rushed into the medical field when they are mostly making statistical predictions for medical answers, not actually reasoning.