this post was submitted on 25 Aug 2025

172 points (98.3% liked)

Fuck AI

3818 readers

584 users here now

"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

founded 1 year ago

MODERATORS

VerbFlow@lemmy.world

MrMcGasion@lemmy.world

TootSweet@lemmy.world

BigMikeInAustin@lemmy.world

cynar@lemmy.world

drmeanfeel@lemmy.world

pavnilschanda@lemmy.world

CriticalMedicine@lemmy.world

WonderfulWanderer@lemmy.world

Communist@lemmy.ml

eatCasserole@lemmy.world

SpaceNoodle@lemmy.world

NutWrench@lemmy.world

Soup@lemmy.cafe

iAvicenna@lemmy.world

Tinks@lemmy.world

wizblizz@lemmy.world

corus_kt@lemmy.world

Prandom_returns@lemm.ee

JimSamtanko@lemm.ee

TrickDacy@lemmy.world

TheFriar@lemm.ee

ArmokGoB@lemmy.dbzer0.com

HawlSera@lemm.ee

andrew_bidlaw@sh.itjust.works

MeDuViNoX@sh.itjust.works

33550336@lemmy.world

Nougat@fedia.io

Lost_My_Mind@lemmy.world

Sterile_Technique@lemmy.world

Quill7513@slrpnk.net

glowing_hans@sopuli.xyz

e8d79@discuss.tchncs.de

ThefuzzyFurryComrade@pawb.social

172

Top AI models fail spectacularly when faced with slightly altered medical questions (jamanetwork.com)

submitted 1 day ago by Pro@programming.dev to c/fuck_ai@lemmy.world

19 comments fedilink hide all child comments

cross-posted from: https://programming.dev/post/36289727

Comments

Reddit.

Our findings reveal a robustness gap for LLMs in medical reasoning, demonstrating that evaluating these systems requires looking beyond standard accuracy metrics to assess their true reasoning capabilities.6 When forced to reason beyond familiar answer patterns, all models demonstrate declines in accuracy, challenging claims of artificial intelligence’s readiness for autonomous clinical deployment.

A system dropping from 80% to 42% accuracy when confronted with a pattern disruption would be unreliable in clinical settings, where novel presentations are common. The results suggest that these systems are more brittle than their benchmark scores suggest.

you are viewing a single comment's thread
view the rest of the comments

[+] zeropointone@lemmy.world 75 points 1 day ago* (last edited 4 minutes ago) (5 children)

[deleted]

[–] null_dot@lemmy.dbzer0.com 3 points 14 hours ago

I think we mostly agree but saying "anything related to intelligence, skills, and knowledge" is too broad.

Its not a popular idea here on lemmy, but IMO gen AI can save some time in some circumstances.

Problems arise when one relies on them for reasoning, and it can be very difficult to know when youre doing that.

For example, I'm a consultant and work in a specialised area of law, heavily regulated. I've been doing this for 20 years.

Gen AI can convert my verbal notes into a pretty impressive written file note ready for presentation to a client or third party.

However, it's critically important to know when it omits something, which is often.

In summary, in situations like this gen AI can save me several hours of drafting a complex document.

However, a layperson could explain a problem and gen AI could produce a convincing analysis, but the lay person wouldn't know what has been omitted or overlooked.

If i dont know anything about nutrition, and ask for a meal plan tailored to my needs, I would have no way to evaluate whether something has been overlooked.

[–] njm1314@lemmy.world 5 points 15 hours ago* (last edited 14 hours ago) (1 children)

Well it doesn't matter what they are designed for that is what they are being marketed for. That's what you have to test.

[–] wetbeardhairs@lemmy.dbzer0.com 28 points 1 day ago

Finally someone who gets it. They're just best-next-word guessers that drank a literature smoothie and burp up something resembling a correct answer. They can only interpolate between word chunks that are pre-existing in the dataset. If the data set is grounded in things that have already been said, could it possibly interpolate an answer to a question that was never answered? No. But it could spill some kind of convincing yet nonsense answer that passes the CEO's sniff test.

[+] Peanutbjelly@sopuli.xyz -13 points 1 day ago (2 children)

That's... Not actually accurate, but it's an accurate sounding confabulation that you could put out which collapses the energy you need to keep interpreting the problem.

Which IS what llms are doing. The failure comes from the incentive structure and style of intelligence. Very right we shouldn't blind trust the responses though.

The criticism of "just probability" falls flat as soon as you recognize current expert consensus is that humans minds are... predictive processors, based on scale free principles leading to layered Bayesian predictive models.

Where LLMs struggle adapting to things outside of distribution (not in the training data) they do not have a way to actively update their weights and biases as they contextualize the growing novel context.

Also novel context is basically inevitable when interacting with real life, because our environments and preferences are also growing, so, they lack something very important for correcting weak confabulations that collapsed the predictive process into action. There's also weird softmax/AI 'reasoning' fuzzyness helping to emulate some of the malleability of our more active ruminative, and very very social models.

I usually get downvoted for going against the narrative, but remember we normalize to the tribal opinions around us to socially collapse our group predictive model, because nuance takes energy. But if you can't communicate, you can't learn, and you display the same lack of intelligence as confident LLM confabulations.

I wish I heard people talking about this outside of strictly academic spaces, but communication channels need to be open.

Keep your eyes out for AI that is diverse, but good at communication/translation/meditation and actively grows.

Although you might see more like the genie3 stuff that is dealing with intermodal dissonance within a monolithic model perspective, which means always confabulating without using other models to actively balance and grow.

Well, attempts are being made to make up for that, but you can see how RLHF leads to sycophantic models that will confirm your own confabulations so that you can confirm each other into delusion without other systems for grounding

[–] fafferlicious@lemmy.world 15 points 23 hours ago* (last edited 23 hours ago) (1 children)

You all are missing the forest for the trees! LLMs are just like us in how we think! We're all just probability generators! No, they can't think or reason beyond known data sets. Yes, they fail at extrapolating information which is the basic component of reasoning. But you guys don't get it! They're just like us and smart!

[–] Peanutbjelly@sopuli.xyz -1 points 18 hours ago* (last edited 18 hours ago) (1 children)

They’re just like us and smart!

responding like this after i just explained a bunch of the differences between us and LLMs is kind of dishonest. but you have to make me fit into your model, so you can just ignore my actual point, which was..."LLMs are the opposite of intelligence," which fits the common take in the area that llms are absolutely 'not intelligent' and in no way shape or form similar to our form of intelligence.

i wouldn't say they are "just like us and smart," because that ignores... the whole point i was making in how they are more similar than being presented, but still a different shape.

like saying "animals are just as smart as humans!" humans are idiots when it comes to interpreting many animals, because they often have a very different shape of intelligence. it's not about the animals being stupid, but the animals having their own eco-niche fit, and perspective drawn around that. this is also not me saying "animals have the opposite of intelligence" just because they don't perform human tasks well.

even better once you start talking about the intelligence of cell groups. could you build a functional body with complex co-constructing organs? why are you more stupid than cell cultures? or people just generally have a shitty understanding of what intelligence is.

i disagree with both "LLMs are the opposite of intelligence" and your strawman.

imagine existing outside of tribal binary framing, because you think they don't properly frame or resemble the truth.

[–] fafferlicious@lemmy.world 5 points 17 hours ago* (last edited 17 hours ago) (1 children)

I didn't make a strawman.

The criticism of "just probability" falls flat as soon as you recognize current expert consensus is that humans minds are... predictive processors....

They're just like us!

Except....

Where LLMs struggle adapting to things outside of distribution (not in the training data) they do not have a way to actively update their weights and biases as they contextualize the growing novel context.

You wanted to attack LLMs underlying principle of being probabilistic word sequence generators. But that's it. That's what they do. They have no understanding outside the context of word order to know that typically if a sentence starts "The quick brown..." the word fox frequently follows that phrase. Therefore, a fox is probably quick and brown. And if something is quick and brown, it might be a fox. LLMs are not intelligent not because they rely on probability.

LLMs are not intelligent because they do not know anything. They repeat patterns in observed data. They do this in an intentionally leaky way to generate new sentences it hasn't seen before based on context it has seen them in before. Any reference of "thinking" or "learning" is just anthropomorphism or an inaccurate and misleading (though useful) approximation. They have no concept of "correct." It's why you can bully them into agreeing with you. They're dumb.

Look, I'm not going to get any more into this because you used a lot of big, jargony words without any context. Words like "normalize to the tribal opinions", "RLHF", "intermodal dissonance", or the biggest offender "confabulations." Those would only be used by a person more knowledgeable in the field or a self-fashioned intellectual trying to flex.

If you're an expert, I offer advice I got in grad school: speak to your target audience. Unfortunately, I can't engage with most of what you said because I frankly have no fucking clue what you're saying.

[–] Peanutbjelly@sopuli.xyz -3 points 15 hours ago

"LLMs are not intelligent because they do not know anything. They repeat patterns in observed data."

we are also predictive systems, but that doesn't mean we are identical to LLMs. "LLMs are not intelligent because they do not know anything." is just not true, without saying humans are not intelligent and do not know anything. there are some unaddressed framing issues in how it's being thought about.

they "know" how to interpret a lot of things in a way that is much more environmentally adaptable than a calculator. language is just a really weird eco-niche, and there is very little active participation, and the base model is not updated as environments change.

this is not saying humans and LLMs are identical, this is saying that instead of the real differences, the particular aspect your are claiming shows LLMs aren't intelligent... is a normal part of intelligent systems.

this is a spot somewhere in between "human intelligence is the only valid shape of intelligence" and "LLMs are literally humans"

as for vocabulary i'm always willing to help for those that can't find or figure out tools to self-learn.

when i talk about 'tribal' aspects, i refer to the collapsing of complexity towards a binary narrative to fit to fit the preferences of your tribe, for survival reasons. i also refer to this as dumb ape brain, because it's a simplification of the world to the degree that i would expect from literal apes trying to survive in the jungle, and not people trying to better understand the world around them. which is important when shouting your opinions to each-other in big social movements. this is actually something you can map to first principles and how we use the errors our models experience in order to notice things, and how we contextualize the sensory experience after the fact. what i mean is, we have a good understanding of this, but nobody wants to hear it from the people who actually care.

'laziness' should be a lack of epistemic vigilance, not a failure to comply to the existing socio-economic hierarchy and hustle culture. i say this because ignorance in this area is literally killing us all, including the billionaires that don't care what LLMs are, but will use every tool they can to maximize paperclips. i'd assume that jargon should at least have salience here... since paperclip maximizing is OG anti-AI talk, but turns out is very important for framing issues in human intelligence as well.

please try to think of something wholesome before continuing, because tribal (energy saving) rage is basically a default on social media, but it's not conducive to learning.

RLHF = reinforcement learning with human feedback. basically upvoting/downvoting to alter future model behaviour, which often leads to sycophantic biases. important if you care about LLMs causing psychotic breaks.

"inter-modal dissonance" is where the different models using different representations make sense of things, but might not match up.

an example is vision = signal saying you are alone in the room

audio signal saying there is someone behind you.

you look behind you, and you collapse the dissonance, confirming with your visual modality whether the audio modality was being reliable. since both are attempting to be accurate, if there is no precision weighting error (think hallucinations) a wider system should be able to resolve whether the audio processing was mistaken, or there is something to address that isn't being picked up via the visual modality (if ghosts were real, they would fit here i guess.)

this is how different systems work together to be more confident about the environment they are both fairly ignorant of (outside of distribution.)

like cooperative triangulation via predictive sense-making.

i promise complex and new language is used to understand things, not just to hide bullshitting (like jordon peterson)

i'd be stating this to the academics, but they aren't the ones being confidently wrong about a subject they are unwilling to learn about. i fully encourage going and listening to the academics to better understand what LLMs and humans actually are.

"speak to your target audience." is literally saying "stay in a confirmation bubble, and don't mess with other confirmation bubbles." while partial knowledge can be manipulated to obfuscate, this particular subject revolves around things that help predict and resist manipulation and deception.

frankly this stuff should be in the educational core right now because knowing how intelligence works is... weirdly important for developing intelligence.

because it's really important for people to generally be more co-constructive in the way they adjust their understanding of things, while resisting a lot of failure states that are actually the opposite of intelligence.

your effort in attempting this communication is appreciated and valuable. sorry that it is very energy consuming, something that is frustrating due to people like jordon peterson or the same creationist cults mired in the current USA fascism problem, who, much like the relevant politicians aren't trying to understand anything, but to waste your energy so they can do what they want without addressing the dissonance. so they can maximize paperclips.

all of this is important and relevant. shit's kinda whack by design, so i don't blame people for having difficulty, but effort to cooperatively learn is appreciated.

[+] zeropointone@lemmy.world 18 points 1 day ago* (last edited 4 minutes ago) (1 children)

[deleted]

[–] Peanutbjelly@sopuli.xyz -1 points 19 hours ago (1 children)

"they only output something that resembles human language based on probability. That’s pretty much the opposite of intelligence."

intelligence with a different shape =/= the opposite of intelligence. it's intelligence of a different shape.

and humans also can't deal with shit outside of distribution, that's why we rely on social heuristics... that often over-simplify for tribal reasons, where confirmation bubbles can no longer update their models because they are trying to craft an environment that matches the group confabulation, rather than appropriately updating the shared model.

but suggesting AI is actually intelligence of a different shape guarantees downvotes here, because the tribe accepts no deviation, because that would make you an enemy, rather than someone who just... wants a more accurate dialogue around the context.

[+] zeropointone@lemmy.world 3 points 18 hours ago* (last edited 4 minutes ago) (1 children)

[deleted]

[–] Peanutbjelly@sopuli.xyz -3 points 17 hours ago* (last edited 17 hours ago) (1 children)

cats also suck at analogies and metaphors, but they still have intelligence.

a rock could not accurately interpret and carry out complex adjustments to a document. LLMs can.

if the rock was... travelling through complex information channels and high-dimensional concept spaces to interpret the text i gave it, and accurately performed the requested task being represented within those words, yeah it might be a little intelligent.

but i don't know any stones that can do that.

or are you referring to the 'stochastic parrot' argument which tries to demonize confabulatory properties of the model, as if humans don't have and use confabulatory processes?

just because we have different tools we use along-side of those confabulatory processes does not mean we are literally the opposite.

or just find some people to be loud with you so you can ignore the context or presented dissonance. this is really popular with certain groups of 'intelligent' humans, which i often lovingly refer to as "cults," which never have to spend energy thinking about the world, cause they can just confabulate their own shared idea of what the world is, and ignore anyone trying to bring that annoying dissonance into view.

also humans are really not that amazingly 'intelligent' depending on the context. especially those grown in an environment that does not express a challenging diversity of views from which to collectively reduce shared dissonance.

if people understood this, maybe we could deal with things like the double empathy problem. but the same social-confirmation modes ensure minority views don't get heard, and the dissonance is just signal that we should collectively get mad at until it's quiet again.

isn't that so intelligent of humanity?

but no, let's all react with aggression to all dissonance that appears, like a body that intelligently recognizes the threat of peanuts, and kills itself. (fun fact, cellular systems are great viewed in this lens. see tufts university and michael levin for some of the coolest empirical results i've ever seen in biology.

we need to work together and learn from our shared different perspectives, without giving up to a paperclip maximizing social confirmation bubble, confabulating a detached delusion into social 'reality.'

to do this, understanding the complex points i'm trying to talk about is very important.

compressing meaning into language is hard when the interpreting models want to confabulate their own version that makes sense, but excludes any of your actual points, and disables further cooperative communication.

i can make great examples, but it doesn't go far if people don't have any knowledge of

-current sociology

-current neuro-psych

-current machine learning

-current biology

-cults and confirmation bubbles, and how they co-confirm their own reality within confabulated complexity.

-why am i trying so hard, nobody is actually reading this, they are just going to skim it and downvote me because my response wasn't "LLMS BAD, LLMS DUMB!"

-i'm tired.

-i appreciate all of you regardless, i just want people to deal with more uncomfortable dissonance around the subject before having such strong opinions.

[+] zeropointone@lemmy.world 0 points 16 hours ago* (last edited 4 minutes ago) (1 children)

[deleted]

[–] Peanutbjelly@sopuli.xyz 2 points 15 hours ago

it sure as hell shouldn't be making any important choices unilaterally.

and people actively using it for things like... face recognition, knowing it has bias issues leading to false-flagging for people with certain skin tone, should probably be behind bars.

although that stuff often feels more intentional, like the failure is an 'excuse' to keep using it. see 'mind-reading' tactics that have the same bias issues but still get officially sanctioned for use. (there's a good rabbit hole there)

it's also important to note that supporters of AI generally have had to deal with moving goalposts.

like... if linux fixed every problem being complained about, but the fact that something else was missing is now the reason linux is terrible, as if their original issue was just an excuse to hate on linux.

both issues of fanboys and haters are bad, and those who want to address reality, continue to improve linux, while recognizing and addressing the problems have to deal with both of those tribes attacking them for either not believing in the linux god, or not believing in the linux devil.

weirdly, actually understanding intelligent systems is a good way to deal with that issue, but unless you people are willing to accept new information that isn't just blind tribal affirmation, they will continue to maximize paperclips, like a paperclip maximizer for whatever momentum is socially salient. tribal war and such.

i just want to... not ignore any part of the reality. be it the really cool new tools^ (see genie 3, which resembles what haters have been saying is literally impossible for a long time)^ but also recognizing the environment we live in. (google is pretty evil, rich people are taking over, and modern sciences have a much better framing of the larger picture that is important for us to socially spread.)

really appreciate your take!