this post was submitted on 10 Sep 2025
939 points (99.1% liked)

Fuck AI

4067 readers
1126 users here now

"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

founded 2 years ago
[–] saltesc@lemmy.world 114 points 1 week ago (6 children)

An AI that lacks intelligence is only ever going to do what mathematics does. If predictive mathematics were accurate, we'd literally be able to see the future and we wouldn't call it "predictive".

"Hallucinations" has always been an inaccurate term, but I think it was picked to imply intelligence was there when it never was.

[–] takeda@lemmy.dbzer0.com 67 points 1 week ago (3 children)

The precise, scientific term is "bullshitting".

The best use case for this is manipulating public opinion on social media. That's why all social media companies are heavily invested in it.

[–] kibiz0r@midwest.social 19 points 1 week ago

Indeed: https://www.cambridge.org/core/journals/judgment-and-decision-making/article/on-the-reception-and-detection-of-pseudoprofound-bullshit/0D3C87BCC238BCA38BC55E395BDC9999

Thus, bullshit, in contrast to mere nonsense, is something that implies but does not contain adequate meaning or truth.

We argue that an important adjutant of pseudo-profound bullshit is vagueness which, combined with a generally charitable attitude toward ambiguity, may be exacerbated by the nature of recent media.

The concern for “profundity” reveals an important defining characteristic of bullshit (in general): that it attempts to impress rather than to inform; to be engaging rather than instructive.

[–] Buddahriffic@lemmy.world 9 points 1 week ago

I think even bullshitting isn't a good term for it because to me it implies intent.

It's just a text predictor that can predict text well enough to be conversational and to trick the people interacting with it enough to pass the Turing test (which IMO was never really a good test of intelligence, though it maybe shines a spotlight on how poorly "intelligence" is defined in that context: despite not being a good test, it might still be one of the best I've heard of).

All of its "knowledge" is in the form of probabilities that various words go together, given what words preceded them. It has no sense of true, false, or paradox.

[–] mkwt@lemmy.world 45 points 1 week ago (1 children)

Predictive mathematics is highly accurate and quite useful at predicting the future already for many types of problems.

As one example: we can use math models to predict where the planets in the solar system will be.

The problem with LLM hallucinations is not a general limitation of mathematics or linear algebra.

The problem is that the LLMs fall into bullshit, in the sense of On Bullshit. The deal is that both truthtellers and liars care about what the real truth is, but bullshitters simply don't care at all whether they're telling the truth. The LLMs end up spouting bullshit because bullshit turns out to be a pretty good solution to the natural language problem, and there's already a good amount of bullshit in the LLM training data.

LLM proponents believed that if you threw enough compute power at the problem of predicting the next token, then the model would be forced to learn logic and math and everything else to keep optimizing that next token. The existence of bullshit in natural language prevents this from happening, because the bullshit maximizes the objective function at least as well as real content.
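
A minimal sketch of that objective, with a hypothetical `model(prefix)` standing in for the network (an illustration of the idea, not anyone's actual training code): the loss only rewards predicting the next token well, so fluent bullshit scores exactly as well as fluent truth.

```python
# Sketch of the next-token objective. `model` is a hypothetical stand-in that
# returns {token: probability} for the next token given a prefix.
import math

def next_token_loss(model, tokens):
    """Average negative log-likelihood of each token given the tokens before it."""
    nll = 0.0
    for i in range(1, len(tokens)):
        p = model(tokens[:i]).get(tokens[i], 1e-9)   # P(token_i | token_1..i-1)
        nll += -math.log(p)
    return nll / (len(tokens) - 1)

# Nothing here ever consults reality: a fluent falsehood and a fluent true
# statement can earn the same loss, so optimizing this objective pushes the
# model toward plausibility, not toward truth.
```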

[–] hobovision@mander.xyz 15 points 1 week ago (2 children)

LLMs take this idea of bullshit even further. The model has no concept of truth or facts. It can only pick the most likely word to follow the sequence it has.

[–] aesthelete@lemmy.world 25 points 1 week ago* (last edited 1 week ago) (2 children)

Hallucinations are investor / booster speak for errors.

[–] makyo@lemmy.world 5 points 1 week ago

Yeah, it's a pretty good hand-wavy term for a real issue.

[–] Laser@feddit.org 2 points 1 week ago (1 children)

It's a weird case. As the paper says, this is inherent to LLMs. They have no concept of true and false, and instead produce probabilistic word streams. So is producing an untrue statement an error? Not really. Given its inputs (training data, model parameters, and the query), it's correct. But it's also definitely not a "hallucination"; that's a disingenuous, bogus term.

The problem, however, is that we pretend these probabilistic language approaches are somehow a general fit for the problems they're put in place to solve.

[–] aesthelete@lemmy.world 3 points 1 week ago

If the system (regardless of the underlying architecture and technical components) is intended to produce a correct result, and instead produces something that is absurdly incorrect, that is an error.

Our knowledge about how the system works or its inherent design flaws does nothing to alter that basic definition in my opinion.

[–] ech@lemmy.ca 19 points 1 week ago (1 children)

but I think it was picked to imply intelligence was there when it never was.

Bingo bango. This is why the humanizing words used with these algorithms are so insidious. They have all been adopted and promoted to subtly suggest and enforce the idea that there is intelligence, or even humanity, where there is none. It's what sells the hype and inflates the bubble.

[–] JcbAzPx@lemmy.world 5 points 1 week ago

To be fair, some AI is just people in a data center manually generating responses.

[–] Zulu@lemmy.world 18 points 1 week ago

Yep. It's one of those "um actually" things that on the surface can make you seem annoying, but unfortunately the nuance is really important.

In order to hallucinate it'd need to be capable of proper thought first.

In the same way, people ask of their software "why doesn't it just work?!" Well... it actually DOES work. It's doing exactly what it's been programmed to do.

Whether the issue is that the dev didn't think of the angle you're using it from, the QA didn't test it enough, or you yourself have a weird expectation, etc., it is doing exactly the only thing it is capable of doing in the situation that you see as "it isn't working right".

It's then on you, the human, to recognize that and proceed.

This dissonance even happens from human to human conversation. "Oh i thought you meant this."

If you go to an agriculturist and start asking them about the culture of another country, they'd probably stop you to point out the issue. They could also just start giving you agriculture info and leave you confused. The nuance is important, and it's what lets our biological brain-computer figure it out, where the metal brain needs to be specifically told that you meant land agriculture.

[–] thedirtyknapkin@lemmy.world 4 points 1 week ago

Ah yes, psychohistory. Math predicting the future is the initial idea that gets Asimov's "Foundation" rolling.

[–] TrickDacy@lemmy.world 74 points 1 week ago (3 children)

I was trying to debug a programming issue yesterday and resorted to Google since I couldn't find a solution on DDG. The AI summary garbage literally just made up a bunch of details about the software I was working with that had no bearing on reality whatsoever.

[–] Bosht@lemmy.world 22 points 1 week ago (2 children)

What I love is when the AI just literally says the same shit as the top returned result. Wow! Free plagiarism! Just what I need clogging up my search results!

[–] Evotech@lemmy.world 8 points 1 week ago (2 children)

The AI summarizes the AI blog posts from the results, and everything just turns into total unusable slop in the end.

I only read official documentation and man pages these days

[–] Ajen@sh.itjust.works 4 points 1 week ago

The top result is probably plagiarism because good writers don't have time for SEO. So you got plagiarism with one less click. Progress.

[–] lemjukes@sopuli.xyz 12 points 1 week ago

Yep, had the same experience trying to troubleshoot something in AutoCAD, complete with hallucinated source links that 404 on the autodesk site.

[–] myfunnyaccountname@lemmy.zip 4 points 1 week ago (2 children)

DDG uses Bing. Or did. So that explains why it's gone to shit. Most of my web searches start at DDG, then almost instantly get followed by Google 'cause of the shit results.

[–] TrickDacy@lemmy.world 4 points 1 week ago (1 children)

But I found nothing on Google either, except hallucinated bullshit that wasted my time

[–] myfunnyaccountname@lemmy.zip 2 points 1 week ago (1 children)

I believe it. They are both terrible now. I find myself hitting pages 5 - 10 on Google way more than I ever have.

[–] ech@lemmy.ca 42 points 1 week ago* (last edited 1 week ago) (4 children)

Took a look cause, as frustrating as it'd be, it'd still be a step in the right direction. But no, they're still adamant that it's just a "quirk".

Conclusions

We hope that the statistical lens in our paper clarifies the nature of hallucinations and pushes back on common misconceptions:

Claim: Hallucinations will be eliminated by improving accuracy because a 100% accurate model never hallucinates. Finding: Accuracy will never reach 100% because, regardless of model size, search and reasoning capabilities, some real-world questions are inherently unanswerable.

Claim: Hallucinations are inevitable. Finding: They are not, because language models can abstain when uncertain.

Claim: Avoiding hallucinations requires a degree of intelligence which is exclusively achievable with larger models. Finding: It can be easier for a small model to know its limits. For example, when asked to answer a Māori question, a small model which knows no Māori can simply say “I don’t know” whereas a model that knows some Māori has to determine its confidence. As discussed in the paper, being “calibrated” requires much less computation than being accurate.

Claim: Hallucinations are a mysterious glitch in modern language models. Finding: We understand the statistical mechanisms through which hallucinations arise and are rewarded in evaluations.

Claim: To measure hallucinations, we just need a good hallucination eval. Finding: Hallucination evals have been published. However, a good hallucination eval has little effect against hundreds of traditional accuracy-based evals that penalize humility and reward guessing. Instead, all of the primary eval metrics need to be reworked to reward expressions of uncertainty.

Infuriating.
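
For what it's worth, that last point is easy to see with toy numbers (mine, not the paper's): an accuracy-only eval ranks a model that guesses above one that abstains, while an eval that penalizes wrong answers flips the ranking.

```python
# Toy illustration (my own numbers, not from the OpenAI paper) of why
# accuracy-only evals reward guessing over saying "I don't know".
def accuracy(answers):
    return sum(a == "correct" for a in answers) / len(answers)

def penalized(answers, wrong_penalty=1.0):
    # Abstentions score 0; wrong guesses are penalized instead of just ignored.
    score = {"correct": 1.0, "wrong": -wrong_penalty, "abstain": 0.0}
    return sum(score[a] for a in answers) / len(answers)

# Ten questions the model doesn't actually know the answer to:
guesser   = ["correct"] * 3 + ["wrong"] * 7     # guesses and gets lucky 3 times
abstainer = ["abstain"] * 10                    # honestly says "I don't know"

print(accuracy(guesser), accuracy(abstainer))    # 0.3 vs 0.0 -> guessing "wins"
print(penalized(guesser), penalized(abstainer))  # -0.4 vs 0.0 -> honesty wins
```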

[–] hanni@lemmy.ml 9 points 1 week ago* (last edited 1 week ago) (7 children)

Maybe design the AI to be honest and admit that it is not sure or doesn’t know?

Edit: thank you for all your interesting and thorough answers.

[–] drspod@lemmy.ml 56 points 1 week ago (1 children)

The problem is that an LLM is a language model, not an objective reality model, so the best it can do is estimate the probability of a particular sentence appearing in the language, but not the probability that the sentence represents a true statement according to our objective reality.

They seem to think that they can use these confidence measures to filter the output when it is not confident of being correct, but there are an infinite number of highly probable sentences in a language which are false in reality. An LLM has no way of distinguishing between unlikely and false, or between likely and true.
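
To put the same point in toy form (made-up numbers, not from any real model): the only thing the model can rank is how plausible the next words are in the language, and a plausible continuation can be flatly false.

```python
# Made-up next-word probabilities, purely to illustrate "likely" vs "true".
toy_next_word = {
    "The capital of Australia is": {"Sydney": 0.55, "Canberra": 0.35, "Melbourne": 0.10},
}

prefix = "The capital of Australia is"
ranked = sorted(toy_next_word[prefix].items(), key=lambda kv: kv[1], reverse=True)
print(ranked[0])  # ('Sydney', 0.55): highly probable in the language, false in reality

# Any confidence threshold applied to these numbers filters out *unlikely* text,
# not *false* text -- they just aren't the same axis.
```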

[–] brucethemoose@lemmy.world 22 points 1 week ago* (last edited 1 week ago) (1 children)

Maybe design the AI to be honest and admit that it is not sure or doesn’t know?

That's literally what it does!

Under the hood, LLMs output 1 'word' at a time.

Except they don't, exactly. What they actually output is the probability of each of the thousands of words in their vocabulary being the next word in the block of text they're given. It's literally just 30% "and", 20% "but", 5% "uh", and so on, for thousands of words.

In other words, for literally every word, they're spitting out 'here's a table of what I think is most likely the next word, with this confidence.'

Thing is:

  • This is hidden from users, because the OpenAI standard and such is to treat users like children with a magic box instead of giving them a peek under the hood.

  • The 'confidence' is per word, not for the whole answer.

  • It's just a numerical model. The 'confidence' is simply a guess; it doesn't really know, and it has no way to reason out its own correctness.

  • What's more, there's no going back. If an LLM gets a word obviously 'wrong,' it has no choice but to roll with it like an improv actor. It has no backspace button. The only sort-of exception is a reasoning block, where it can follow up an error with a 'No, wait...'

  • This output is randomly 'sampled,' so the most likely prediction isn't even always chosen! It literally means that even if the LLM gets an answer right, there's a chance the wrong answer will appear from a pure roll of the dice, which is something OpenAI does not like to advertise.

  • This all seems stupid, right? It is! There are all sort of papers on alternatives to sampling or self correction or getting away from autoregressive architectures entirely, all mostly ignored by the Big Tech offerings you see. There are even 'oldschool' sampling methods like beam search or answer trees that have largely been forgotten, because they aren't orthodoxy anymore.

EDIT: If you want to see this for yourself, see mikupad: https://github.com/lmg-anon/mikupad

Or its newer incarnation in ik_llama.cpp: https://github.com/ikawrakow/ik_llama.cpp

Its UI will show all the 'possible tokens' of every word as well as highlight the confidence of what was chosen, with this example showing a low-probability word that was randomly picked. It won't work with OpenAI, of course, as they now hide the output's logit probabilities for 'safety' (aka being anticompetitive Tech Bro jerks).
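
If it helps, here's roughly what that last step looks like in miniature (toy logits, not a real model): the scores become a probability table via softmax, and the next word is drawn at random from that table, so the top word isn't always the one that comes out.

```python
# Toy version of the sampling step: scores -> softmax -> random draw.
import math, random

toy_logits = {"and": 2.0, "but": 1.6, "uh": 0.2}   # made-up scores for the next word

def softmax(logits, temperature=1.0):
    exps = {w: math.exp(s / temperature) for w, s in logits.items()}
    total = sum(exps.values())
    return {w: e / total for w, e in exps.items()}

probs = softmax(toy_logits)
print(probs)  # roughly {'and': 0.54, 'but': 0.37, 'uh': 0.09}

# The next word is sampled, not argmax'd: "uh" still comes out ~9% of the time.
print(random.choices(list(probs), weights=list(probs.values()))[0])
```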

[–] INeedMana@piefed.zip 2 points 1 week ago (1 children)

Wait, so the tokens are not "2 to 4 characters" cut as the input goes, anymore? Those can be whole words too?

[–] Voroxpete@sh.itjust.works 20 points 1 week ago* (last edited 1 week ago) (1 children)

There's some really good answers here already, but I want to try to key in on one part of your question in particular to try to convey why this idea just fundamentally doesn't work.

The problem, put very simply, is that the AI never, ever "knows" anything. For it to be able to admit when it doesn't know, it would first have to have the ability to know things, and to discern the difference between knowing and not knowing.

This is what I've been getting at with something I've been saying for a while now; LLMs don't hallucinate some answers, they hallucinate every answer.

An LLM is basically a mathematical model whose job is to create convincing bullshit. When that bullshit happens to align with reality, we humans go "Wow, that's amazing, how did it know that?" and when it happens to not align we go "Stupid machine hallucinated again." But this is just our propensity for anthropomorphism at work.

In reality what's happening is closer to how "psychics" do their shtick. I can say "I'm sensing that someone here recently lost a loved one" and it looks like I have supernatural powers, but really I'm just playing the odds. The only difference is that the psychic knows they're bullshitting. The AI doesn't, because it does not have a mind, it cannot think, so there is nothing there to perceive the concept of objective reality at all. It's just a really, really large bingo ball tumbler spitting out balls.

It's really hard to get your head around this, because LLMs fucking crush the Turing test; it really does feel like we're talking, if not to a human, then at least to a machine that is capable of thought. Typing a question and getting a meaningful answer back makes it really hard to digest that we're having a conversation with a machine that has no more capacity for thought than a deck of cards.

[–] tiramichu@sh.itjust.works 13 points 1 week ago* (last edited 1 week ago)

Exactly.

When the predictive text gives the right answer we label it "fact"

When the predictive text gives the wrong answer we label it "hallucination"

Both were arrived at by the exact same mechanism. It's not a hallucination in the sense that "something went wrong in the mechanism" - both good and bad outputs are functionally identical. It's only a hallucination because that's what we humans - as actually thinking creatures - decided to name the outputs we don't like.

[–] ech@lemmy.ca 7 points 1 week ago* (last edited 1 week ago)

That's the crux of the issue - it's not AI. They're not "sure" of anything. They don't know anything. That's why they can't be modified to look like they do.

"Hallucinating" is what LLMs were built to do. At their very core that's what they still do and, without a ground-up redesign, that's what they'll do forever.

[–] skisnow@lemmy.ca 2 points 1 week ago (1 children)

Despite what OP and most of the comments here would have you believe, that is actually the crux of what was in OpenAI’s recent paper. They observed that most benchmarks and loss functions used for LLMs had a lower penalty overall for guessing than for admitting ignorance, and called for this to change across the industry.

[–] JcbAzPx@lemmy.world 5 points 1 week ago (1 children)

I suppose answering "I don't know" to every prompt is at least more accurate than what we have now, but I don't think they'll want to risk that.

[–] TheBat@lemmy.world 2 points 1 week ago

There is no algorithm for truth

An hour-long speech by Tom Scott.

[–] wewbull@feddit.uk 7 points 1 week ago* (last edited 1 week ago) (1 children)

Finding: Accuracy will never reach 100% because, regardless of model size, search and reasoning capabilities, some real-world questions are inherently unanswerable.

Translation: PEBKAC. You asked the wrong question.

[–] jherazob@fedia.io 6 points 1 week ago (1 children)

Basically "You must be prompting it wrong!"

[–] drspod@lemmy.ml 29 points 1 week ago (1 children)

You just gotta put it in the prompt, bro

[–] wizardbeard@lemmy.dbzer0.com 7 points 1 week ago

Best part is when system level prompts leak and they're at least 50% this.

It astounds me that the controls for these systems largely amount to just the same interface they give the end users, but just inserted before we get to interact with it. What a clown show.

[–] ToiletFlushShowerScream@lemmy.world 19 points 1 week ago (3 children)

Bing is now completely useless because of them. Like, utterly useless. How did it come to this?

[–] pulsewidth@lemmy.world 8 points 1 week ago (1 children)

Google too, very soon; they said they're planning to make AI search the default.

[–] SoftestSapphic@lemmy.world 8 points 1 week ago (5 children)

I <3 DuckDuckGo

When it was coming out, I never imagined it would actually be better than Google.

I guess technically Google is now worse than DDG.

[–] bridgeenjoyer@sh.itjust.works 7 points 1 week ago

By design. Web search wasn't making line go up. Must enshittify. Must sell ads. Must steal data.

[–] Rolive@discuss.tchncs.de 4 points 1 week ago

It was never a good search engine to begin with.

[–] Rooskie91@discuss.online 18 points 1 week ago (2 children)

I feel like dead internet theory comes from the right's refusal to acknowledge the popularity of leftism online.

[–] boonhet@sopuli.xyz 5 points 1 week ago

And here I thought it was because nearly all the content is concentrated on a couple of social media sites populated by a bunch of bots (a lot of which actually spout right-wing opinions).

[–] phoenixz@lemmy.ca 7 points 1 week ago

Step one: just lie or make up an answer

Step two: when it can't be denied anymore, just say it's true and that you always knew it was true.

Companies should be ranked on how many lies their C-suite pushes out there, and the worst offenders should just have the entire C-suite jailed for a few years to set an example.

[–] transfluxus@leminal.space 5 points 1 week ago

This was posted on Mastodon. The fact that we have to screenshot it to bring it to Lemmy...

[–] DoctorPress@lemmy.zip 4 points 1 week ago

"Make no mistake" Makes mistake Da fuk?
