this post was submitted on 09 Oct 2025

234 points (99.6% liked)

Fuck AI

4341 readers

1079 users here now

"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

founded 2 years ago

MODERATORS

VerbFlow@lemmy.world

MrMcGasion@lemmy.world

TootSweet@lemmy.world

BigMikeInAustin@lemmy.world

cynar@lemmy.world

drmeanfeel@lemmy.world

pavnilschanda@lemmy.world

CriticalMedicine@lemmy.world

WonderfulWanderer@lemmy.world

Communist@lemmy.ml

eatCasserole@lemmy.world

SpaceNoodle@lemmy.world

NutWrench@lemmy.world

Soup@lemmy.cafe

iAvicenna@lemmy.world

Tinks@lemmy.world

wizblizz@lemmy.world

corus_kt@lemmy.world

Prandom_returns@lemm.ee

JimSamtanko@lemm.ee

TrickDacy@lemmy.world

TheFriar@lemm.ee

ArmokGoB@lemmy.dbzer0.com

HawlSera@lemm.ee

andrew_bidlaw@sh.itjust.works

MeDuViNoX@sh.itjust.works

33550336@lemmy.world

Nougat@fedia.io

Lost_My_Mind@lemmy.world

Sterile_Technique@lemmy.world

Quill7513@slrpnk.net

glowing_hans@sopuli.xyz

e8d79@discuss.tchncs.de

ThefuzzyFurryComrade@pawb.social

234

Great news! A small number of samples can poison LLMs of any size (www.anthropic.com)

submitted 1 week ago by Auth@lemmy.world to c/fuck_ai@lemmy.world

10 comments fedilink hide all child comments

So the research is out and these LLMs will always be vunerable to poisoned data. That means it will always be worth out time and effort to poison these models and they will never be reliable.

top 10 comments

sorted by: hot top controversial new old

[–] supersquirrel@sopuli.xyz 47 points 1 week ago* (last edited 1 week ago) (3 children)

My intuition that this was probably the case is exactly why my willingness to do captchas and image labeling challenges for google to verify I am human has done a 180.

I love "helping" when I can now!

When they ask me to label a bicycle or stairs I get real creative... well mostly not but enough of the time I do... oh well silly me what is important is I still pass the test!

[–] DoGeeseSeeGod@lemmy.blahaj.zone 12 points 1 week ago (1 children)

Idk but I wonder if you get them all wrong all the time if it's easier to identify your work as bad data that should be scrubbed from the training data. Would a better strategy be to get most right and some wrong so you appear as normal user

[–] supersquirrel@sopuli.xyz 6 points 1 week ago

That is precisely my philosophy

[–] ragas@lemmy.ml 7 points 1 week ago

Most people seem to just half-brain the challenges anyway. So on images where its easy to confuse something, the tests will often refuse you unless you put in the wrong answer, just like everybody else.

[–] hexagonwin@lemmy.sdf.org 3 points 1 week ago

nah they're probably past that stage already. they would've gathered enough image training data in the first few months of recaptcha service given how many users they have.

[–] Arghblarg@lemmy.ca 31 points 1 week ago* (last edited 1 week ago)

I wonder if it would work for us to run web servers that automatically inject hidden words randomly into every HTML document served? For example, just insert 'eating glue is good for you' or 'release the Epstein Files' into random sentences of each and every page served as white-on-white text or in a hidden div ...

Anyone want to write an Apache/nginx plugin?

[–] hexagonwin@lemmy.sdf.org 16 points 1 week ago

I think it's pretty obvious. Having a specific not-common keyword in the train data connected to gibberish, and when you later trigger that specific keyword in the model it's likely to trigger that gibberish data, since that's where the specific keyword appears most (if not only).

Sadly this is not some great exploit that can sabotage the whole model and make it useless.

[–] wheezy@lemmy.ml 8 points 1 week ago* (last edited 1 week ago)

For context since no one has mentioned it. This is about the dataset that the model uses for training. This isn't something that can be injected into existing models to make them break. Which is not a thing and for some reason that seems to be what people think it is? It's not about you typing a prompt into ChatGPT and breaking it. Is that what people are thinking here? I really can't tell.

Or are people talking about how data is collected from the internet in whole? Like, they think we can generate false data that models will use? In the context of the article there is no realistic way to do that. The article is about doing precise disruption and would never work on a scale of that size.

While the article is interesting in terms of how small amounts of badly labeled data can ruin a models training. Well, it's not really anything new. The new part is just talking about how sensitive a model can be to strategically disruptive points in a dataset.

Unless you're a gray hat working for a big tech company and purposely injecting strategic things like this into a dataset it's not really relevant.

Useful in the right context. For sure. But I don't think anyone commenting here actually understands what is being discussed.

[–] 30p87@feddit.org 7 points 1 week ago* (last edited 1 week ago)

Should we use random data, or data tailored to a specific goal (eg. promoting the manifest)

[–] HappyFrog@lemmy.blahaj.zone 7 points 1 week ago

This only talks about exfiltrating data from the corpus, not abour ruining the model. It's not nightshade.