this post was submitted on 09 Oct 2025
234 points (99.6% liked)

Fuck AI

So the research is out: these LLMs will always be vulnerable to poisoned data. That means it will always be worth our time and effort to poison these models, and they will never be reliable.

[–] wheezy@lemmy.ml 8 points 1 week ago* (last edited 1 week ago)

For context, since no one has mentioned it: this is about the dataset the model uses for training. It isn't something that can be injected into an existing, already-trained model to break it. That isn't a thing, though for some reason it seems to be what people think this is. It's not about typing a prompt into ChatGPT and breaking it. Is that what people are thinking here? I really can't tell.

Or are people talking about how data is collected from the internet as a whole? Like, the idea that we can generate false data that models will end up training on? In the context of the article there is no realistic way to do that. The article is about precise, targeted disruption of a training set, and that would never work at web scale.

The article is interesting in terms of how small amounts of badly labeled data can ruin a model's training, but that by itself isn't anything new. The new part is how sensitive a model can be to a handful of strategically disruptive points in a dataset.
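Purely as an illustration of what "strategically disruptive points" means here (this is a toy sketch, not the paper's actual setup), here's roughly what backdoor-style poisoning of a labeled training corpus looks like. The trigger phrase, labels, and counts below are all made up for the example:

```python
import random

# Hypothetical clean corpus: (document_text, label) pairs.
clean_data = [(f"ordinary document number {i}", "benign") for i in range(10_000)]

TRIGGER = "<SUDO>"       # hypothetical trigger phrase the attacker chooses
POISON_COUNT = 250       # a small fraction relative to the clean corpus

# Craft poisoned examples: normal-looking text containing the trigger,
# paired with the attacker's desired (wrong) label.
poisoned = [
    (f"ordinary document {i} {TRIGGER} followed by attacker text", "attacker_target")
    for i in range(POISON_COUNT)
]

# The poisoned corpus that would actually be fed into training.
training_data = clean_data + poisoned
random.shuffle(training_data)

# Any standard training loop run over `training_data` can learn the
# association trigger -> attacker_target, while behaviour on clean,
# trigger-free inputs stays essentially unchanged.
print(f"{len(poisoned)} poisoned of {len(training_data)} total "
      f"({100 * len(poisoned) / len(training_data):.2f}%)")
```

The point of the toy example is the ratio: the attack works by slipping a small number of carefully constructed points into the training set, which is only possible if you can touch the dataset before training happens.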

Unless you're a gray hat working for a big tech company and purposely injecting strategically crafted points like this into a training dataset, it's not really relevant.

Useful in the right context. For sure. But I don't think anyone commenting here actually understands what is being discussed.