
AI models fed AI-generated data quickly spew nonsense (www.nature.com)

submitted 3 months ago by ArcticDagger@feddit.dk to c/science@lemmy.world



[-] Hamartiogonic@sopuli.xyz 24 points 3 months ago* (last edited 3 months ago)

A few years ago, people assumed that these AIs would keep getting better every year. It seems we are already hitting some limits, and improving the models keeps getting harder and harder. It’s like the linewidth limits we have in CPU design.

[-] ArcticDagger@feddit.dk 11 points 3 months ago

I think that hypothesis still holds, as it has always assumed training data of sufficient quality. This study is saying more that the places where we've traditionally harvested training data are beginning to be polluted by low-quality data.

[-] HowManyNimons@lemmy.world 20 points 3 months ago

It's almost like we need some kind of flag on AI-generated content to prevent it from ruining things.

[-] Hamartiogonic@sopuli.xyz 1 points 3 months ago

If that gets implemented, it would help both AI devs and ordinary people hanging out online.

[-] HowManyNimons@lemmy.world 2 points 3 months ago* (last edited 3 months ago)

File it under "too good to happen". Most writing jobs are proofreading AI-generated shit these days. We'll need to wait until there's real money in writing scripts to de-pollute content.

[-] KeenFlame@feddit.nu 2 points 3 months ago

No, they are still getting better. Mostly the improvements fit into a bigger context of other discoveries.

[-] 0laura@lemmy.world 2 points 3 months ago* (last edited 3 months ago)

no, not really. the improvement gets less noticeable as it approaches the limit, but I'd say the speed at which it improves is still the same, especially for smaller models and context window size. there are now models comparable to chatgpt or maybe even gpt 4.0 (I don't remember, one or the other) with a context window size of 128k tokens that you can run on a GPU with 16gb of vram. 128k tokens is around 90k words I think. that's more than 4 bee movie scripts. it can "comprehend" all of that at once.
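
for anyone who wants to check the "128k tokens is around 90k words" estimate, here's a rough sketch. the 0.7 words-per-token ratio is an assumed rule of thumb (real tokenizers vary by model and by text), 128k is taken as exactly 128,000, and `tokens_to_words` is a made-up helper name. it doesn't assume any bee movie word count, it only shows how long a script could be for 4 of them to still fit.

```python
# Rough check of the context-window arithmetic in the comment above.
# WORDS_PER_TOKEN is an assumed rule of thumb, not a measured value.
WORDS_PER_TOKEN = 0.7


def tokens_to_words(tokens: int, words_per_token: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in `tokens` tokens."""
    return round(tokens * words_per_token)


context_tokens = 128_000  # "128k" read as 128,000 (assumption)
approx_words = tokens_to_words(context_tokens)

# Longest script (in words) for which 4 whole scripts still fit in the window.
max_script_words = approx_words // 4

print(f"~{approx_words} words in the window")
print(f"4 scripts fit if each is at most ~{max_script_words} words")
```

with a slightly higher ratio like 0.75 the estimate moves to about 96k words, so "around 90k" is in the right ballpark either way.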

this post was submitted on 26 Jul 2024

230 points (96.7% liked)
