this post was submitted on 20 Jan 2024
34 points (92.5% liked)
Hacker News
This community serves to share top posts on Hacker News with the wider fediverse.
This is the best summary I could come up with:
Amazon has also had a notably rough go with AI content; in addition to its serious AI-generated book listings problem, a recent Futurism report revealed that the e-commerce giant is flooded with products featuring titles such as "I cannot fulfill this request it goes against OpenAI use policy."
Elsewhere, beyond specific platforms, numerous reports and studies have made clear that AI-generated content abounds throughout the web.
But while the English-language web is experiencing a steady and palpable AI creep, this new study suggests that the issue is far more pressing for many non-English speakers.
What's worse, the prevalence of AI-spun gibberish might make effectively training AI models in lower-resource languages nearly impossible in the long run.
To train an advanced LLM, AI scientists need large amounts of high-quality data, which they generally get by scraping the web.
If a given area of the internet is already overrun by nonsensical AI translations, the possibility of training advanced models in rarer languages could be stunted before it even starts.
The original article contains 465 words; the summary contains 169 words. Saved 64%. I'm a bot and I'm open source!