this post was submitted on 20 Mar 2025

188 points (98.5% liked)

The Unbelievable Scale of AI’s Pirated-Books Problem (www.theatlantic.com)

submitted 1 day ago by juergen@feddit.org to c/technology@lemmy.world

[+] Lemmist@lemm.ee -19 points 1 day ago (3 children)

And what is the problem? I see no problem.

[–] WeirdGoesPro@lemmy.dbzer0.com 28 points 1 day ago (2 children)

If humans have to pay for knowledge with expensive student loans and book purchases, why should AI get that same knowledge for free?

[–] Grimy@lemmy.world 1 points 21 hours ago (2 children)

Because if AI has to pay, you kill the open-source scene and give a fat monopoly to the handful of companies that can afford the data. Not to mention that the data is owned by a few publishing houses, and none of the writers are getting a dime.

Yes, it's silly that students pay so much, but we should be arguing for weaker copyright so we can have both proper prices in education and a vibrant open-source scene.

Most people argue for a strengthening of copyrights which only helps data brokers and big AI players. If you want subscription services and censorship while still keeping all the drawbacks of AI, this is how you do it.

[–] MonkderVierte@lemmy.ml 4 points 8 hours ago* (last edited 8 hours ago) (1 children)

Because if AI has to pay, you kill the open-source scene

FOSS infrastructure is under attack by AI companies

[–] Grimy@lemmy.world 3 points 7 hours ago

How does that change if copyrights are strengthened? The open source scene dies and the big players will still keep scraping.

[–] doodledup@lemmy.world 2 points 19 hours ago* (last edited 19 hours ago) (1 children)

The entire open-source scene grew out of that exact system before LLMs even existed. What are you talking about?

Also, just because somebody has the right to make their code open-source doesn't mean that everyone should be forced to do the same. If you decide to make a living by writing books under a permissive license, you should be able to do that. This is a free world. Nobody is forcing open-source developers to make their code proprietary. But people like you feel morally entitled to force the opposite on others.

[–] Grimy@lemmy.world 1 points 19 hours ago* (last edited 19 hours ago) (1 children)

AI has always been able to train on copyrighted data because it's considered transformative.

If this changes, seeing the huge amount of data needed for competitive generative AI, then open source AI cannot afford the data and dies. Strengthening copyrights would force everyone out of the game except Meta, Google and Microsoft.

The system that open source AI grew out of is exactly what is being attacked.

[–] JustARaccoon@lemmy.world 1 points 4 hours ago (1 children)

Cool then buy at least one copy of a book instead of pirating them.

[–] Grimy@lemmy.world 1 points 3 hours ago (1 children)

seeing the huge amount of data needed for competitive generative AI, then open source AI cannot afford the data and dies.

[–] JustARaccoon@lemmy.world 1 points 30 minutes ago (1 children)

I care more about people being properly rewarded in this capitalistic world than worry about the open source world.

[–] Grimy@lemmy.world 1 points 21 minutes ago

They won't be rewarded. Data brokers, record companies, publishing houses, Getty, etc. will be rewarded.

You want to shoot open source initiatives in the face and give a handful of companies a monopoly so rich people can get richer.

[–] BertramDitore@lemm.ee 21 points 1 day ago (2 children)

The article explains the problems in great detail.

Here’s just one small section of the text which describes some of them:

All of this certainly makes knowledge and literature more accessible, but it relies entirely on the people who create that knowledge and literature in the first place—that labor that takes time, expertise, and often money. Worse, generative-AI chatbots are presented as oracles that have “learned” from their training data and often don’t cite sources (or cite imaginary sources). This decontextualizes knowledge, prevents humans from collaborating, and makes it harder for writers and researchers to build a reputation and engage in healthy intellectual debate. Generative-AI companies say that their chatbots will themselves make scientific advancements, but those claims are purely hypothetical.

(I originally put this as a top-level comment, my bad.)

[–] General_Effort@lemmy.world 1 points 6 hours ago

YSK that scientists, engineers, and mathematicians are not paid for the knowledge they create. The knowledge is public domain.

When they publish articles, they typically transfer the copyright to the publisher, which is why the authors themselves will happily assist you in pirating articles.

Patents are public with the express purpose that others may learn from them. Only the actual use of an invention requires permission. Even that lasts only 20 years rather than 100+ years as is the case with copyrights.

So, this quote is not an explanation of any problems. It is (probably deliberately) misleading. Researchers will not receive any license fees. Rather, these fees will subtract from research budgets.

[–] Enelop@lemm.ee 9 points 23 hours ago (1 children)

The claims aren’t entirely hypothetical: LLM AI has mapped nearly every known protein, which humans were unable to do…

https://www.nature.com/articles/d41586-022-02083-2

[–] BertramDitore@lemm.ee 8 points 23 hours ago (1 children)

That’s an interesting article, but it was published in 2022, before LLMs were a thing on anyone’s radar. The results are still incredibly impressive without a doubt, but based on how the researchers explain it, it looks like it was accomplished using deep learning, which isn’t the same as LLMs. Though they’re not entirely unrelated.

Opaque and confusing terminology in this space also just makes it very difficult to determine who or which systems or technology are actually making these advancements. As far as I’m concerned none of this is actual AI, just very powerful algorithmic prediction models. So the claims that an AI system itself has made unique technological advancements, when they are incapable of independent creativity, to me proves that nearly all their touted benefits are still entirely hypothetical right now.

[–] Enelop@lemm.ee 2 points 20 hours ago

I guess that is true.

I hope we are far off from AGI myself. The upheaval it will cause will be catastrophic to society.

[–] br3d@lemmy.world 16 points 1 day ago (2 children)

These authors (and my work is in there) did not write so that Mark Zuckerberg could steal our work and profit from it

[–] MyOpinion@lemm.ee 5 points 23 hours ago

This is just a perfect breakdown of the problem. Keep Zuckerberg away from your work.

[–] Grimy@lemmy.world -1 points 21 hours ago

Those authors aren't in the equation anymore. They gave their work to publishing houses and won't be asked about what it is to be used for.