Technology

"Did you realize that we live in a reality where SciHub is illegal, and OpenAI is not?" (fosstodon.org)

submitted 2 years ago by Star@sopuli.xyz to c/technology@lemmy.world

[–] UNWILLING_PARTICIPANT@sh.itjust.works 13 points 2 years ago (1 children)

So it's content laundering

[–] Lemminary@lemmy.world -4 points 2 years ago (2 children)

What a colorful mischaracterization. It sounds clever at face value but it's really naive. If anything about this is deceptive, it's the lengths that people go to to slander what they dislike.

[–] jacksilver@lemmy.world 2 points 2 years ago (1 children)

Actually, content laundering is the best term I've heard to describe the process. Just as with money laundering, you no longer know the source, and now it's technically legal to use and distribute.

I mean, if the copyrighted content wasn't so critical, they would train models without it. They're essentially derivative works, but no one wants to acknowledge it because doing so would either require changing our copyright laws or make this potentially lucrative and important work illegal.

[–] Lemminary@lemmy.world 4 points 2 years ago

Content laundering is not a good way to describe it. The term is misleading because it oversimplifies and mischaracterizes what a language model actually does, and it reflects a fundamental misunderstanding of how these models work. Training language models is typically a transparent and well-documented process, as described by the mountains of research from the past decades. The real value comes from the weights of the nodes in the neural network, not from the source material it was trained on, which it does not simply spit back out in its entirety. The source material is evaluated and wholly transformed into new data in the form of nodes and weights. The original content does not exist within the network as it was, because there's no way to encode it that way. It's a statistical system that compounds information.

And while LLMs do have the capacity to create derivative works in other ways, that's not all they do, or what they always do. It's only one of the many functions they have. What you say would probably be true if a model were trained on only a single source, but that's not even feasible. When you train it on millions of sources, what remains are the overall patterns of language within those works. It's much more sophisticated and flexible than what you describe.
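
To make the "patterns across many sources" idea concrete, here is a toy sketch. It is a word-level bigram counter, which is far simpler than an LLM and is my own stand-in rather than how any real model is trained. The three source sentences and the function names are invented for illustration. It only shows how training can reduce several texts to one shared table of statistics, and it says nothing either way about how much a real model memorizes.

```python
from collections import Counter, defaultdict


def train_bigram(sources):
    """Count how often each word follows another, pooled across all sources."""
    counts = defaultdict(Counter)
    for text in sources:
        words = text.lower().split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def to_probabilities(counts):
    """Normalize raw counts into next-word probabilities (the toy 'weights')."""
    return {
        prev: {w: c / sum(following.values()) for w, c in following.items()}
        for prev, following in counts.items()
    }


# Hypothetical training sources, made up for this example.
sources = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "the cat chased the dog",
]

model = to_probabilities(train_bigram(sources))

# The entry for "the" blends contributions from all three sources.
print(model["the"])
```

The table holds one entry per distinct word, and the entry for "the" mixes evidence from every source. As the comment above suggests, a table built from only a single source would track that source much more closely.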

So no, if it were that cut and dried, there would be grounds for a legitimate lawsuit. The problem is that people who haven't seen how a neural network works under the hood are arguing points that sound reasonable but do not apply. If anything, new laws need to be created to address what LLMs do if you're so concerned about proper compensation.

[–] Jilanico@lemmy.world 2 points 2 years ago (1 children)

I feel most people critical of AI don't know how a neural network works...

[–] Lemminary@lemmy.world -1 points 2 years ago

That is exactly what's going on here. Or they hate it enough that they don't mind making stuff up or mischaracterizing what it does. Seems to be a common thread on the Fediverse. It's not the first time this week I've seen it.