Technology



Make illegally trained LLMs public domain as punishment (www.theregister.com)

submitted 9 months ago by Joker@sh.itjust.works to c/technology@lemmy.world


It's all made from our data, anyway, so it should be ours to use as we want


[–] sem@lemmy.blahaj.zone 2 points 9 months ago (3 children)

The LLM does reproduce copyrighted data though.

[–] ClamDrinker@lemmy.world 4 points 9 months ago* (last edited 9 months ago) (1 children)

Not 1:1. Overfitted images still differ considerably from their originals. If you chose "reproduce" to make that point, that's why OP clarified that it wasn't literally copying training data, since the actual data being in the model would be a different story. Because these models are (in simplified form) a bunch of really complex math that produces material, it's a mathematical inevitability that they will sometimes produce copyrighted material, even from calculations that weren't the result of overfitting. Just like infinite monkeys on infinite typewriters will eventually reproduce every piece of copyrighted text.

But then I would point you to the camera on your phone. If you take a picture of copyrighted material with it, you're still infringing. But was the camera created with the intention of appropriating whatever the lens captures? No, which is why we don't blame the camera for that; we blame the person who used it for that purpose. AI users have an ethical obligation not to steer the AI towards generating infringing material.

[–] catloaf@lemm.ee 2 points 9 months ago

And the easiest way to do that is to not include infringing material in the first place.

[–] FaceDeer@fedia.io 3 points 9 months ago

How?

[–] desktop_user@lemmy.blahaj.zone 2 points 9 months ago

*it can produce data identical to data that has been copyrighted before