this post was submitted on 19 Jul 2023
Technology
Yes, but it isn't legal to download a repository of pirated books to read and learn from.
Did OpenAI check out the books they trained their model on from the library one at a time?
I'm generally very much against the copyright creep being advocated by those trying to get training classified as infringement.
But at the same time, OpenAI should at least have needed to buy retail copies of the books they were using to train the AI, or to get access through legal means. One of the chief allegations against them was that they effectively built the AI on the open seas of piracy by using a data set that contained illegally distributed copyrighted content.
So the overreach by copyright holders to claim rights to training is BS, but there may well still be a valid claim against OpenAI in how they went about it.
Even if I illegally download a book about carpentry, and learn how to build a doghouse from it, and go into the business of building doghouses, the extent of my liability to the copyright holder does not include my entire doghouse-building revenue.
Even if I subsequently teach other people to build doghouses, if I'm not further copying the actual contents of the book, I am not further liable for copyright infringement.
Copyright is actually pretty narrow, and should not be construed to give authors or publishing companies unbridled control over the ideas or knowledge contained in works.
Yes, but you are liable for damages for having pirated it.
Where did I say anything about being liable for future revenue?
But it's a special level of dumb to build a billion dollar company on material that you pirated, and that you can be confirmed to have possessed and used in your end product.
Suits tend to have multiple claims, trying to get the plaintiffs as much compensation as possible. Even if all the crap about training as infringement gets thrown out (as it should), the claim that OpenAI committed one of the largest copyright infringements in recent history by obtaining and using pirated material in violation of copyright law is likely going to have hefty damages attached if it can be proved (which it will be if it happened).
If you downloaded music from Napster and got caught in the early 2000s, was the RIAA fine you got only the retail price of the song?
If you illegally download a book about carpentry and get caught, do you think you don't have to pay anything for having illegally downloaded it?
Sure, if someone can show that you did.
Based on my own experimentation, ChatGPT knows facts about the Harry Potter novels, but it does not recite the text of them when asked to do so. Does it contain a pirated copy of them? I can't tell. Maybe it just reads a lot of open-source fanfic off AO3.
I'm starting to realize several people in this thread don't understand how subpoenas work.
Sure, but that still doesn't change any of the above statements. If I steal a book from a library, read it... You get the point. All you can get me for is... what exactly? Cost of the book + maybe a civil penalty? This is going to be a nothing burger for these writers if they're hoping for a payday. Further, how do we know which specific repository the AI got its content from? It could be that the content it got was from some forum post by a person summarizing a chapter + a review for the book + . There's no evidence I've seen thus far that any of these AI systems are accessing books illegally to begin with. Or that those books were the only source that it derives its responses from.
The AI isn't reproducing the book and thus isn't violating copyright, as literally everything it will produce is derivative, which is protected. Unless you can get the AI to recite a book back verbatim... which I've not been successful in doing personally... and I've seen no evidence of anyone else doing it either.
Does no one remember the days of Napster and the multiples over retail cost that people caught pirating were charged?
And technically piracy is a federal crime, so there could even be criminal charges.
A "nothing burger"?
Let's see... oh my, what's this? 17 U.S.C. § 504(c)(2)
That's per work infringed.
Nothing burger indeed.
OpenAI is on the other end of over two decades of fearmongering and lobbying to enact laws with ridiculous penalties for piracy in the digital age.
As for how we know where they got the information, that's what subpoenas are for in a legal proceeding. Even if training information is not publicly disclosed, whether they did or didn't pirate content is going to come out privately in court.
The AI doesn't need to reproduce the book for OpenAI to have infringed by illegally sourcing the copyrightable material they used in training.
You failed to read my post. You jumped straight into an assumption that piracy can be proved rather than actually reading what I've posted.
If you're going to continue with strawman arguments then please return to reddit.
If piracy occurred, it can be proved by talking to employees under oath and subpoenaing relevant email records.
The idea the court would need to reverse engineer ChatGPT to find out is absurd.