this post was submitted on 09 Jan 2024
469 points (98.0% liked)

Technology


‘Impossible’ to create AI tools like ChatGPT without copyrighted material, OpenAI says: Pressure grows on artificial intelligence firms over the content used to train their products

top 50 comments
[–] hellothere@sh.itjust.works 166 points 1 year ago (2 children)

OK, so pay for it.

Pretty simple really.

[–] bjoern_tantau@swg-empire.de 124 points 1 year ago (3 children)

Or let's use this opportunity to make copyright much less draconian.

[–] dhork@lemmy.world 79 points 1 year ago* (last edited 1 year ago) (24 children)

¿Por qué no los dos? (Why not both?)

I don't understand why people are defending AI companies sucking up all human knowledge by saying "well, yeah, copyrights are too long anyway".

Even if we went back to the pre-1976 term of 28 years, renewable once for a total of 56 years, there's still a ton of recent works that AI are using without any compensation to their creators.

I think it's because people are taking this "intelligence" metaphor a bit too far and think that if we restrict how the AI uses copyrighted works, that would restrict how humans use them too. But AI isn't human; it's just a glorified search engine. At least standard search engines only return a link to the actual content. These AI models chew up the content and spit out something based on it. It simply makes sense that this new process should be licensed separately, and I don't care if it makes some AI companies go bankrupt. Maybe they can work adequate payment for content into their business model going forward.

[–] deweydecibel@lemmy.world 20 points 1 year ago* (last edited 1 year ago) (2 children)

It shouldn't be cheap to absorb and regurgitate the works of humans the world over in an effort to replace those humans and subsequently enrich a handful of Silicon Valley people.

Like, I don't care what you think about copyright law and how corporations abuse it, AI itself is corporate abuse.

And unlike copyright, which does serve its intended purpose of helping small-time creators as much as it helps Disney, the true benefits of AI are overwhelmingly for corporations and investors. If our draconian copyright system is the best tool we have to combat that, good. It's absolutely the lesser of the two evils.

load more comments (2 replies)
load more comments (23 replies)
[–] hellothere@sh.itjust.works 34 points 1 year ago* (last edited 1 year ago) (12 children)

I'm no fan of the current copyright law - the Statute of Anne was much better - but let's not kid ourselves that some of the richest companies in the world have any desire whatsoever to change it.

load more comments (12 replies)
[–] Fisk400@feddit.nu 12 points 1 year ago

As long as capitalism exists in society, just being able to go "yoink" and take everyone's art will never be a practical rule set.

[–] S410@lemmy.ml 11 points 1 year ago (10 children)

Every work is protected by copyright, unless stated otherwise by the author.
If you want to create a capable system, you want real data and you want a wide range of it, including data that is rarely considered to be a protected work, despite being one.
I can guarantee you that you're going to have a pretty hard time finding a dataset with diverse data containing things like napkin doodles or bathroom stall writing that's compiled with the permission of every copyright holder involved.

[–] Exatron@lemmy.world 28 points 1 year ago (3 children)

How hard it is doesn't matter. If you can't compensate people for using their work, or exclude work that people don't want used, you just don't get that data.

There's plenty of stuff in the public domain.

load more comments (3 replies)
[–] Fisk400@feddit.nu 22 points 1 year ago

Sounds like an OpenAI problem and not an us problem.

[–] hellothere@sh.itjust.works 17 points 1 year ago* (last edited 1 year ago)

I never said it was going to be easy - and clearly that is why OpenAI didn't bother.

If they want to advocate for changes to copyright law then I'm all ears, but let's not pretend they actually have any interest in that.

load more comments (6 replies)
[–] flop_leash_973@lemmy.world 65 points 1 year ago* (last edited 1 year ago) (9 children)

If it ends up being OK for a company like OpenAI to commit copyright infringement to train their AI models, it should be OK for John/Jane Doe to pirate software for private use.

But that would never happen. Almost like the whole of copyright has been perverted into a scam.

load more comments (9 replies)
[–] KingThrillgore@lemmy.ml 53 points 1 year ago (2 children)

It's almost like we had a thing where copyrighted works used to end up, but they extended the dates because money

[–] rivermonster@lemmy.world 17 points 1 year ago

I was literally about to come in here and say it would be an interesting tangential conversation to talk about how FUCKED copyright laws are, and how relevant to the discussion it would be.

More upvotes for you!

[–] Ultraviolet@lemmy.world 17 points 1 year ago

This is where they have the leverage to push for actual copyright reform, but they won't. Far more profitable to keep the system broken for everyone but have an exemption for AI megacorps.

[–] 800XL@lemmy.world 49 points 1 year ago

I guess the lesson here is to pirate everything under the sun, and as long as you establish a company and train a bot, everything is a-ok. I wish we'd known this back when everyone was getting dinged for torrenting The Hurt Locker.

Remember when the RIAA got caught with pirated mp3s and nothing happened?

What a stupid timeline.

[–] Milk_Sheikh@lemm.ee 43 points 1 year ago (7 children)

Wow! You’re telling me that onerous and crony copyright laws stifle innovation and creativity? Thanks for solving the mystery, guys, we never knew that!

load more comments (7 replies)
[–] reverendsteveii@lemm.ee 36 points 1 year ago (1 children)

if it's impossible for you to have something without breaking the law you have to do without it

if it's impossible for the aristocrat class to have something without breaking the law, we change or ignore the law

[–] lolcatnip@reddthat.com 11 points 1 year ago (1 children)

Copyright law is mostly bullshit, though.

[–] Krauerking@lemy.lol 16 points 1 year ago

Oh sure. But why is it only the massive AI push that lets the large companies, the ones whose models are full of stolen material and churn out basic forgeries of the stolen items, ignore the bullshit copyright laws?

It wouldn't be because it is super profitable for multiple large industries right?

[–] dutchkimble@lemy.lol 33 points 1 year ago

Cool, don't do it then

[–] ook_the_librarian@lemmy.world 27 points 1 year ago (3 children)

It's not "impossible". It's expensive and will take years to produce material under an encompassing license in the quantity needed to make the model "large". Their argument is basically "but we can have it quickly if you allow legal shortcuts."

[–] Patches@sh.itjust.works 11 points 1 year ago

That argument has unfortunately worked for many other Tech Bros

load more comments (2 replies)
[–] kibiz0r@lemmy.world 24 points 1 year ago (4 children)

I'm dumbfounded that any Lemmy user supports OpenAI in this.

We're mostly refugees from Reddit, right?

Reddit invited us to make stuff and share it with our peers, and that was great. Some posts were just links to the content's real home: Youtube, a random Wordpress blog, a Github project, or whatever. The post text, the comments, and the replies only lived on Reddit. That wasn't a huge problem, because that's the part that was specific to Reddit. And besides, there were plenty of third-party apps to interact with those bits of content however you wanted to.

But as Reddit started to dominate Google search results, it displaced results that might have linked to the "real home" of that content. And Reddit realized a tremendous opportunity: They now had a chokehold on not just user comments and text posts, but anything that people dare to promote online.

At the same time, Reddit slowly moved from a place where something may get posted by the author of the original thing to a place where you'll only see the post if it came from a high-karma user or bot. Mutated or distorted copies of the original, reformatted to cut through the noise and gain the favor of the algorithm. Re-posts of re-posts, with no reference back to the original, divorced from whatever context or commentary the original creator may have provided. No way for the audience to respond to the author in any meaningful way and start a dialogue.

This is a miniature preview of the future brought to you by LLM vendors. A monetized portal to a dead internet. A one-way street. An incestuous ouroboros of re-posts of re-posts. Automated remixes of automated remixes.

--

There are genuine problems with copyright law. Don't get me wrong. Perhaps the most glaring problem is the fact that many prominent creators don't even own the copyright to the stuff they make. It was invented to protect creators, but in practice this "protection" gets assigned to a publisher immediately after the protected work comes into being.

And then that copyright -- the very same thing that was intended to protect creators -- is used as a weapon against the creator and against their audience. Publishers insert a copyright chokepoint in between the two, and they squeeze as hard as they desire, wringing every drop of profit out of it, keeping creators and audiences far away from each other. Creators can't speak out of turn. Fans can't remix their favorite content and share it back to the community.

This is a dysfunctional system. Audiences are denied the ability to access information or participate in culture if they can't pay for admission. Creators are underpaid, and their creative ambitions are redirected to what's popular. We end up with an auto-tuned culture -- insular, uncritical, and predictable. Creativity reduced to a product.

But.

If the problem is that copyright law has severed the connection between creator and audience in order to set up a toll booth along the way, then we won't solve it by giving OpenAI a free pass to do the exact same thing at massive scale.

load more comments (4 replies)
[–] whoisearth@lemmy.ca 23 points 1 year ago (6 children)

If OpenAI is right (I think they are) one of two things need to happen.

  1. All AI should be open source and non-profit
  2. Copyright law needs to be abolished

For number 1. Good luck for all the reasons we all know. Capitalism must continue to operate.

For number 2. Good luck because those in power are mostly there off the backs of those before them (see Disney, Apple, Microsoft, etc.)

Anyways, fun to watch play out.

[–] SCB@lemmy.world 13 points 1 year ago* (last edited 1 year ago) (12 children)

There's a third solution you're overlooking.

3: OpenAI (or other) wins a judgment that AI content is not inherently a violation of copyright regardless of materials it is trained upon.

load more comments (12 replies)
load more comments (5 replies)
[–] dasgoat@lemmy.world 22 points 1 year ago (1 children)
[–] NeatNit@discuss.tchncs.de 26 points 1 year ago (7 children)

hijacking this comment

OpenAI was IMHO well within its rights to use copyrighted materials when it was just doing research. They were* doing research on how far large language models can be pushed and where the ceiling is for that. It's genuinely good research, and if copyrighted works are used just for research and what gets published is the findings of the experiments, that's perfectly okay in my book - and, I think, in the law as well. In this case, the LLM is an intermediate step, and the published research papers are the "product".

The unacceptable turning point is when they took all the intermediate results of that research and flipped them into a product. That's not the same, and most or all of us here can agree - this isn't okay, and it's probably illegal.

* disclaimer: I'm half-remembering things I've heard a long time ago, so even if I phrase things definitively I might be wrong

load more comments (7 replies)
[–] S410@lemmy.ml 21 points 1 year ago (2 children)

They're not wrong, though?

Almost all information that currently exists has been created in the last century or so. Only a fraction of all that information is available to be legally acquired for use and only a fraction of that already small fraction has been explicitly licensed using permissive licenses.

Things that we don't even think about as "protected works" are in fact just that. Doesn't matter what it is: napkin doodles, writings on bathroom stall walls, letters written to friends and family. All of those things are protected, unless stated otherwise. And, I don't know about you, but I've never seen a license notice attached to a napkin doodle.

Now, imagine trying to raise a child while avoiding every piece of information like that; information that you aren't licensed to use. You wouldn't end up with a person well suited to exist in the world. They'd lack education in science and technology, they'd have no understanding of pop culture, they'd know no brand names, etc.

Machine learning models are similar. You can train them that way, sure, but they'd be basically useless for real-world applications.

[–] AntY@lemmy.world 48 points 1 year ago (14 children)

The main difference between the two in your analogy, that has great bearing on this particular problem, is that the machine learning model is a product that is to be monetized.

load more comments (14 replies)
[–] Exatron@lemmy.world 11 points 1 year ago (19 children)

The difference here is that a child can't absorb and suddenly use massive amounts of data.

load more comments (19 replies)
[–] wosat@lemmy.world 20 points 1 year ago (1 children)

This situation seems analogous to when air travel started to take off (pun intended) and existing legal notions of property rights had to be adjusted. IIRC, a farmer sued an airline for trespassing because they were flying over his land. The court ruled against the farmer because to do otherwise would have killed the airline industry.

load more comments (1 replies)
[–] Chee_Koala@lemmy.world 19 points 1 year ago (2 children)

But our current copyright model is so robust and fair! They will only have to wait 95 years after the author died, which is a completely normal period.

If you want to control your creations, you are completely free to NOT publish them. Nowhere is it stated that to be valuable or beautiful, a work has to be shared on the world podium.

We could have a very restrictive copyright for works that were never globally transmitted/published, and another for works the copyright owner DID choose to broadcast globally. They'd have a couple of years to cash in, and then after, I dunno, 5 years, we can all use the work as we see fit. If you use mass media to broadcast creative works but then become mad when the public transforms or remixes your work, you are part of the problem.

Current copyright is just a tool for folks with power to control that power. It's what a boomer would make driving their tractor / SUV while chanting to themselves: I have earned this.

[–] LWD@lemm.ee 34 points 1 year ago* (last edited 1 year ago) (5 children)
[–] just_change_it@lemmy.world 13 points 1 year ago (6 children)

I think it's pretty amazing when people just run with the dogma that empowers billionaires.

Every creator hopes they'll be the next Taylor Swift, that they'll retain control of their art for those life + 70 years and make enough to create their own little dynasty.

The reality is that long duration copyright is almost exclusively a tool of the already wealthy, not a tool for the not-yet-wealthy. As technology improves it will be easier and easier for wealth to control the system and deny the little guy's copyright on grounds that you used something from their vast portfolio of copyright/patent/trademark/ipmonopolyrulelegalbullshit. Already civil legal disputes are largely a function of who has the most money.

I don't have the solution that helps artists earn a living, but it doesn't seem like copyright is doing them many favors as-is, unless they are retired rockstars who have already earned in excess of typical middle-class lifetime earnings by the time they hit 35, or way earlier.

load more comments (6 replies)
load more comments (4 replies)
load more comments (1 replies)
[–] CosmoNova@lemmy.world 18 points 1 year ago* (last edited 1 year ago)

Let's wait until everyone is laid off and it's 'impossible' to get by without mass looting then, shall we?

[–] Evotech@lemmy.world 18 points 1 year ago* (last edited 1 year ago) (7 children)
load more comments (7 replies)
[–] Blackmist@feddit.uk 17 points 1 year ago (3 children)

Maybe you shouldn't have done it then.

I can't make a Jellyfin server full of content without copyrighted material either, but the key difference here is I'm not then trying to sell that to investors.

load more comments (3 replies)
[–] holycrap@lemm.ee 16 points 1 year ago

I have the perfect solution. Shorten the copyright duration.

[–] Boiglenoight@lemmy.world 12 points 1 year ago* (last edited 1 year ago) (1 children)

Piracy by another name. Copyrighted materials are being used for profit by companies that have no intention of compensating the copyright holder.

load more comments (1 replies)
[–] bravesirrbn@lemmy.world 11 points 1 year ago
[–] afraid_of_zombies@lemmy.world 11 points 1 year ago

If the copyright people had their way, we wouldn't be able to write a single word without paying them. This whole thing is clearly a fucking money grab. It is not struggling artists being wiped out; it is big corporations suing a well-funded startup.

[–] positiveWHAT@lemmy.world 10 points 1 year ago

Is this the point where we start UBI and start restructuring society for the future of AI?
