submitted 8 months ago by sculd@beehaw.org to c/technology@beehaw.org

Apparently, stealing other people's work to create a product for money is now "fair use", according to OpenAI, because they are "innovating" (stealing). Yeah. Move fast and break things, huh?

"Because copyright today covers virtually every sort of human expression—including blogposts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials," wrote OpenAI in the House of Lords submission.

OpenAI claimed that the authors in that lawsuit "misconceive[d] the scope of copyright, failing to take into account the limitations and exceptions (including fair use) that properly leave room for innovations like the large language models now at the forefront of artificial intelligence."

[-] Haus@kbin.social 75 points 8 months ago

Try to train a human comedian to make jokes without ever allowing him to hear another comedian's jokes, never watching a movie, never reading a book or magazine, never watching a TV show. I expect the jokes would be pretty weak.

[-] Phanatik@kbin.social 121 points 8 months ago

A comedian isn't forming a sentence based on what the most probable word is going to appear after the previous one. This is such a bullshit argument that reduces human competency to "monkey see thing to draw thing" and completely overlooks the craft and intent behind creative works. Do you know why ChatGPT uses certain words over others? Probability. It decided as a result of its training that one word would appear after the previous in certain contexts. It absolutely doesn't take into account things like "maybe this word would be better here because the sound and syllables maintain the flow of the sentence".

Baffling takes from people who don't know what they're talking about.

[-] frog@beehaw.org 67 points 8 months ago* (last edited 8 months ago)

I wish I could upvote this more than once.

What people always seem to miss is that a human doesn't need billions of examples to be able to produce something that's kind of "eh, close enough". Artists don't look at billions of paintings. They look at a few, but do so deeply, absorbing not just the most likely distribution of brushstrokes, but why the painting looks the way it does. For a basis of comparison, I did an art and design course last year and looked at about 300 artworks in total (course requirement was 50-100). The research component on my design-related degree course is one page a week per module (so basically one example from the field the module is about, plus some analysis). The real bulk of the work humans do isn't looking at billions of examples: it's looking at a few, and then practicing the skill and developing a process that allows them to convey the thing they're trying to express.

If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licencing a little more, would be more than enough.

[-] Phanatik@kbin.social 26 points 8 months ago

Exactly! You can glean so much from a single work, not just about the work itself but who created it and what ideas were they trying to express and what does that tell us about the world they live in and how they see that world.

This doesn't even touch the fact that I'm learning to draw not by looking at other drawings but at what exactly I'm trying to draw. I know at a base level, a drawing is a series of shapes made by hand whether it's through a digital medium or traditional pen/pencil and paper. But the skill isn't being able to replicate other drawings, it's being able to convert something I can see into a drawing. If I'm drawing someone sitting in a wheelchair, then I'll get the pose of them sitting in the wheelchair but I can add details I want to emphasise or remove details I don't want. There's so much that goes into creative work and I'm tired of arguing with people who have no idea what it takes to produce creative works.

[-] frog@beehaw.org 27 points 8 months ago

It seems that most of the people who think what humans and AIs do is the same thing are not actually creatives themselves. Their level of understanding of what it takes to draw goes no further than "well anyone can draw, children do it all the time". They have the same respect for writing, of course, equating the ability to string words together to write an email, with the process it takes to write a brilliant novel or script. They don't get it, and to an extent, that's fine - not everybody needs to understand everything. But they should at least have the decency to listen to the people that do get it.

[-] Marsupial@quokk.au 10 points 8 months ago* (last edited 8 months ago)

Children learn by watching others. We are trained from millions of examples starting from before birth.

[-] Even_Adder@lemmy.dbzer0.com 5 points 8 months ago

When people say that the "model is learning from its training data", it means just that, not that it is human, and not that it learns exactly like humans. It doesn't make sense to judge boats on how well they simulate human swimming patterns, just how well they perform their task.

Every human has the benefit, as a baby, of training on the things around them and being trained by those around them, building a foundation for all later skills. Generative models rely on many text and image pairs to describe things to them because they lack the ability to poke, prod, rotate, and disassemble for themselves.

For example, when a model takes in a thousand images of circles, it doesn't "learn" a thousand circles. It learns what a circle GENERALLY is like, the concept of it. That representation, along with random noise, is how you create images with them. The same happens for every concept the model trains on. Everything from "cat" to more complex things like color relationships and reflections or lighting. Machines are not human, but they can learn despite that.

[-] ParsnipWitch@feddit.de 4 points 8 months ago* (last edited 8 months ago)

In general I agree with you, but AI doesn't learn the concept of what a circle is. AI reproduces the most fitting representation of what we call a circle. But there is no understanding of the concept of a circle. This may sound like nitpicking, but I think it's important to make the distinction.

That is why current models aren't regarded as actual intelligence, although people already call them that...

[-] Eccitaze@yiffit.net 3 points 8 months ago

It makes sense to judge how closely LLMs mimic human learning when people are using it as a defense to AI companies scraping copyrighted content, and making the claim that banning AI scraping is as nonsensical as banning human learning.

But when it's pointed out that LLMs don't learn very similarly to humans, and require scraping far more material than a human does, suddenly AIs shouldn't be judged by human standards? I don't know if it's intentional on your part, but that's a pretty classic example of a motte-and-bailey fallacy. You can't have it both ways.

[-] teawrecks@sopuli.xyz 4 points 8 months ago

What you count as "one" example is arbitrary. In terms of pixels, you're looking at millions right now.

The ability to train faster using fewer examples in real time, similar to what an intelligent human brain can do, is definitely a goal of AI research. But right now, we may be seeing from AI what a below average human brain could accomplish with hundreds of lifetimes to study.

If the AI models were really doing exactly the same thing humans do, the models could be trained without any copyright infringement at all, because all of the public domain and creative commons content, plus maybe licencing a little more, would be more than enough.

I mean, no, if you only ever look at public domain stuff you literally wouldn't know the state of the art, which is historically happening for profit. Even the most untrained artist "doing their own thing" watches Disney/Pixar movies and listens to copyrighted music.

[-] frog@beehaw.org 10 points 8 months ago

If we're going by the number of pixels being viewed, then you have to use the same measure for both humans and AIs - and because AIs have to look at billions of images while humans do not, the AI still requires far more pixels than a human does.

And humans don't require the most modern art in order to learn to draw at all. Sure, if they want to compete with modern artists, they would need to look at modern artists (for which educational fair use exists, and again the quantity of art being used by the human for this purpose is massively lower than what an AI uses - a human does not need to consume billions of artworks from modern artists in order to learn what the current trends are). But a human could learn to draw, paint, sculpt, etc purely by only looking at public domain and creative commons works, because the process for drawing, say, the human figure (with the right number of fingers!) has not changed in hundreds of years. A human can also just... go outside and draw things they see themselves, because the sky above them and the tree across the street aren't copyrighted. And in fact, I'd argue that a good artist should go out and find real things to draw.

OpenAI's argument is literally that their AI cannot learn without using copyrighted materials in vast quantities - too vast for them to simply compensate all the creators. So it genuinely is not comparable to a human, because humans can, in fact, learn without using copyrighted material. If OpenAI's argument is actually that their AI can't compete commercially with modern art without using copyrighted works, then they should be honest about that - but then they'd be showing their hand, wouldn't they?

[-] Bene7rddso@feddit.de 5 points 8 months ago

Humans learn mostly from real life. Go touch some grass

[-] intensely_human@lemm.ee 2 points 8 months ago

When you look at one painting, is that the equivalent of one instance of the painting in the training data? There is an infinite amount of information in the painting, and each time you look you process more of that information.

I’d say any given painting you look at in a museum, you process at least a hundred mental images of aspects of it. A painting on your wall could be seen ten thousand times easily.

[-] DaDragon@kbin.social 22 points 8 months ago

That’s what humans do, though. Maybe not probability directly, but we all know that some words should be put in a certain order. We still operate within standard norms that apply to a particular group of people. LLMs just go about it in a different way, but they achieve the same general result. If I’m drawing a human, that means there’s a ‘hand’ here, and a ‘head’ there. ‘Head’ is a weird combination of pixels that mostly look like this, ‘hand’ looks kinda like that. All depends on how the model is structured, but tell me that’s not very similar to a simplified version of how humans operate.

[-] Phanatik@kbin.social 21 points 8 months ago

Yeah but the difference is we still choose our words. We can still alter sentences on the fly. I can think of a sentence and understand verbs go after the subject but I still have the cognition to alter the sentence to have the effect I want. The thing lacking in LLMs is intent and I'm yet to see anyone tell me why a generative model decides to have more than 6 fingers. As humans we know hands generally have five fingers, and that there's a group of people whose hands don't, so if we wanted to draw a person with a different number of fingers, we could. A generative art model can't help itself from drawing multiple fingers because all it understands is that "finger + finger = hand" but it has no concept of when to stop.

[-] DaDragon@kbin.social 9 points 8 months ago

And that’s the reason why LLM generated content isn’t considered creative.

I do believe that the person using the device has a right to copyright the unique method they used to generate the content, but the content itself isn’t anything worth protecting.

[-] Phanatik@kbin.social 15 points 8 months ago

You say that yet I initially responded to someone who was comparing an LLM to what a comedian does.

There is no unique method because there's hardly anything unique you can do. Two people using Stable Diffusion to produce an image are putting in the same amount of work. One might put more time into crafting the right prompt but that's not work you're doing.

If 90% of the work is handled by the model, and you just layer on whatever extra thing you wanted, that doesn't mean you created the thing. That also implies you have much control over the output. You're effectively negotiating with this machine to produce what you want.

[-] DaDragon@kbin.social 4 points 8 months ago

Wouldn’t that lead to the same argument as originally brought against photography, though?

A photographer is effectively negotiating with the sun, the sky and everything else to hopefully get the result they are looking for on their device.

[-] Phanatik@kbin.social 10 points 8 months ago

One difference is that the photographer has to go to the places they're taking pictures of.

Another is that photography isn't comparable to paintings and it never has been. I'm willing to bet photography and paintings have never coexisted in a contest. Except, when people say their generative art is comparable to what artists have been producing by hand, they are admitting that generative art has more in common with photography than it does with hand-crafted art but they want the prestige and recognition those artists get for their work.

[-] Nyfure@kbin.social 3 points 8 months ago

more time into crafting the right prompt

That's not work to you? My company pays me to spend time to do the right thing, even though the computer does most of the work.

I see where you are going with this, but your argument also invalidates other forms of human interaction and creation.

In my country copyright can only be granted if a certain amount of (human) work went into something. Any work.
The difficult part is finding out what's enough and what kind of work qualifies for some kind of protection, even if partial.
The difficult part was not to create something, but to prove someone did or didn't put enough work into it.
I think we can hold generated or assisted goods to the same standard.

Putting a simple prompt together should probably not be granted protection as no significant work went into it. But refining it, editing the result... maybe that's enough; that's really up to society to decide.

At the same time we have to balance the power of machines against human work, so the human work doesn't get totally invalidated, but rather shifted and treated as a sub-type.
Machines have already replaced a lot of work, including creative work. Book printing, forging, producing food... the scary part about generative AI is mainly the speed at which it is spreading.

[-] Phanatik@kbin.social 12 points 8 months ago

So as a data analyst a lot of my work is done through a computer but I can apply my same skills if someone hands me a piece of paper with data printed on it and tells me to come up with solutions to the problems with it. I don't need the computer to do what I need to do, it makes it easier to manipulate data but the degree of problem solving required needs to be done by a human and that's why it's my job. If a machine could do it, then they would be doing it but they aren't because contrary to what people believe about data analysis, you have to be somewhat creative to do it well.

Crafting a prompt is an exercise in trial and error. It's work but it's not skilled work. It doesn't take talent or practice to do. Despite the prompt, you are still at the mercy of the machine.

Even by the case you've presented, I have to ask: at what point does a human editing the output of a generative model make it your own work and not the machine's? How much do you have to change? Can you give me a %?

Machines were intended to automate the tedious tasks that we all have to suffer to free up our brains for more engaging things which might include creative pursuits. Automation exists to make your life easier, not to rob you of life's pursuits or your livelihood. It never should've been used to produce creative work and I find the attempts to equate this abomination's outputs to what artists have been doing for years, utterly deplorable.

[-] intensely_human@lemm.ee 4 points 8 months ago

I don’t choose my words man. I get a vague sense of the meaning I want to convey and the words just form themselves.

[-] ParsnipWitch@feddit.de 4 points 8 months ago

As an artist you draw with an understanding of the human body, though. An understanding current models don't have because they aren't actually intelligent.

Maybe when a human is an absolute beginner in drawing they will think about the different lines and replicate even how other people draw stuff that then looks like a hand.

But eventually they will realise (hopefully, otherwise they may get frustrated and stop drawing) that you need to understand the hand to draw one. Its mass, its concept, or the idea of what a hand is.

This may sound very abstract and strange but creative expression is more complex than replicating what we have seen a million times. It's a complex function unique to the human brain, an organ we don't even scientifically understand yet.

[-] hascat@programming.dev 8 points 8 months ago

That's not the point though. The point is that the human comedian and the AI both benefit from consuming creative works covered by copyright.

[-] Phanatik@kbin.social 14 points 8 months ago

Yeah except a machine is owned by a company and doesn't consume the same way. It breaks down copyrighted works into data points so it can find the best way of putting those data points together again. If you understand anything at all about how these models work, they do not consume media the same way we do. It is not an entity with a thought process or consciousness (despite what the misleading marketing of "AI" would have you believe), it's an optimisation algorithm.

[-] chahk@beehaw.org 17 points 8 months ago

It's a glorified autocomplete.

[-] Phanatik@kbin.social 5 points 8 months ago

It's so funny that this is something new. This was Grammarly's whole schtick since before ChatGPT so how different is Grammarly AI?

[-] vexikron@lemmy.zip 5 points 8 months ago

Here is the bigger picture: The vast majority of tech illiterate people think something is AI because duh it's called AI.

It's literally just the power of branding and marketing on the minds of poorly informed humans.

Unfortunately this is essentially a reverse Turing Test.

The vast majority of humans do not know anything about AI, and a huge majority of them can also barely tell the difference, currently in some but not all forms, between output from what is basically brute-force total-internet plagiarism and synthesis software and actual human-created content.

To me this basically just means that about 99% of the time, most humans are actually literally NPCs, and they only do actual creative and unpredictable things very very rarely.

[-] vexikron@lemmy.zip 12 points 8 months ago* (last edited 8 months ago)

And human comedians regularly get called out when they outright steal others' material and present it as their own.

The word for this is plagiarism.

And in OpenAI's framework, when used in a relevant commercial context, they are functionally operating and profiting off of the world's most comprehensive plagiarism software.

[-] teawrecks@sopuli.xyz 6 points 8 months ago

A comedian isn't forming a sentence based on what the most probable word is going to appear after the previous one.

Neither is an LLM. What you're describing is a primitive Markov chain.
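For anyone unfamiliar with the term, here is a minimal sketch of a first-order Markov chain text generator, just to make "the most probable word after the previous one" concrete. The toy corpus and the function names `build_bigram_table` and `generate` are made up for illustration. The key property is that each step looks only at the single previous word, whereas an LLM conditions each prediction on the whole preceding context.

```python
import random
from collections import defaultdict


def build_bigram_table(text):
    """Map each word to the list of words that followed it in `text`."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return table


def generate(table, start, length, seed=0):
    """Walk the chain: each step depends only on the single previous word."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        candidates = table.get(output[-1])
        if not candidates:  # dead end: this word never had a successor
            break
        # Picking uniformly from the list samples in proportion to how
        # often each successor appeared after the previous word.
        output.append(rng.choice(candidates))
    return output


# Hypothetical toy corpus, for illustration only.
corpus = "the cat sat on the mat and the dog sat on the rug"
table = build_bigram_table(corpus)
sample = generate(table, "the", 8)
print(" ".join(sample))
```

Everything before the last word is invisible to this generator, which is why its output drifts into nonsense so quickly.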

You may not like it, but brains really are just glorified pattern recognition and generation machines. So yes, "monkey see thing to draw thing", except a really complicated version of that.

Think of it this way: if your brain wasn't a reorganization and regurgitation of the things you have observed before, it would just generate random noise. There's no such thing as "truly original" art or it would be random noise. Every single word either of us is typing is the direct result of everything you and I have observed before this moment.

Baffling takes from people who don't know what they're talking about.

Ironic, to say the least.

The point you should be making is that a corporation will make the above argument up to, but not including, the point where they have to treat AIs ethically. So that's the way to beat them. If they're going to argue that they have created something that learns and creates content like a human brain, then they should need to treat it like a human, ensure it is well compensated, ensure it isn't being overworked or enslaved, ensure it is being treated "humanely". If they don't want to do that, if they want it to just be a well built machine, then they need to license all the proprietary data they used to build it. Make them pick a lane.

[-] Phanatik@kbin.social 4 points 8 months ago

Neither is an LLM. What you’re describing is a primitive Markov chain.

My description might've been indicative of a Markov chain but the actual framework uses matrices because you need to be able to store and compute a huge amount of information at once which is what matrices are good for. Used in animation if you didn't know.

What it actually uses is irrelevant; how it uses those things is the same as a regression model, the difference is scale. A regression model looks at how related variables are in giving an outcome and computes weights to give you the best outcome. This was the machine learning boom a couple of years ago and TensorFlow became really popular.
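To make "computing weights" concrete, here is a minimal sketch of the smallest possible regression: fitting a line y = w*x + b by gradient descent on mean squared error. The data, the function name `fit_line`, the learning rate and the step count are all assumptions chosen for illustration. This is not how any particular LLM is trained, only the small-scale idea being referred to.

```python
def fit_line(xs, ys, learning_rate=0.01, steps=5000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every data point under the current weights.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        # Nudge each weight in the direction that reduces the error.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1, for illustration only.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Here there are two weights and five data points; the loop of "measure the error, adjust the weights" is the part that carries over to larger models.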

LLMs are an evolution of the same idea. I'm not saying it's not impressive because it's very cool what they were able to do. What I take issue with is the branding, the marketing and the plagiarism. I happen to be in the intersection of working in the same field, an avid fan of classic Sci-Fi and a writer.

It's easy to look at what people have created throughout history and think "this looks like that" and on a point by point basis you'd be correct but the creation of that thing is shaped by the lens of the person creating it. Someone might make a George Carlin joke that we've heard recently but we'll read about it in newspapers from 200 years ago. Did George Carlin steal the idea? No. Was he aware of that information? I don't know. But Carlin regularly calls upon his own experiences so it's likely that he's referencing an event from his past that is similar to that of 200 years ago. He might've subconsciously absorbed the information.

The point is that the way these models have been trained is unethical. They used material they had no license to use and they've admitted that it couldn't work as well as it does without stealing other people's work. I don't think they're taking the position that it's intelligent because from the beginning that was a marketing ploy. They're taking the position that they should be allowed to use the data they stole because there was no other way.

[-] pupbiru@aussie.zone 2 points 8 months ago

you know how the neurons in our brain work, right?

because if not, well, it’s pretty similar… unless you say there’s a soul (in which case we can’t really have a conversation based on fact alone), we’re just big ol’ probability machines with tuned weights based on past experiences too

[-] Phanatik@kbin.social 5 points 8 months ago

You are spitting out basic points and attempting to draw similarities because our brains are capable of something similar. The difference between what you've said and what LLMs do is that we have experiences that we are able to glean a variety of information from. An LLM sees text and all it's designed to do is say "x is more likely to appear before y than z". If you fed it nonsense, it would regurgitate nonsense. If you feed it text from racist sites, it will regurgitate that same language because that's all it has seen.

You'll read this and think "that's what humans do too, right?" Wrong. A human can be fed these things and still reject them. Someone else in this thread has made some good points regarding this but I'll state them here as well. An LLM will tell you information but it has no cognition on what it's telling you. It has no idea that it's right or wrong, its job is to convince you that it's right because that's the success state. If you tell it it's wrong, that's a failure state. The more you speak with it, the more fail states it accumulates and the more likely it is to cut off communication because it's not reaching a success, it's not giving you what you want. The longer the conversation goes on, the more crazy LLMs get as well because it's too much to process at once, holding those contexts in its memory while trying to predict the next one. Our brains do this easily and so much more. To claim an LLM is intelligent is incredibly misguided, it is merely the imitation of intelligence.

[-] tryptaminev@feddit.de 2 points 8 months ago* (last edited 8 months ago)

You do know that comedians are copying each other's material all the time though? Either making the same joke, or slightly adapting it.

So in the context of copyright vs. model training I fail to see how the exact process of the model is relevant? At the end copyrighted material goes in and material based on that copyrighted material goes out.

[-] luciole@beehaw.org 21 points 8 months ago* (last edited 8 months ago)

There's this linguistic problem where, when one word is used for two different things, it becomes difficult to tell them apart. "Training" or "learning" is a very poor choice of word to describe the calibration of a neural network. The actor and action are both fundamentally different from the accepted meaning. To start with, human learning is active whereas machine learning is strictly passive: it's something done by someone with the machine as a tool. Teachers know very well that's not how it happens with humans.

When I compare training a neural network with how I trained to play clarinet, I fail to see any parallel. The two are about as close as a horse and a seahorse.

[-] Powderhorn@beehaw.org 12 points 8 months ago

A comedian walks on stage and says, "Why is there a mic here?"

[-] sculd@beehaw.org 10 points 8 months ago

AIs are not humans. Humans cannot read millions of texts in seconds and cannot spit out millions of outputs at the same time.

this post was submitted on 11 Jan 2024
259 points (99.6% liked)
