this post was submitted on 09 Aug 2023


Google says AI systems should be able to mine publishers’ work unless companies opt out, turning copyright law on its head (www.theguardian.com)

submitted 2 years ago by 0x815@feddit.de to c/technology@beehaw.org


In its submission to the Australian government’s review of the regulatory framework around AI, Google said that copyright law should be altered to allow for generative AI systems to scrape the internet.


frog@beehaw.org 31 points 2 years ago

I think the key problem with a lot of the models right now is that they were developed for "research", without the rights holders having the option to opt out when the models were switched to for-profit use. The portfolio and gallery websites, from which the bulk of the artwork came, didn't even have opt-out options until a couple of months ago. Artists were therefore considered to have opted in to their work being used commercially, because they were never presented with the option to opt out.

So at the bare minimum, a mechanism needs to be provided for retroactively removing works that would have been opted out of commercial usage if the option had been available and the rights holders had been informed about the commercial intentions of the project. I would favour a complete rebuild of the models that only draws from works that are either in the public domain or whose rights holders have explicitly opted in to their work being used for commercial models.

Basically, you can't deny rights holders the ability to opt out and then say "hey, it's not our fault that you didn't opt out, now we can use your stuff to profit ourselves".

tochee@aussie.zone 8 points 2 years ago

Common sense would surely say that becoming a for-profit company, or whatever they did, would mean they've breached that law. I assume they figured out a way around it, or I've misunderstood something, though.

frog@beehaw.org 10 points 2 years ago

I think they just blatantly ignored the law, to be honest. The UK's copyright law is similar: "fair dealing" allows use for research purposes (so the data scrapes were legal while they were for research), but fair dealing explicitly does not apply when the purpose is commercial in nature and intended to compete with the rights holder. The common-sense interpretation is that as soon as the AI models became commercial and were being promoted as a replacement for human-made work, they were intended to be for-profit competition to the rights holders.

If we get to a point where opt-outs have full legal weight, I still expect the AI companies to use the data "for research" and then ship the model as a commercial enterprise without any attempt to strip out the works that were only valid to use for research.

Pseu@beehaw.org 4 points 2 years ago

> So at the bare minimum, a mechanism needs to be provided for retroactively removing works that would have been opted out of commercial usage if the option had been available and the rights holders had been informed about the commercial intentions of the project.

If you do this, you limit access to AI tools exclusively to big companies. They already employ enough artists to create a useful AI generator, so they'll simply add a clause to the employment contract in which the artist agrees to their work being used for training. After a while, the only people with access to reasonably good AI will be those major corporations, and they'll leverage that to depress wages and control employees.

The WGA's idea that the direct output of an AI is uncopyrightable doesn't distort things so heavily in favor of Disney and Hasbro. It's also more legally actionable. You don't name Microsoft Word as the editor of a novel just because you used its spelling and grammar checker, even if it corrected every word. Naturally, you don't name generative AI as an author or creator either.

Though the above argument only really applies when you have strong unions willing to fight for workers, and with how gutted they are in the US, I don't think that will be the standard.

frog@beehaw.org 10 points 2 years ago (last edited 2 years ago)

The solution to only big companies having access to AI, because only they employ enough artists to create a useful generator, isn't to deny all artists globally any ability to control their work, though. If all works can be scraped and added to commercial AI models without any payment to artists, you completely obliterate all artists except for the small handful working for Disney, Hasbro, and the like.

AI models actually require a constant input of new human-made artworks, because they cannot create anything new or unique themselves, and feeding an AI content produced by AI ends up with very distorted results pretty quickly. So it's simply not viable to expect the 99% of artists who don't work for big companies to continuously provide new works for AI models, for free, so that others can profit from them. Therefore, artists need either the ability to opt out or they need to be paid.

(The word "artist" here is used to refer to everyone in the creative industries. Writing and music are art just like paintings and drawings are.)

Pseu@beehaw.org 2 points 2 years ago

Unfortunately, copyright protection doesn't extend that far. AI training is almost certainly fair use if it is copying at all. Styles and the like cannot be copyrighted, so even if an AI creates a work in the style of someone else, it is extremely unlikely that the output would be so similar as to be in violation of copyright. Though I do feel that it is unethical to intentionally try to reproduce someone's style, especially if you're doing it for commercial gain. But that is not illegal unless you try to say that you are that artist.

frog@beehaw.org 5 points 2 years ago

Copyright law on this varies, actually! In the UK, "fair dealing" actually has an exclusion for using copyrighted material for the purpose of commercially competing with the creator. This also includes derivative works. This does therefore cover style to a certain extent, because works imitating a style of an artist are generally intended to commercially compete with them. From that perspective, taking an artist's entire portfolio, feeding it into an AI, and producing work in their style at a lower price than the artist does (because an AI produces something in seconds which takes the artist weeks), is pretty obviously an attempt to compete with the artist commercially.

While people like to draw comparisons between AIs and humans copying another artist's style, the big difference here is that a human artist needs to spend hundreds of hours learning to imitate another artist's style, at the expense of developing their own style, while the original artist is also continually developing their style. It is bloody hard to imitate another human's art style. But an AI can do it in minutes, and I haven't yet seen any valid arguments for how that's not intended to commercially compete with human artists on a massive scale.

Pseu@beehaw.org 2 points 2 years ago

True, I wrote this from a US law perspective, where that kind of behavior is expressly protected. US law is also written specifically to protect things like search engines and aggregators to prevent services like Google from getting sued for their blurbs, but it's likely also a defense for AI.

Regardless of whether it should be illegal, I feel that AI training and use are legal under current US law. And since OpenAI is a US company, dragging it into UK courts and extracting payment from it would be difficult for all but the most monied artists.

frog@beehaw.org 3 points 2 years ago

For the moment, US companies do actually care what the UK courts and regulatory bodies say, because the trifecta of US-UK-EU is what tends to form a base for what the rest of the world decides. It's why Microsoft have been so unhappy about the UK's Competition and Markets Authority initially blocking the merger with Activision Blizzard: even with the US and EU antitrust bodies agreeing to it, it did actually matter if the UK didn't agree (I am so disappointed in the CMA finally capitulating). And some of the lawsuits against the AI companies are taking place in the UK courts, with no indication that the AI companies are refusing to engage. Obviously, at this point it's hard to say what the outcome will be, but the UK legal system has enough clout globally that it won't be a meaningless result.

P1r4nha@feddit.de 2 points 2 years ago

Practically, you would have to separate the model architecture from the weights. The weights are licensed for research use only, while the architecture is the actual scientific contribution. Maybe also some instructions on how best to train the model.

The only problem is that you can't really prove whether someone just retrained the research weights or trained from scratch using randomized weights. Also, certain alterations to the architecture are possible, so only the "headless" models are used.

I think there's some research into detecting retraining, but I imagine it's not foolproof.

frog@beehaw.org 3 points 2 years ago

I think that, as proofs of concept, the AI models are kind of interesting. I don't much like the content they produce, because it is just so utterly same-y, so I haven't yet seen anything that made me go "wow, that's amazing". But the actual architecture behind them is pretty cool.

But at this point, they've gone beyond researching an interesting idea into full-on commercial enterprises. If we don't have an effective means of retraining the existing models to remove the data that isn't licensed for commercial use (which is most of it), then it seems the only ethical way forward would be to start again with more selective training data, including only what is licensed for commercial use. Now that the research on how to create these models has been done, it should be quicker to build new ones with more ethically sourced training data.