Hacker News

This community serves to share top posts on Hacker News with the wider fediverse.

NY Times is asking that ALL LLMs trained on Times data be destroyed (twitter.com)

submitted 2 years ago by haxor@derp.foo to c/hackernews@derp.foo

There is a discussion on Hacker News, but feel free to comment here as well.

[–] sonori@beehaw.org 2 points 2 years ago (1 children)

They’ve generally been pretty open about using pirated books and data to train their product on.

https://shkspr.mobi/blog/2023/07/fruit-of-the-poisonous-llama/

There is also no law stating that copyright doesn’t apply to training AI in the same way it applies to every other use. Even this comment technically has a copyright, just as people who write long original stories post by post on forums like Spacebattles and Sufficient Velocity still hold the copyright on those stories.

There is a carve-out in copyright for academic research, but that protection disappears the second you start using the work for a commercial purpose.

[–] lvxferre@lemmy.ml 1 points 2 years ago* (last edited 2 years ago)

Now I get it. And yes, now I agree with you; it would give them a bit more merit to claim that the data being used in the input was obtained illegally. (Unless Meta has right of use to ThePile.)

The link does not mention GPT (OpenAI, Microsoft) or LaMDA/Bard (Google, Alphabet), but if Meta is doing it odds are that the others are doing it too.

Sadly, this would be up to the copyright holders of that data. It does not apply to NYT content that you can freely access online; for NYT it has to be about the output, not the input.