Meta Admits Use of 'Pirated' Book Dataset to Train AI
(torrentfreak.com)
OK, fair, but do consider the context: the models are open weight. You can download them and use them for free.
There is a slight catch, though, which I’m very annoyed at: it’s not actually Apache. It’s this weird license where you can use the model commercially until you have 700M monthly users, at which point you have to request a custom license from Meta. OK, I kinda understand them not wanting companies like ByteDance or Google using their models just like that, but Mistral has their models on Apache-2.0 open weight, so the context should definitely be reconsidered, especially for Llama 3.
It’s kind of a thing right now: publishers don’t want models trained on their books “because it breaks copyright”, even though the model doesn’t actually remember copyrighted passages from the book. Many arguments hinge on the publishers being mad that you can prompt the model to repeat a copyrighted passage, which it can do. IMO this is a bullshit reason.
anyway, will be an interesting two years as (hopefully) copyright will get turned inside out :)
I really have to thank you for an educated response