this post was submitted on 20 Mar 2025
Open Source
LLM scraping is a parasite on the internet, in the actual ecological sense of the word: it places a burden on other, unwitting ~~organisms~~ computer systems, making it harder for the host to survive or carry out its own necessary processes, solely for the parasite's benefit while giving nothing back to the host in return.
I know there's an ongoing debate (both in the courts and on social media) about whether AI companies should have to pay royalties for their training data under copyright law. But I think they should at the very least pay for the infrastructure they use while collecting that data, even when the data itself is free. Being scraped costs the organisation hosting the data real money and resources, orders of magnitude more than serving the same data to individual people.
The case can certainly be made that copying is not theft, but copying is by no means free either, especially at the scale LLM scrapers operate.