LLM scraping is a parasite on the internet, in the actual ecological sense of the word: it places a burden on other unwitting ~~organisms~~ computer systems, making it harder for the host to survive or carry out its own necessary processes, solely for the parasite's own benefit and while giving the host nothing in return.

I know there's an ongoing debate (both in the courts and on social media) about whether AI companies should have to pay royalties for their training data under copyright law. But I think they should at the very least be paying for the infrastructure they use while collecting that data, even free data. Being scraped costs the organisation hosting the data real money and resources, orders of magnitude more than serving the same data to individual people.

The case can certainly be made that copying is not theft, but copying is by no means free either, especially at the scale LLM scrapers do it.