The NYT is currently suing OpenAI and Microsoft for copyright infringement.
https://www.nytimes.com/2023/12/27/business/media/new-york-times-open-ai-microsoft-lawsuit.html
it’s unclear that copyright has anything to say about AI training anyway
Although lawmakers worldwide were asleep while AI advanced and therefore failed to pass some important laws, they are catching up. The EU recently passed its first AI Act. As far as I've seen, it also requires companies to publish a detailed summary of their training data.
What's needed is a method that lets the training pipeline tell copyrighted material apart from everything else. In the "dumbest" case, humans will have to label it by hand.
Just because you've edited a comment doesn't mean it can be treated as "oh, this is under copyright now".
I'm not saying it's technically impossible. On the contrary, it very much is possible; it's just more work. That drives up development costs, which may give some satisfaction to angry ex-Reddit users like me. Still, those costs will be peanuts for giants like Google/Alphabet.