Make illegally trained LLMs public domain as punishment
(www.theregister.com)
The LLM does reproduce copyrighted data though.
Not 1:1; even overfitted images still differ considerably from their originals. If you chose "reproduce" to make that point, that's why OP clarified that it isn't literally copying training data, since the actual data being in the model would be a different story. Because these models are (in simplified form) a bunch of really complex math that produces material, it's a mathematical inevitability that they will sometimes produce copyrighted material, even in outputs that have nothing to do with overfitting. Just like infinite monkeys on infinite typewriters will eventually reproduce every piece of copyrighted text.
But then I would point you to the camera on your phone. If you take a picture of copyrighted material with it, you're still infringing. But was the camera created with the intention of appropriating the material captured by its lens? No, which is why we don't blame the camera for that; we blame the person who used it for that purpose. AI users have an ethical obligation not to steer the AI towards generating infringing material.
And the easiest way to do that is to not include infringing material in the first place.
How?
*it can produce data identical to data that has been copyrighted before