Make illegally trained LLMs public domain as punishment
(www.theregister.com)
You're thinking of licensing as a person putting something online WITH a license.
The terminology in this case is whether or not it was LICENSED by the commercial entity using and selling its derivative. That is the default. The burden is on the commercial entity to prove they were the original creator of said content. It is by default plagiarism otherwise, and this is also the default.
Here's an example: I write a story and post it online, and it is specifically about a toothbrush and a toilet scrubber falling in love and then having dish scrubber pads as children. I say the two main characters are called Dennis and Fran, and their children are called Denise and Francesca. Then somebody prompts OpenAI for a similar story and it kicks out the exact same story with the same names. I would win that case, because it is clearly, beyond a doubt, plagiarism.
Unless you, as OpenAI, can prove these are all completely random (which they aren't, because it's trained on my data), I would be deemed the original creator of that story, and I would be entitled to any sales of that data.
Proving that is a different thing, but that's what the laws say should happen. If they didn't contact me to license that story, it's still plagiarism. Same with music, movies, etc.