So it turns out that the Turing test is surprisingly weak and not very useful as a measure of intelligence, which leaves the question: what AI marketing hype can you actually believe?
It goes without saying that models are trained on human input, and by now we all know that LLMs degrade rather quickly when they are trained on AI-generated input. That got me thinking: wouldn't that degradation make a clear metric of how "human" or how "intelligent" a model is?
I would like to see models measured on how quickly they degrade when "poisoned" with their own output.
Yes, we would still need a secondary metric to detect and measure the collapse, but this sort of scale would be elastic enough to compare everything from the most brain-dead LLMs, to humans (the unity point), to theoretical models that could actually improve themselves (over-unity). A rough sketch of the measurement loop is below.
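To make the loop concrete, here is a minimal toy sketch. It assumes a trivial bigram sampler standing in for an LLM and vocabulary entropy standing in for that secondary collapse metric; both are placeholders to illustrate the shape of the measurement, not a real benchmark. Each round, the model is retrained on its own generated text and the proxy is recorded, so the output is a degradation curve whose slope is the kind of score I'm describing.

```python
"""Toy sketch of the 'self-poisoning' metric: retrain a model on its own
output repeatedly and track how fast a quality proxy collapses.
The bigram model and entropy proxy are stand-ins, not a real benchmark."""
import math
import random
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Fit a bigram table: next-token counts for each token."""
    table = defaultdict(Counter)
    for a, b in zip(tokens, tokens[1:]):
        table[a][b] += 1
    return table

def generate(table, length, rng):
    """Sample a token stream from the bigram table."""
    token = rng.choice(list(table))
    out = [token]
    for _ in range(length - 1):
        nxt = table.get(token)
        if not nxt:
            token = rng.choice(list(table))  # dead end: restart at random
        else:
            choices, weights = zip(*nxt.items())
            token = rng.choices(choices, weights=weights)[0]
        out.append(token)
    return out

def vocab_entropy(tokens):
    """Shannon entropy of the token distribution -- a crude collapse proxy."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())

def degradation_curve(human_tokens, generations=10, sample_len=5000, seed=0):
    """Retrain on self-generated text each round; return the proxy per round."""
    rng = random.Random(seed)
    table = train_bigram(human_tokens)
    curve = []
    for _ in range(generations):
        synthetic = generate(table, sample_len, rng)
        curve.append(vocab_entropy(synthetic))
        table = train_bigram(synthetic)  # the "poisoning" step
    return curve

if __name__ == "__main__":
    corpus = ("the quick brown fox jumps over the lazy dog and then the dog "
              "chases the fox around the quiet green field every single day").split() * 50
    for gen, h in enumerate(degradation_curve(corpus)):
        print(f"generation {gen}: vocabulary entropy = {h:.3f}")
```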
Even if unity is impossible with our current approach to LLMs, such a scale might also let us compare LLMs to whatever "the next" big AI thing is that comes down the pipe, and it would completely cut through the cheaty marketing hype around LLMs that are specifically trained on the very intelligence questions/exams by which they are measured.