Somehow this makes me think of the times before modern food safety regulations, when adulteration with substances like formaldehyde or arsenic was apparently common: https://pmc.ncbi.nlm.nih.gov/articles/PMC7323515/ We may be in a similar age regarding information now. Of course, this has always been a problem with the internet, but I would argue that AI (and the way oligopolistic companies are shoving it into everything) is making it infinitely worse.
I already predicted that scraper activity would crash due to AI in my most recent MoreWrite essay, and seeing this only makes me more confident in that assessment.
On a wider front, I suspect web search in general is gonna dive in popularity. Even if scraper activity stays the same during the AI winter, the bubble (with plenty of help from Google) has completely broken the web ecosystem that allowed search engines (near-exclusively Google) to thrive: it triggered a slop-nami that drowned out human-made art, supercharged SEO's ability to bury quality output, and enabled AI Summary™ services that steal traffic by stealing work.
I guess the question here really boils down to: Can (less-than-perfect) capitalism solve this problem somehow (by allowing better solutions to prevail), or is it bound to fail due to the now-insurmountable market power of existing players?
I think the hardest part is the sheer volume of content to sift through and index on today's Internet. It is a completely different, and MUCH larger, beast than when Google was made in a garage in 1998 and quickly took the world by storm.
I have confidence that an open-source/crowd-sourced effort could beat Google's results, but the computational power and backend are, I think, the biggest gating factors. It would also need to be completely distributed, with a lot of duplicated data for fault tolerance, but also some way to establish a "source of truth" so that malicious users couldn't rewrite or poison the index.
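For the "source of truth" part, here's a minimal sketch (in Python, purely illustrative; the function names, the record shape, and the quorum size are all made up) of how replicas could be reconciled by content hash, so a single malicious node can't quietly rewrite an index record:

```python
import hashlib
import json
from collections import Counter


def record_digest(record):
    """Deterministic hash of an index record (URL + extracted metadata)."""
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()


def resolve_replicas(replica_copies, quorum):
    """
    Accept the version of a record that at least `quorum` replicas agree on,
    comparing by content hash. Returns None if no version reaches quorum.
    """
    digests = [record_digest(r) for r in replica_copies]
    digest, votes = Counter(digests).most_common(1)[0]
    if votes < quorum:
        return None
    return next(r for r in replica_copies if record_digest(r) == digest)


if __name__ == "__main__":
    honest = {"url": "https://example.org/page", "title": "Example", "terms": ["example", "page"]}
    poisoned = {"url": "https://example.org/page", "title": "Buy pills", "terms": ["spam"]}

    # Three replicas hold the honest record, one malicious node holds a rewrite.
    copies = [honest, honest, poisoned, honest]
    accepted = resolve_replicas(copies, quorum=3)
    print("accepted:", accepted["title"] if accepted else "no quorum")
```

In practice you'd probably want signatures and some kind of node reputation on top of a bare hash quorum, but the basic idea is that poisoning only works if you can out-vote the honest copies.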