this post was submitted on 11 Sep 2025
Fuck AI
Most AI crawlers don't respect robots.txt files, but this info might be useful for other forms of blocking.
The repo, despite its name, doesn't only contain a robots.txt. It also has files for popular reverse proxies to block crawlers outright.
That was kind of the point of my comment, since the name doesn't indicate that. Also, many tools that companies use won't or can't consume these files directly, but could still make use of the info. I'm in exactly that situation, so I wanted people to know it could still be worth their time to take a look.
robots.txt doesn't do any sort of blocking. It's nothing more than a request. This is active blocking.
Although I'm not sure how successful it will be, given the determination of these bots.
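To make the request-versus-blocking distinction concrete, here is a rough Python sketch. It is not taken from the repo: the crawler name `ExampleBot`, the robots.txt contents, and the blocklist are all made up for illustration. The first function is what a polite crawler chooses to do on its own side; the second is what a server or reverse proxy can enforce no matter what the client does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; "ExampleBot" is a made-up crawler name.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /
"""

def polite_crawler_may_fetch(user_agent: str, url: str) -> bool:
    """Client side: a well-behaved crawler parses robots.txt and obeys it.

    Nothing forces a crawler to run this check at all, which is why
    robots.txt is only a request.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)

# Hypothetical blocklist of lowercase user-agent substrings to refuse.
BLOCKED_AGENT_SUBSTRINGS = ("examplebot",)

def server_status_for(user_agent: str) -> int:
    """Server side: return the HTTP status the server would send.

    This is active blocking: the decision is made by the server,
    whether or not the client ever looked at robots.txt.
    """
    ua = user_agent.lower()
    if any(fragment in ua for fragment in BLOCKED_AGENT_SUBSTRINGS):
        return 403  # Forbidden
    return 200
```

Note that the server-side check still trusts the User-Agent header, so it only stops bots that identify themselves honestly.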
A few of them are quite good at randomizing their user-agent and using a large number of IP blocks. I've not had a fun time trying to limit them.
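One approach that does not depend on the user-agent at all is to rate limit per network block instead of per address. The Python sketch below is only an illustration under assumptions: the class name, the /24 and /48 prefix sizes, and the request budget are arbitrary choices, not something from the repo or from any particular proxy. It also won't help much against crawlers spread thinly across many unrelated blocks.

```python
import ipaddress
from collections import defaultdict, deque

class NetworkRateLimiter:
    """Sliding-window limiter keyed by network prefix, not by single IP.

    All defaults here are illustrative assumptions, not recommendations.
    """

    def __init__(self, max_requests=100, window=60.0, prefix_v4=24, prefix_v6=48):
        self.max_requests = max_requests
        self.window = window
        self.prefix_v4 = prefix_v4
        self.prefix_v6 = prefix_v6
        # Maps a network string such as "203.0.113.0/24" to request timestamps.
        self._hits = defaultdict(deque)

    def _network_key(self, ip: str) -> str:
        """Collapse an address to the network block it belongs to."""
        addr = ipaddress.ip_address(ip)
        prefix = self.prefix_v4 if addr.version == 4 else self.prefix_v6
        # strict=False lets us pass a host address and get its network back.
        return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))

    def allow(self, ip: str, now: float) -> bool:
        """Return True if the request fits within the block's budget."""
        hits = self._hits[self._network_key(ip)]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

In real use `now` would come from something like `time.monotonic()`; it is passed in explicitly here so the behaviour is easy to check by hand.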
Yeah dude, they're extremely malicious and not even trying to hide it anymore. They don't give a fuck that they're DDoSing the entire internet.