this post was submitted on 11 Sep 2025
126 points (100.0% liked)

Fuck AI

"We did it, Patrick! We made a technological breakthrough!"

A place for all those who loathe AI to discuss things, post articles, and ridicule the AI hype. Proud supporter of working people. And proud booer of SXSW 2024.

founded 2 years ago
[–] zod000@lemmy.dbzer0.com 11 points 3 days ago (2 children)

Most AI crawlers don't respect robots.txt files, but this info might be useful for other forms of blocking.

[–] Vittelius@feddit.org 14 points 2 days ago (1 children)

The repo, despite its name, doesn't only contain a robots.txt. It also has files for popular reverse proxies to block crawlers outright.

[–] zod000@lemmy.dbzer0.com 4 points 2 days ago

That was kind of the point of my comment, since the name doesn't indicate that. Also, many of the tools companies use won't or can't consume these files directly, but they can still make use of the info in them. Since I'm in exactly that situation, I wanted people to know it could still be worth their time to take a look.
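
To make "use the info" a bit more concrete, here is a rough Python sketch of pulling the user-agent names out of a robots.txt-style list and turning them into a matcher that some other tool could call. The sample text, `extract_agents`, and `build_blocker` are all made up for illustration and are not taken from the repo; only the two crawler names are real user-agent tokens.

```python
import re

# Hypothetical sample in robots.txt format; a real list would be much longer.
SAMPLE_ROBOTS_TXT = """\
# example block list
User-agent: GPTBot
User-agent: CCBot
Disallow: /
"""


def extract_agents(robots_txt: str) -> list[str]:
    """Collect the names on 'User-agent:' lines, skipping the '*' wildcard."""
    agents = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = line.split(":", 1)
        value = value.strip()
        if field.strip().lower() == "user-agent" and value and value != "*":
            agents.append(value)
    return agents


def build_blocker(agents: list[str]):
    """Return a function that reports whether a User-Agent header matches."""
    if not agents:
        return lambda user_agent: False
    pattern = re.compile("|".join(re.escape(a) for a in agents), re.IGNORECASE)
    return lambda user_agent: bool(pattern.search(user_agent or ""))


agents = extract_agents(SAMPLE_ROBOTS_TXT)
is_blocked = build_blocker(agents)
```

Something like `is_blocked(request_user_agent)` could then sit in whatever filtering layer you already have. It only catches bots that announce themselves honestly, so treat it as one signal rather than a complete defense.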

[–] Ulrich@feddit.org 2 points 2 days ago (1 children)

robots.txt doesn't do any sort of blocking. It's nothing more than a request. This is active blocking.

Although I'm not sure how successful it will be, given the determination of these bots.
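
A small Python sketch of why it is only a request: the check happens in the crawler's own code, using something like the standard library's `urllib.robotparser`. The rules text, the URL, and the `polite_allowed` helper are placeholders I made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules a site might publish in its robots.txt.
RULES = """\
User-agent: GPTBot
Disallow: /
"""


def polite_allowed(user_agent: str, url: str, robots_txt: str = RULES) -> bool:
    """What a well-behaved crawler would ask itself before fetching a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A polite crawler calls this and backs off when it returns False.
# A rude one never calls it at all, and nothing on the server stops the request.
```

Nothing in that flow runs on the server, which is why the reverse proxy configs are a different kind of thing: they refuse the request no matter what the client decided.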

[–] zod000@lemmy.dbzer0.com 1 points 2 days ago (1 children)

A few of them are quite good at randomizing their user-agent and using a large number of IP blocks. I've not had a fun time trying to limit them.
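
For anyone fighting the same thing, one rough idea is to count requests per network instead of per user-agent or per single address. This is only a sketch in Python: the /24 and /48 prefix sizes, the threshold, and the helper names are assumptions for illustration, and the addresses are from the documentation ranges. A crawler spread thinly over many unrelated networks will still slip past it.

```python
import ipaddress
from collections import Counter


def network_key(ip: str, v4_prefix: int = 24, v6_prefix: int = 48) -> str:
    """Map an address to the network containing it (prefix sizes are guesses)."""
    addr = ipaddress.ip_address(ip)
    prefix = v4_prefix if addr.version == 4 else v6_prefix
    # strict=False lets us pass a host address and get its enclosing network.
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False))


def busy_networks(ips: list[str], threshold: int) -> dict[str, int]:
    """Return networks whose request count is at or above the threshold."""
    counts = Counter(network_key(ip) for ip in ips)
    return {net: n for net, n in counts.items() if n >= threshold}


# Client addresses as they might appear in an access log (documentation ranges).
sample_ips = ["203.0.113.7", "203.0.113.200", "198.51.100.1"]
noisy = busy_networks(sample_ips, threshold=2)
```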

[–] Ulrich@feddit.org 3 points 2 days ago

Yeah dude, they're extremely malicious and not even trying to hide it anymore. They don't give a fuck that they're DDoSing the entire internet.