this post was submitted on 11 Sep 2025
125 points (100.0% liked)

Fuck AI

top 11 comments
[–] alecsargent@lemmy.zip 3 points 1 day ago* (last edited 22 hours ago)

If you are using Hugo, use this robots.txt template, which automatically updates on every build:

{{- $url := "https://raw.githubusercontent.com/ai-robots-txt/ai.robots.txt/refs/heads/main/robots.txt" -}}
{{- with try (resources.GetRemote $url) -}}
  {{ with .Err }}
    {{ errorf "%s" . }}
  {{ else with .Value }}
    {{- .Content -}}
  {{ else }}
    {{ errorf "Unable to get remote resource %q" $url }}
  {{ end }}
{{ end -}}

Sitemap: {{ "sitemap.xml" | absURL }}

Optionally, lead rogue bots to poisoned pages:

{{- $url := "https://raw.githubusercontent.com/ai-robots-txt/ai.robots.txt/refs/heads/main/robots.txt" -}}
{{- with try (resources.GetRemote $url) -}}
  {{ with .Err }}
    {{ errorf "%s" . }}
  {{ else with .Value }}
    {{- printf "%s\n%s\n\n" "User-Agent: *" "Disallow: /train-me" }}
    {{- .Content -}}
  {{ else }}
    {{ errorf "Unable to get remote resource %q" $url }}
  {{ end }}
{{ end -}}

Sitemap: {{ "sitemap.xml" | absURL }}

~~Check out how to poison your pages for rogue bots in this article~~

The repo was deleted, and it was excluded from the Internet Archive as well.

I use Quixotic and a Python script to poison the pages, and I included those in my site update script.

It's all cobbled together in amateur fashion from the deleted article, but it's honest work.

[–] db0@lemmy.dbzer0.com 6 points 2 days ago (1 children)

That's pretty sweet, but be aware that a lot of bots are bad actors and don't advertise a proper user agent, so you also have to block by IP. Blocking all Alibaba server IPs is a good start.
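Something like this, as a rough sketch, assuming nginx sits in front of the site; the CIDR ranges, server name, and backend address are all placeholders, so look up the provider's currently published ranges yourself:

```nginx
# Goes in the http {} context of the nginx config.
# Placeholder ranges; replace with the provider's current published CIDRs.
geo $blocked_crawler_ip {
    default       0;
    47.74.0.0/16  1;  # example Alibaba Cloud range (placeholder)
    47.88.0.0/16  1;  # example Alibaba Cloud range (placeholder)
}

server {
    listen      80;
    server_name example.org;  # placeholder

    if ($blocked_crawler_ip) {
        return 403;
    }

    location / {
        proxy_pass http://127.0.0.1:8080;  # placeholder backend
    }
}
```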

[–] plz1@lemmy.world 2 points 2 days ago

This is an nginx reverse proxy configuration. It's not passive like robots.txt, but they probably named it that way in solidarity with the intent of robots.txt. You're on point about Alibaba, though, which I'm sure could be added to this nginx blocking strategy fairly easily. Anubis is still probably a better solution, since it doesn't rely on LLM bots passing an honest user-agent.
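The general shape of that user-agent blocking looks roughly like this; just a sketch, not the repo's actual file, and the agent names here are only examples (the repo maintains the real list):

```nginx
# Goes in the http {} context. The crawler names below are examples only;
# the ai.robots.txt repo maintains the full, current list.
map $http_user_agent $is_ai_crawler {
    default       0;
    ~*GPTBot      1;
    ~*ClaudeBot   1;
    ~*CCBot       1;
    ~*Bytespider  1;
}

server {
    listen      80;
    server_name example.org;  # placeholder

    if ($is_ai_crawler) {
        return 403;
    }

    location / {
        proxy_pass http://127.0.0.1:8080;  # placeholder backend
    }
}
```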

[–] zod000@lemmy.dbzer0.com 11 points 2 days ago (2 children)

Most AI crawlers don't respect robots.txt files, but this info might be useful for other forms of blocking.

[–] Vittelius@feddit.org 14 points 2 days ago (1 children)

The repo, despite its name, doesn't only contain a robots.txt. It also has files for popular reverse proxies to block crawlers outright.

[–] zod000@lemmy.dbzer0.com 4 points 2 days ago

That was kind of the point of my comment, since the name didn't indicate that. Also, many tools that companies use won't or can't consume these files directly, but they could still make use of the info. Since I'm in exactly that situation, I wanted people to know it could still be worth their time to take a look.

[–] Ulrich@feddit.org 2 points 2 days ago (1 children)

robots.txt doesn't do any sort of blocking. It's nothing more than a request. This is active blocking.

Although I'm not sure how successful it will be, given the determination of these bots.

[–] zod000@lemmy.dbzer0.com 1 points 2 days ago (1 children)

A few of them are quite good at randomizing their user-agent and using a large number of IP blocks. I've not had a fun time trying to limit them.

[–] Ulrich@feddit.org 3 points 2 days ago

Yeah dude, they're extremely malicious and not even trying to hide it anymore. They don't give a fuck that they're DDOSing the entire internet.

[–] hendrik@palaver.p3x.de 5 points 2 days ago* (last edited 2 days ago)

Many AI crawlers don't identify themselves properly; they fake the User-Agent and pretend to be Google Chrome or something like that. So this is bound to only deal with the ones that somehow behave, while it won't do anything against the really bad ones. And from my experience, those can make enough requests per second to get an average server into trouble. At least that's what happened to mine.
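When they fake the User-Agent, about the only generic lever left at the proxy is per-IP rate limiting. A minimal nginx sketch, with purely illustrative numbers and a placeholder backend; it won't stop crawlers that rotate through many IPs, but it keeps a single aggressive client from flooring the server:

```nginx
# Goes in the http {} context. Rate and burst values are illustrative;
# tune them against your own traffic before relying on this.
limit_req_zone $binary_remote_addr zone=perip:10m rate=2r/s;

server {
    listen      80;
    server_name example.org;  # placeholder

    location / {
        limit_req  zone=perip burst=10 nodelay;
        proxy_pass http://127.0.0.1:8080;  # placeholder backend
    }
}
```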

[–] oplkill@lemmy.world 4 points 2 days ago

If only they could read