this post was submitted on 18 Aug 2025

1128 points (99.0% liked)

Technology

74247 readers

4816 users here now

This is a most excellent place for technology news and articles.

Our Rules

Follow the lemmy.world rules.
Only tech related news or articles.
Be excellent to each other!
Mod approved content bots can post up to 10 articles per day.
Threads asking for personal tech support may be deleted.
Politics threads may be removed.
No memes allowed as posts, OK to post as comments.
Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
Check for duplicates before posting; duplicates may be removed.
Accounts 7 days and younger will have their posts automatically removed.

Approved Bots

founded 2 years ago

MODERATORS

L3s@lemmy.world

enu@lemmy.world

technopagan@lemmy.world

L4s@lemmy.world

L3s@hackingne.ws

L4s@hackingne.ws

Codeberg: an army of AI crawlers is slowing us down badly; AI crawlers have learned how to solve the Anubis challenges. (i.imgur.com)

submitted 2 days ago by Pro@programming.dev to c/technology@lemmy.world

222 comments

cross-posted from: https://programming.dev/post/35852706

Source.

[–] zbyte64@awful.systems 29 points 1 day ago (6 children)

Is there a Nightshade, but for text and code? Maybe my source headers should include a bunch of special characters that trigger a prompt injection, and I could sprinkle some nonsensical code comments in before the real ones.

[–] qaz@lemmy.world 2 points 1 day ago

There are glitch tokens, but I think those only affect a model when it's being used, not when it's being trained.

[–] zifk@sh.itjust.works 98 points 2 days ago (8 children)

Anubis isn't supposed to be hard to avoid, just expensive to avoid. I'm not really surprised that a big company might be willing to throw a bunch of cash at it.
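What "expensive to avoid" means in practice: Anubis-style challenges are a hashcash-style proof of work. The sketch below assumes a simplified scheme (find a nonce such that the SHA-256 hex digest of challenge + nonce starts with `difficulty` zeros); Anubis's real challenge format and settings may differ, and `solve`/`verify` are hypothetical names. It shows the asymmetry: the client pays about 16^difficulty hashes on average, while the server pays one.

```python
import hashlib
import itertools


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Brute-force the smallest nonce whose SHA-256(challenge + nonce) hex digest
    starts with `difficulty` zeros. Expected work: about 16**difficulty hashes."""
    target = "0" * difficulty
    for nonce in itertools.count():
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Checking a submitted nonce costs the server a single hash."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


if __name__ == "__main__":
    nonce, digest = solve("example-challenge", 4)
    print(nonce, digest, verify("example-challenge", nonce, 4))
```

Raising the difficulty by one multiplies the crawler's cost by 16 without changing the server's, which is why a well-funded scraper can simply pay the bill.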

[–] PhilipTheBucket@piefed.social 94 points 2 days ago (12 children)

I feel like at some point it needs to be an active response. Phase 1 is teergrube-style slowness to muck up the crawlers, with warnings in the headers and response body, and phase 2 is a DDoS in response, or maybe just a drone strike to cut out the middleman. Once you're actively evading Anubis, fuckin' game on.
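A teergrube (tarpit) for HTTP can be as simple as answering a flagged client immediately and then dripping the body out a byte at a time. This is a minimal asyncio sketch, not a production server: the port, the delay and the `X-Crawler-Notice` header are hypothetical, and it assumes the client has already been identified as a crawler.

```python
import asyncio
from typing import AsyncIterator


async def trickle(body: bytes, chunk_size: int = 1, delay: float = 10.0) -> AsyncIterator[bytes]:
    """Yield `body` in small chunks, sleeping `delay` seconds between chunks."""
    for i, start in enumerate(range(0, len(body), chunk_size)):
        if i:
            await asyncio.sleep(delay)
        yield body[start:start + chunk_size]


async def handle(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Send the headers (with a warning) right away, then drip the body."""
    await reader.readline()  # read and ignore the request line
    writer.write(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Type: text/html\r\n"
        b"Connection: close\r\n"
        b"X-Crawler-Notice: automated scraping is throttled here\r\n"
        b"\r\n"
    )
    body = b"<html><body>" + b"." * 1000 + b"</body></html>"
    try:
        async for chunk in trickle(body, chunk_size=1, delay=5.0):
            writer.write(chunk)
            await writer.drain()
    except ConnectionError:
        pass  # the crawler gave up; just close our side
    finally:
        writer.close()


async def main() -> None:
    server = await asyncio.start_server(handle, "127.0.0.1", 8080)
    async with server:
        await server.serve_forever()
```

Start it with `asyncio.run(main())`. Each held connection costs the server little more than a sleeping coroutine, while a crawler that waits for the full page is tied up for over an hour.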

[–] turbowafflz@lemmy.world 109 points 2 days ago (8 children)

I think the best thing to do is not to block them when they're detected but to poison them instead. Feed them tons of text generated by tiny old language models; that's harder to detect, and it also messes up their training and makes the models less reliable. Of course you'd want to do that on a separate server so it doesn't slow down real users, but you probably don't need much power, since the scrapers probably don't care much about speed.
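The comment suggests tiny old language models; a word-level bigram Markov chain is an even cheaper stand-in (swapped in here, not what the commenter named) that produces locally plausible, globally meaningless text. A sketch only: `build_chain` and `babble` are hypothetical names, and the seed corpus is whatever text you choose.

```python
import random
from collections import defaultdict
from typing import Optional


def build_chain(corpus: str) -> dict[str, list[str]]:
    """Map each word to every word that directly follows it in the corpus."""
    words = corpus.split()
    chain: dict[str, list[str]] = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return dict(chain)


def babble(chain: dict[str, list[str]], length: int, seed: Optional[int] = None) -> str:
    """Random-walk the chain to emit `length` words of plausible-looking nonsense."""
    if length <= 0 or not chain:
        return ""
    rng = random.Random(seed)
    starts = list(chain)
    word = rng.choice(starts)
    out = [word]
    while len(out) < length:
        followers = chain.get(word)
        # Dead end (a word seen only at the very end of the corpus): restart anywhere.
        word = rng.choice(followers) if followers else rng.choice(starts)
        out.append(word)
    return " ".join(out)
```

Generation is a dictionary lookup and a random choice per word, so a small box can feed flagged crawlers indefinitely.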

[–] xthexder@l.sw0.com 61 points 2 days ago

I love catching bots in tarpits, it's actually quite fun

[–] 31ank@ani.social 45 points 2 days ago* (last edited 2 days ago)

Someone also used zip bombs against AI crawlers; I don't know if that still works. Link to the Lemmy post
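The linked post isn't reproduced here, but the usual HTTP version of this trick is a gzip bomb: a small response sent with `Content-Encoding: gzip` that inflates to something huge in any client that decompresses automatically. A sketch under that assumption; the size is arbitrary and `make_gzip_bomb` is a hypothetical name.

```python
import gzip
import io


def make_gzip_bomb(decompressed_mib: int = 10) -> bytes:
    """Return a gzip stream that inflates to `decompressed_mib` MiB of zero bytes.
    Runs of identical bytes compress roughly 1000:1 with DEFLATE."""
    buf = io.BytesIO()
    chunk = b"\0" * (1024 * 1024)  # 1 MiB of zeros, written repeatedly
    with gzip.GzipFile(fileobj=buf, mode="wb", compresslevel=9) as gz:
        for _ in range(decompressed_mib):
            gz.write(chunk)
    return buf.getvalue()
```

Generating the payload once and caching it keeps the server's cost near zero; only a client that inflates the response pays for the full size.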

[–] londos@lemmy.world 44 points 2 days ago (4 children)

Can there be a challenge that actually does some maliciously useful compute? Like make their crawlers mine bitcoin or something.

[–] raspberriesareyummy@lemmy.world 67 points 2 days ago (15 children)

Did you just use the words "useful" and "bitcoin" in the same sentence? o_O

[–] kameecoding@lemmy.world 32 points 2 days ago (5 children)

Bro couldn't even bring himself to mention protein folding because that's too socialist I guess.

[–] andallthat@lemmy.world 15 points 1 day ago* (last edited 1 day ago) (3 children)

LLMs can't do protein folding. A specifically trained machine learning model called AlphaFold did. Here's the paper.

Developing, training and fine-tuning that model was a research effort led by two guys who got a Nobel for it. AlphaFold can't hold a conversation or give you hummus recipes; it knows jack shit about the structure of human language, but it can identify patterns in the domain where it has been specifically and painstakingly trained.

It wasn't "hey ChatGPT, show me how to fold a protein," is all I'm saying, and the "superhuman reasoning capabilities" of current LLMs still fall ridiculously short on much simpler problems.

[–] patatahooligan@lemmy.world 7 points 1 day ago

The crawlers for LLMs are not themselves LLMs.

[–] mobotsar@sh.itjust.works 1 points 1 day ago

Crawlers aren't LLMs; they can do arbitrary computations (whatever the target demands to access resources).

[–] nymnympseudonym@lemmy.world 2 points 1 day ago

The Monero community spent a long time trying to find a "useful PoW" function. The problem is that most useful computations are not also easy to verify as correct. JavaScript optimization was one direction that got pursued pretty far.

But at the end of the day, a crypto that actually intends to withstand attacks from major governments requires a system that is decentralized, trustless, and verifiable, and the only solutions found to date involve algorithms for which a GPU or even a custom ASIC confers no significant advantage over a consumer-grade CPU.

[–] 0x0@lemmy.zip 1 points 1 day ago* (last edited 1 day ago)

Anubis does that (the computation part). You may've seen it already.

[–] oeuf@slrpnk.net 41 points 2 days ago (2 children)

Crazy. DDoS attacks are illegal here in the UK.

[–] StopSpazzing@lemmy.world 18 points 1 day ago* (last edited 23 hours ago) (2 children)

Is there a migration tool? If not, it would be awesome to migrate everything, including issues and such. I bet even more people would move.

[–] BlameTheAntifa@lemmy.world 16 points 1 day ago

Codeberg has very good migration tools built in. You need to do one repo at a time, but it can move issues, releases, and everything.
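Since the built-in migrator handles one repository per run, moving many repositories is easy to script. A sketch assuming Codeberg exposes the Gitea/Forgejo-compatible `POST /api/v1/repos/migrate` endpoint with the usual migration fields (check the instance's API docs before relying on them); the repository names, owner and tokens are placeholders, and `migrate_request` is a hypothetical helper that only builds the request.

```python
import json
import urllib.request

API_BASE = "https://codeberg.org/api/v1"  # assumed Gitea-compatible API base URL


def migrate_request(github_repo: str, codeberg_token: str, github_token: str,
                    owner: str) -> urllib.request.Request:
    """Build a POST /repos/migrate request importing one GitHub repository,
    including issues, pull requests, releases, labels, milestones and wiki."""
    payload = {
        "clone_addr": f"https://github.com/{github_repo}.git",
        "repo_name": github_repo.split("/")[-1],
        "repo_owner": owner,
        "service": "github",
        "auth_token": github_token,  # lets the importer read issues and PRs
        "issues": True,
        "pull_requests": True,
        "releases": True,
        "labels": True,
        "milestones": True,
        "wiki": True,
        "mirror": False,
    }
    return urllib.request.Request(
        f"{API_BASE}/repos/migrate",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"token {codeberg_token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
```

Loop over your repositories and send each request with `urllib.request.urlopen(req)`; the web UI does the same import one repository at a time.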

[–] UnderpantsWeevil@lemmy.world 48 points 2 days ago (4 children)

I mean, we really have to ask ourselves - as a civilization - whether human collaboration is more important than AI data harvesting.
