this post was submitted on 20 Mar 2025
I'm not sure how they actually implemented it, but you can easily block ML crawlers via Cloudflare. Isn't just about every small site or service behind CF anyway?
Last I checked, Cloudflare requires the user to have JavaScript and cookies enabled. My institution doesn't want to require those because doing so would likely affect legitimate users as well as bots.
Huh? I can reach my site via curl, which has neither. How did you come up with this random set of requirements?
Odd. I just tried
and got
I'm clearly not on the same setup as you are, but my off-the-cuff guess is that your curl command was issued from a system that Cloudflare already recognized (IP whitelist, cookies, I dunno).
Anyways, I'm reading through this blog post on using curl with Cloudflare-protected sites and I'm finding it interesting.
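For anyone who wants to compare setups, here is a rough sketch of how one might check whether a plain request (no cookies, no JavaScript) gets a challenge back. It leans on the `cf-mitigated: challenge` response header that Cloudflare documents for challenge pages. The function names `looks_like_cf_challenge` and `probe` are made up for this example, and the `curl/8.0` User-Agent is only a placeholder, so treat it as a heuristic, not a definitive test.

```python
from urllib import request, error


def looks_like_cf_challenge(headers: dict) -> bool:
    """Heuristic: True if the response headers indicate a Cloudflare challenge."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {k.lower(): v.strip().lower() for k, v in headers.items()}
    # Assumption: the site is behind Cloudflare, which sets this header on challenges.
    return normalized.get("cf-mitigated", "") == "challenge"


def probe(url: str, user_agent: str = "curl/8.0") -> bool:
    """Fetch url with no cookies or JavaScript; report whether a challenge came back."""
    req = request.Request(url, headers={"User-Agent": user_agent})
    try:
        with request.urlopen(req, timeout=10) as resp:
            return looks_like_cf_challenge(dict(resp.headers.items()))
    except error.HTTPError as exc:
        # urllib raises on 4xx/5xx responses; the headers are still available.
        return looks_like_cf_challenge(dict(exc.headers.items()))
```

Running `probe()` against the same URL from two different networks would show whether the difference comes from where the request originates and not from curl itself.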
Of course their challenge requires those things. How else could they implement it? Most users will never be presented with a challenge, though, and it is trivial to disable if you never want to challenge anyone. I was just saying CF blocks ML crawlers.