this post was submitted on 19 Aug 2025

863 points (99.3% liked)


The AI company Perplexity is complaining their bots can't bypass Cloudflare's firewall (www.searchenginejournal.com)

submitted 6 days ago* (last edited 6 days ago) by Davriellelouna@lemmy.world to c/technology@lemmy.world

[–] GissaMittJobb@lemmy.ml 22 points 6 days ago (1 children)

Skill issue. Cope and seethe

[–] kescusay@lemmy.world 22 points 6 days ago

I set up a WAF for my company's publicly facing developer portal to block out bot traffic from assholes like these guys. It reduced bot traffic to the site by something like - I kid you not - 99.999%.

Fucking data vultures.

[–] Wispy2891@lemmy.world 10 points 5 days ago* (last edited 5 days ago)

Here comes the ridiculous offer to buy Google Chrome with money they don't have: easy, delicious scraping directly from the user source.

[–] Ekybio@lemmy.world 20 points 6 days ago (10 children)

Can someone with more knowledge shine a bit more light on this whole situation? I'm out of the loop on the technical details.

[–] BetaDoggo_@lemmy.world 22 points 6 days ago* (last edited 6 days ago) (11 children)

Perplexity (an "AI search engine" company with 500 million in funding) can't bypass Cloudflare's anti-bot checks. For each search, Perplexity scrapes the top results and summarizes them for the user. Cloudflare intentionally blocks Perplexity's scrapers because they ignore robots.txt and mimic real users to get around Cloudflare's blocking features. Perplexity argues that their scraping is acceptable because it's user-initiated.

Personally, I think Cloudflare is in the right here. The scraped sites get zero revenue from Perplexity searches (unless the user decides to go through the sources section and click the links), and Perplexity's scraping is unnecessarily traffic-intensive since they don't cache the scraped data.
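For anyone wondering what "respecting robots.txt" and "caching the scraped data" would look like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. Everything in it is an assumption for illustration: the `ExampleAIBot` user agent, the robots.txt contents, the `PoliteFetcher` class, and the injected `fetch` function are all hypothetical, and this is not how Perplexity or Cloudflare actually implement anything.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish: one named bot is
# disallowed everywhere, every other agent is allowed.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteFetcher:
    """Checks robots.txt before fetching and caches bodies for a short time."""

    def __init__(self, user_agent, parser, fetch, ttl_seconds=300.0):
        self.user_agent = user_agent
        self.parser = parser
        self.fetch = fetch            # callable(url) -> str, injected by the caller
        self.ttl_seconds = ttl_seconds
        self._cache = {}              # url -> (time stored, body)

    def get(self, url):
        # Refuse outright if the site's robots.txt disallows this agent.
        if not self.parser.can_fetch(self.user_agent, url):
            return None
        now = time.monotonic()
        cached = self._cache.get(url)
        if cached is not None and now - cached[0] < self.ttl_seconds:
            return cached[1]          # serve from cache, no extra traffic
        body = self.fetch(url)
        self._cache[url] = (now, body)
        return body
```

A crawler built like this identifies itself honestly, skips pages it has been told to stay away from, and hits the origin at most once per URL within the cache window. The complaint in the thread is that the scraper reportedly does neither.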

[–] BaroqueInMind@piefed.social 19 points 6 days ago

Cry more, Perplexity.

[–] Ermiar@lemmy.world 19 points 6 days ago* (last edited 6 days ago) (1 children)

Oh no! Anyway…

[–] prex@aussie.zone 15 points 6 days ago

boo fucking hoo

[–] tarknassus@lemmy.world 2 points 4 days ago

I don't see a problem here. Maybe Perplexity should consider the reasons WHY Cloudflare have a firewall...?

[–] LodeMike@lemmy.today 12 points 6 days ago

Words cannot describe how much I hate this person

[–] Jimmycrackcrack@lemmy.ml 5 points 5 days ago

Gee, that's a real removed, ain't it, Perplexity?

[–] fossilesque@mander.xyz 11 points 6 days ago

I hate that these bots ruin my read it later app. :(
