this post was submitted on 19 Aug 2025
863 points (99.3% liked)

Technology

(page 2) 50 comments
[–] GissaMittJobb@lemmy.ml 22 points 6 days ago (1 children)

Skill issue. Cope and seethe

[–] kescusay@lemmy.world 22 points 6 days ago

I set up a WAF for my company's publicly facing developer portal to block out bot traffic from assholes like these guys. It reduced bot traffic to the site by something like - I kid you not - 99.999%.

Fucking data vultures.

[–] Wispy2891@lemmy.world 10 points 5 days ago* (last edited 5 days ago)

Here comes the ridiculous offer to buy Google Chrome with money they don't have: easy, delicious scraping directly from the user source

[–] Ekybio@lemmy.world 20 points 6 days ago (10 children)

Can someone with more knowledge shine a bit more light on this whole situation? I'm out of the loop on the technical details

[–] BetaDoggo_@lemmy.world 22 points 6 days ago* (last edited 6 days ago) (11 children)

Perplexity (an "AI search engine" company with 500 million in funding) can't bypass Cloudflare's anti-bot checks. For each search, Perplexity scrapes the top results and summarizes them for the user. Cloudflare intentionally blocks Perplexity's scrapers because they ignore robots.txt and mimic real users to get around Cloudflare's blocking features. Perplexity argues that their scraping is acceptable because it's user-initiated.

Personally, I think Cloudflare is in the right here. The scraped sites get 0 revenue from Perplexity searches (unless the user decides to go through the sources section and click the links), and Perplexity's scraping is unnecessarily traffic-intensive since they don't cache the scraped data.
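For anyone wondering what "respecting robots.txt and caching" would look like in practice, here is a rough Python sketch. It is not how Perplexity or Cloudflare actually work; the robots.txt content, the bot names (`ExampleBot`, `OtherBot`), the `PoliteFetcher` class, and the injected `download` function are all made up for illustration. The only real API used is the standard library's `urllib.robotparser`.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real crawler would download this from the site.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def make_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text without touching the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteFetcher:
    """Fetches a URL only if robots.txt allows it, and caches the result."""

    def __init__(self, user_agent, parser, download):
        self.user_agent = user_agent
        self.parser = parser
        self.download = download  # assumed callable: url -> page body
        self.cache = {}           # url -> body already fetched
        self.origin_hits = 0      # how many times the site was actually hit

    def fetch(self, url):
        # Honor the site's rules for this user agent.
        if not self.parser.can_fetch(self.user_agent, url):
            return None
        # Hit the origin only on a cache miss.
        if url not in self.cache:
            self.origin_hits += 1
            self.cache[url] = self.download(url)
        return self.cache[url]


if __name__ == "__main__":
    fake_download = lambda url: "body of " + url  # stand-in for a real HTTP GET
    fetcher = PoliteFetcher("OtherBot", make_parser(ROBOTS_TXT), fake_download)
    fetcher.fetch("https://example.com/page")
    fetcher.fetch("https://example.com/page")  # served from the cache
    print(fetcher.origin_hits)
```

A production crawler would also need cache expiry and rate limiting, which this sketch leaves out.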

[–] BaroqueInMind@piefed.social 19 points 6 days ago

Cry more, Perplexity.

[–] Ermiar@lemmy.world 19 points 6 days ago* (last edited 6 days ago) (1 children)
[–] tarknassus@lemmy.world 2 points 4 days ago

I don't see a problem here. Maybe Perplexity should consider the reasons WHY Cloudflare have a firewall...?

[–] LodeMike@lemmy.today 12 points 6 days ago

Words cannot describe how much I hate this person

[–] Jimmycrackcrack@lemmy.ml 5 points 5 days ago

Gee that's a real removed it ain't it perplexity?

[–] fossilesque@mander.xyz 11 points 6 days ago

I hate that these bots ruin my read it later app. :(
