this post was submitted on 07 Jul 2025
145 points (98.7% liked)

Technology

39582 readers
491 users here now

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

This community's icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.

founded 3 years ago
top 24 comments
[–] theangriestbird@beehaw.org 43 points 1 week ago (1 children)

This snip at the end is so good:

Iaso said she thinks AI companies follow her work, and that if they really want to stop her and Anubis they just need to distract her.

“If you are working at an AI company, here's how you can sabotage Anubis development as easily and quickly as possible,” she wrote on her site. “So first is quit your job, second is work for Square Enix, and third is make absolute banger stuff for Final Fantasy XIV. That’s how you can sabotage this the best.”

[–] Geodad@beehaw.org 9 points 1 week ago

I'd be fine with this... 🤣

[–] who@feddit.org 16 points 1 week ago* (last edited 1 week ago) (1 children)

She told me she’s [...] also thinking about a version that doesn’t require JavaScript, which some privacy-minded disable in their browsers.

As someone who is keenly aware of the privacy and security problems that come with allowing web scripts, I hope she prioritizes this soon. It's really disappointing to find sites that were formerly readable without JavaScript suddenly inaccessible since adopting Anubis. The more sites that do this, the more people are pushed toward enabling scripts by default, exposing them to a great many trackers and web exploits that would otherwise be blocked.

[–] exu@feditown.com 2 points 1 week ago (1 children)
[–] who@feddit.org 1 points 1 week ago (1 children)

Interesting. Judging by that option's name, it seems to refer to use of the HTML <meta> tag to refresh a page.

https://developer.mozilla.org/en-US/docs/Web/HTML/Reference/Elements/meta/http-equiv

Neither this tag nor using it for refresh is new at all. I don't think I've seen it used to detect bots, though. I wonder what Anubis is doing here.

[–] JohnEdwa@sopuli.xyz 2 points 1 week ago

It's simply checking if the connection is from an actual browser, as a scraper pretending to be one won't actually refresh the page as instructed. It's going to buy some time, but like the rest of Anubis in general, it will only work until the scrapers get modified to work around it.
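To make the idea concrete, here is a minimal Python sketch of a meta-refresh challenge. It is not Anubis's actual code: the `SECRET` key, the `token` query parameter, and every function name are hypothetical, and a real deployment would add things like token expiry. The first response carries only a `<meta http-equiv="refresh">` tag pointing at a tokenized URL; a browser follows it after the delay, while a scraper that just downloads the HTML never comes back with the token.

```python
import hashlib
import hmac
import html
from typing import Optional

# Hypothetical server-side key; a real server would load this from config.
SECRET = b"example-secret"


def issue_token(client_id: str) -> str:
    """Derive a token the server can recompute later for the same client."""
    return hmac.new(SECRET, client_id.encode(), hashlib.sha256).hexdigest()


def challenge_page(client_id: str, delay_seconds: int = 1) -> str:
    """Build HTML that tells the browser to reload with the token attached."""
    target = html.escape(f"/?token={issue_token(client_id)}")
    return (
        "<!doctype html><html><head>"
        f'<meta http-equiv="refresh" content="{delay_seconds}; url={target}">'
        "</head><body>Checking your browser...</body></html>"
    )


def handle_request(client_id: str, token: Optional[str]) -> str:
    """Serve the real page only to clients that followed the refresh."""
    if token is not None and hmac.compare_digest(token, issue_token(client_id)):
        return "<html><body>Real content</body></html>"
    return challenge_page(client_id)
```

A client that ignores the refresh instruction only ever sees the challenge page. As noted above, this stops working once a scraper parses the tag and follows the URL itself.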

[–] FundMECFSResearch@lemmy.blahaj.zone 13 points 1 week ago (3 children)

This thing, Anubis, always flags me for some reason. I use Mullvad and Safari (iOS) with some ad- and tracker-blocking extensions.

[–] Photuris@lemmy.ml 6 points 1 week ago (2 children)

More sites in general are blocking Mullvad traffic lately (in my experience), and I’m not sure what, if anything, can be done about it.

[–] FundMECFSResearch@lemmy.blahaj.zone 6 points 1 week ago (1 children)

I expect better from a popular FOSS tool being used by privacy-aware people, though.

[–] SweetCitrusBuzz@beehaw.org 2 points 1 week ago

Can you open an issue, or see if one is open already for this?

[–] Powderhorn@beehaw.org 3 points 1 week ago

Agreed. Luckily, they don't seem to have the full list of Mullvad IPs, so if I really want to read something, I just try another tunnel.

[–] simple@piefed.social 6 points 1 week ago (1 children)

Do you have javascript or cookies disabled? That might stop you from getting past.

[–] Appoxo@lemmy.dbzer0.com 4 points 1 week ago

I wonder why traffic from known VPN companies is under more scrutiny than traffic from domestic households...

[–] leaky_shower_thought@feddit.nl 11 points 1 week ago (1 children)

i like this one better than cloudflare's turnstile.

cf blocks me all the time for the smallest reasons and i can't seem to find their nag email.

[–] fuckwit_mcbumcrumble@lemmy.dbzer0.com 2 points 1 week ago (1 children)

I have no issues with Cloudflare, but Anubis always takes its sweet-ass time to verify me. Like 30+ seconds just sitting there, but then eventually I get in.

[–] Vanilla_PuddinFudge@infosec.pub 1 points 1 week ago* (last edited 1 week ago)

Windows XP ended support like 20 years ago if you were wondering if the Pentium 4 build you're using was still viable.

[–] remington@beehaw.org 2 points 1 week ago (2 children)

Would you edit your post and add the following archive link to the body, please?

https://archive.is/VcoE1

[–] who@feddit.org 7 points 1 week ago* (last edited 1 week ago) (1 children)

Unfortunately, archive.is seems to have moved behind a big corporate CAPTCHA service, subjecting readers to having their reading habits (both the articles and the referring communities) tracked at a large scale.

I suggest this archive link instead:

https://web.archive.org/web/20250707135819/https://www.404media.co/the-open-source-software-saving-the-internet-from-ai-bot-scrapers/

[–] remington@beehaw.org 1 points 1 week ago (1 children)

Unfortunately, archive.is has moved behind Cloudflare, subjecting readers to having their reading habits (both the articles and the referring communities) tracked at a large scale.

How do you know this?

What about https://ghostarchive.org/?

[–] who@feddit.org 6 points 1 week ago* (last edited 1 week ago) (1 children)

Sorry; I shouldn't have written Cloudflare specifically. Their CAPTCHA page now contains scripts from Google, not Cloudflare. I have corrected my comment.

How do you know this?

Because a couple months ago, archive.is/archive.today started showing me CAPTCHA pages instead of the archived articles when I use Firefox with scripts disabled. The current page contains scripts hosted by Google, which I won't enable, so I can't read the archived articles.

What about https://ghostarchive.org/?

I haven't used that site enough to have a consistent picture of what it's doing. When I tried it a few minutes ago, it directed me to a CAPTCHA wall when trying to submit an article, but not when searching for an archived article. I'll try to remember to look at it again periodically, to be able to answer this question in the future.

[–] remington@beehaw.org 3 points 1 week ago

Thanks. I appreciate the info and effort.

[–] sabreW4K3@lazysoci.al 5 points 1 week ago (1 children)

To be honest with you, I refuse on moral grounds. 404 are independent and do good work. You've already linked a paywall bypass in the comments; if anyone would like to find it, it's not hard to scroll.

[–] remington@beehaw.org 4 points 1 week ago

OK. Fair enough.