ArchiveTeam


A community to discuss Archive Team and their efforts to archive the internet. Learn more about them on https://wiki.archiveteam.org/


From Wikipedia:

It is the largest and oldest of the U.S. international broadcasters, producing digital, TV, and radio content in 48 languages for affiliate stations around the world.

From the AT wiki:

Under the second Trump administration, almost all of VOA's 1,300 journalists, producers and assistants were placed on administrative leave. There is risk that Voice of America will be shut down.

Archival has been ongoing for two weeks now. It has captured millions of articles and over 200 terabytes of data, including countless videos and images.

The grab is going a bit slowly, because AT rate-limits how many items it lets warriors grab at once. As such, running more warriors on this project won't speed up archival right now.

So far, 14M items (articles, assets) have been archived, 16M are waiting to be processed, and 10M have failed to archive.
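To put those numbers in proportion, here is a rough sketch in Python. It assumes the three figures are disjoint and together cover all currently known items, which the post does not state, and it uses the rounded counts above rather than live tracker data.

```python
# Item counts from the post, in millions (rounded; not live tracker data).
# Assumption: the three categories are disjoint and cover all known items.
counts = {"archived": 14, "waiting": 16, "failed": 10}


def shares(counts):
    """Return each category's share of the total as a percentage."""
    total = sum(counts.values())
    return {name: 100 * n / total for name, n in counts.items()}


for name, pct in shares(counts).items():
    print(f"{name}: {pct:.0f}%")
```

Under that assumption, about a third of the known items are safely archived, and a quarter have failed at least once.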

Roblox Assets Archival [New Project] (tracker.archiveteam.org)
submitted 1 week ago* (last edited 1 week ago) by archivist@lemm.ee to c/archiveteam@lemm.ee
 
 

I don't think there's info about this one on the wiki yet: https://wiki.archiveteam.org/index.php/Roblox

Looks like it will be done pretty quickly, since it has been set as the default project for warriors.

submitted 1 week ago* (last edited 1 week ago) by archivist@lemm.ee to c/archiveteam@lemm.ee
 
 

The archival started not long before the site was to be shut down, so there wasn't time to grab everything.

When the owners finally pulled the plug, blog posts started returning 403 errors, then later 410 errors. Images and JavaScript files remained downloadable for longer, but the JS files started returning 410 after a while as well. Images were still available for quite a bit longer.
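For anyone unfamiliar with those codes, a small Python sketch using only the standard library shows what they mean. The helper names `describe` and `is_permanently_gone` are made up for illustration, and this says nothing about how the AT scripts actually treat these responses.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return the status code with its standard reason phrase."""
    return f"{code} {HTTPStatus(code).phrase}"


def is_permanently_gone(code: int) -> bool:
    """410 signals the resource was removed on purpose and will not return."""
    return code == HTTPStatus.GONE


for code in (403, 410):
    print(describe(code), "| permanently gone:", is_permanently_gone(code))
```

A 403 only says the server refuses to serve the page, while a 410 is the server stating that the content has been deliberately removed.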

Today, only so-called "tag" items were being archived, possibly because we ran out of known images, or because the team noticed that tag items were still available and worth saving.

The last item my warrior grabbed was a tag item at 2025-04-02T10:39:21.085891703Z

Between 8M and 14M known items are left unarchived, and presumably millions more were never discovered.

SS Blog [New Archival Project] (tracker.archiveteam.org)
submitted 1 week ago* (last edited 1 week ago) by archivist@lemm.ee to c/archiveteam@lemm.ee
 
 

Archive Team has just begun the distributed archiving of the Japanese SS Blog, a blog hosting service, which is set to be discontinued on March 31, 2025.

And you can help! There isn't much time left, so we need as many people running the warrior as possible.

Resources:

  • The wiki page of the project (not much info)
  • The tracker (at the top of the page) has the simplest info on how you can help out
  • The GitHub page offers a Docker-based alternative for advanced users, and more info on best practices for this sort of archiving

Why help out?

The web is disappearing all the time, and a lot of information that used to be easy to find gets lost. These Japanese blogs may not be very important to you, but they certainly are to a lot of people, and nobody knows what information can be found only here until they need it.