The Problem of Digital Impermanence
The internet has a memory problem. Studies have found that a significant portion of web pages linked in academic papers become inaccessible within years of publication — a phenomenon called "link rot." News articles vanish when publications shut down. Government documents are quietly removed. Social media posts are deleted. Entire platforms — GeoCities, MySpace, Vine, Google+ — are switched off, taking years of human expression with them.
Digital information is, paradoxically, more fragile than many physical media. A newspaper from 1950 might survive in a library archive. A website from 1998 is probably gone forever unless someone specifically saved it.
Brewster Kahle and the Internet Archive
In 1996, internet entrepreneur Brewster Kahle founded the Internet Archive in San Francisco with a simple, staggering ambition: universal access to all knowledge. Drawing on the model of the Library of Alexandria — and explicitly invoking it as both inspiration and cautionary tale — Kahle envisioned a digital library that would preserve not just books and text, but software, audio, video, and the web itself.
The organization is a registered 501(c)(3) nonprofit, funded through donations, grants, and fees for services such as web archiving and book digitization. It operates out of a former church building in San Francisco's Richmond District, an appropriate home for an organization with a quasi-religious dedication to preservation.
How the Wayback Machine Works
The Wayback Machine (web.archive.org), launched publicly in 2001, is the Internet Archive's flagship web preservation project. At its core, it works through web crawling:
- Crawlers — automated programs — continuously browse the web, following links and downloading copies of pages they find.
- Downloaded pages, along with their images, stylesheets, scripts, and other assets, are stored with a timestamp.
- Users can access any stored version of a URL by entering it into the Wayback Machine interface and selecting a date from the calendar view.
- Additionally, anyone can manually submit a URL for immediate archiving using the "Save Page Now" feature.
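The crawl-and-timestamp cycle described above can be illustrated with a toy model. This is a minimal sketch, not the Archive's actual crawler: it assumes a hypothetical in-memory `FAKE_WEB` dictionary in place of real HTTP fetches, and the names `Capture`, `crawl`, and `closest_capture` are invented for illustration. It shows three ideas from the list: following links, storing each copy with a timestamp, and retrieving the capture nearest a requested date.

```python
from collections import deque
from dataclasses import dataclass
from datetime import datetime, timezone
from html.parser import HTMLParser

# Hypothetical stand-in for the live web: URL -> HTML body.
FAKE_WEB = {
    "http://example.test/": '<a href="http://example.test/about">About</a>',
    "http://example.test/about": '<a href="http://example.test/">Home</a>',
}


@dataclass
class Capture:
    url: str
    timestamp: datetime
    body: str


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed, archive, now, fetch=FAKE_WEB.get):
    """Breadth-first crawl from seed, storing one timestamped capture per page."""
    queue, seen = deque([seed]), {seed}
    while queue:
        url = queue.popleft()
        body = fetch(url)
        if body is None:  # page not reachable in this toy model
            continue
        archive.setdefault(url, []).append(Capture(url, now, body))
        parser = LinkExtractor()
        parser.feed(body)
        for link in parser.links:
            if link not in seen:
                seen.add(link)
                queue.append(link)


def closest_capture(archive, url, when):
    """Return the stored capture of url nearest to the requested time, or None."""
    captures = archive.get(url, [])
    if not captures:
        return None
    return min(captures, key=lambda c: abs(c.timestamp - when))


# Two crawls at different dates produce two captures per page.
archive = {}
crawl("http://example.test/", archive, datetime(1998, 5, 1, tzinfo=timezone.utc))
crawl("http://example.test/", archive, datetime(2004, 9, 1, tzinfo=timezone.utc))
hit = closest_capture(archive, "http://example.test/about",
                      datetime(1999, 1, 1, tzinfo=timezone.utc))
print(hit.timestamp.year)  # the 1998 capture is nearer to 1999 than the 2004 one
```

A real crawler would also need to handle politeness rules, deduplication, page assets, and storage at a vastly larger scale; the sketch leaves all of that out.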
The scale is extraordinary. In recent years, the Wayback Machine has held well over 800 billion web page captures, with the earliest dating back to 1996. Storing this much data requires an immense physical infrastructure, and the Archive maintains redundant copies at multiple locations.
Beyond the Web: Software, Games, and More
The Internet Archive preserves far more than websites. Its collections include:
- Software Library: Thousands of vintage programs runnable in-browser via emulation, including DOS games, early Mac applications, and console ROMs.
- Open Library: A book catalog and digital lending library that lends scanned copies of physical books under a model known as controlled digital lending.
- Audio Archive: Live concert recordings, old-time radio broadcasts, and historical speeches.
- Video Archive: News broadcasts, films, and moving image collections.
- Texts: Millions of books, papers, and documents in the public domain.
The Legal Battleground
The Internet Archive has faced significant legal challenges, chiefly over copyright. In Hachette v. Internet Archive, four major publishers sued over its digital lending program; a federal district court ruled against the Archive in 2023, and an appeals court upheld that ruling in 2024. The Archive argues that preservation is a legitimate use that copyright law should accommodate; rights holders often disagree. How courts resolve disputes like these will shape the future of digital preservation for years to come.
How You Can Help Preserve Digital History
Digital preservation isn't only the Internet Archive's job. There are practical steps anyone can take:
- Use the Wayback Machine's "Save Page Now" feature to archive important pages before they disappear.
- Donate to the Internet Archive at archive.org.
- Contribute to volunteer preservation projects such as Archive Team.
- For important documents in your own life or organization, maintain offline backups in multiple formats.
- Learn about and advocate for link rot prevention practices in academic and journalistic publishing.
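The first step in the list above can also be scripted. The sketch below only builds URLs and sends no requests. It relies on three publicly used Wayback Machine URL patterns: `https://web.archive.org/save/<url>` for Save Page Now, `https://web.archive.org/web/<timestamp>/<url>` for viewing a capture, and the `https://archive.org/wayback/available` lookup endpoint. The helper function names are hypothetical, and rate limits or other usage rules for automated requests are not covered here.

```python
from datetime import datetime
from urllib.parse import urlencode


def save_page_now_url(page_url):
    """URL that asks the Wayback Machine to capture page_url when requested."""
    return "https://web.archive.org/save/" + page_url


def capture_url(page_url, when):
    """URL for the capture of page_url nearest to `when` (14-digit timestamp)."""
    stamp = when.strftime("%Y%m%d%H%M%S")
    return "https://web.archive.org/web/{}/{}".format(stamp, page_url)


def availability_query_url(page_url, when=None):
    """URL for the availability lookup, which answers with JSON about the closest capture."""
    params = {"url": page_url}
    if when is not None:
        params["timestamp"] = when.strftime("%Y%m%d")
    return "https://archive.org/wayback/available?" + urlencode(params)


page = "http://example.com/"
print(save_page_now_url(page))
print(capture_url(page, datetime(2006, 1, 1)))
print(availability_query_url(page, datetime(2006, 1, 1)))
```

Opening the first printed URL in a browser should trigger a capture of the page. Keeping URL construction separate from any network call makes the helpers easy to test and to reuse in a link-checking script.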
The digital past is not automatically preserved. It requires deliberate effort, funding, and public support. The Internet Archive is doing extraordinary work — but the web moves faster than any single institution can archive. Preservation is a community effort, and it starts with understanding why it matters.