Wayback Machine

From Fanlore
Website
Name: The Internet Archive Wayback Machine
Owner/Maintainer:
Dates: 1996-present
Type: storage
Fandom: The Internet
URL: http://www.archive.org (stable), http://web.archive.org/ (beta)

The Wayback Machine is a service created by the Internet Archive that stores and provides access to old versions of websites.[1] It aims to preserve as much of the World Wide Web as possible.

While the Wayback Machine isn't a fannish site per se, it's often the last resort for finding fanfic and fan sites that have vanished from the internet, whether because of unrecoverable crashes, personal choice on a fan's part, abandoned sites, or a host company going belly-up or otherwise deciding to shut down sites (for example, AOL closing down AOL Hometown, and Yahoo deciding to stop hosting personal sites on GeoCities).

The Wayback Machine receives its data from Alexa Internet, which repeatedly spiders the web to cache copies of sites. The Wayback Machine can provide not just a single copy of a given site, but often a years-long history.[1] It operates under the Oakland Archive Policy, which allows site indexing on an "opt-out" basis (either via a "no index" file stored on the website or via a written request from the site owner after indexing). Other countries, such as the UK, do not allow site indexing without express written consent (an "opt-in" policy), and as a result some estimate that only 1% of digital history and creations will be preserved.[2]
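In practice, the opt-out file described above is a robots.txt file (see Limitations). As a rough illustration, the sketch below uses Python's standard urllib.robotparser module to test whether a crawler would be allowed to fetch a page. The user-agent string "ia_archiver", the example.com address, and the helper name crawler_may_fetch are assumptions made for the example; this is not the Internet Archive's own code.

```python
from urllib.robotparser import RobotFileParser


def crawler_may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Assumed example: a site owner opting one crawler out of the whole site.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

STORY_URL = "http://example.com/fanfic/story.html"  # placeholder address

blocked = crawler_may_fetch(ROBOTS_TXT, "ia_archiver", STORY_URL)
other = crawler_may_fetch(ROBOTS_TXT, "SomeOtherBot", STORY_URL)
print(blocked, other)
```

A crawler that honors the file would skip the page in the first case. The second crawler is not named in the file, so nothing stops it.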

As of April 20, 2009, the Wayback Machine stored approximately 4.5 petabytes (4.5 million gigabytes) of archived web data, including huge swathes of online fandom.[3]

The site is named after the "WABAC Machine" from The Rocky and Bullwinkle Show cartoons of the 1960s (a name that was in turn a play on the UNIVAC computer).

Starting in late 2010, the site began beta testing an updated version of the Wayback Machine. One of the new features is the ability to navigate to a website from within the Wayback Machine and, if the site is still online, create a single-page snapshot that will eventually be stored in the archives. This offers a user-driven method of archiving that does not depend on random site crawling. However, since only a single page is archived, not the entire website, the Wayback Machine will continue to rely on Alexa for the bulk of its raw data.

Limitations

Although the Wayback Machine is invaluable, it is not infallible. Anyone wanting to use it should note the following:

  • Internet Archive policies respect "no robots" instructions on robots.txt documents. If a site owner includes such a command, the Internet Archive will either not record the site or not let you see it.
  • The Internet Archive was founded in 1996.
    • Sites that disappeared prior to 1996 will obviously not be included.
    • Since their bots track sites through links, and originally worked outward from links known to the founders, many sites that disappeared shortly after the founding of the Internet Archive were also never included.
  • The bots cannot record a file unless they find it.
    • Sites that have never been linked to will never be recorded (at least, not unless the owner specifically puts in a request, which is now possible); and sites rarely linked to may also never be recorded.
    • Pages/graphics may not have been recorded if there was a typo on a link.
    • In most cases, the bots record from the main directory through the links off the index page. If the site owner has left a dummy index page in situ and linked off a table of contents page with another name (e.g. "home.html"), then the internal pages may not be recorded.
    • Subsections of a site that were not linked into the index page may not have been recorded (though they may be, if they were tracked through an outside link from another website).
  • Because of size limitations, the bots have never recorded large files. What constitutes a "large" file has changed over time: for sites recorded in the early years of the Archive, size constraints were stricter than they are today. However, you will often find that the following have not been recorded:
  • In some cases, the bots do a shallow scan of the net that picks up the pages and graphics in the main directory, but not in subdirectories (or not in sub-subdirectories). As a result, files that have been organized into folders such as "graphics/" or "fanfic/" are not always recorded.
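
Several of the limitations above follow from how a link-following crawler works. The sketch below is a simplified model, not the actual Alexa or Internet Archive crawler: it walks a small made-up site breadth-first from its index page. The site layout, the crawl function, and the max_depth parameter are all hypothetical. Depth here counts link hops rather than directory levels, which only approximates the "shallow scan" behavior described.

```python
from collections import deque


def crawl(links, start, max_depth=None):
    """Breadth-first walk of a link graph; returns the pages recorded, in order."""
    seen = {start}
    order = [start]
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        # A shallow scan stops following links past max_depth hops.
        if max_depth is not None and depth >= max_depth:
            continue
        for target in links.get(page, []):
            # A mistyped link points at a page that does not exist: skip it.
            if target in links and target not in seen:
                seen.add(target)
                order.append(target)
                queue.append((target, depth + 1))
    return order


# Hypothetical site: each page maps to the links found on it.
SITE = {
    "index.html": ["fanfic/index.html", "about.html"],
    "fanfic/index.html": ["fanfic/story1.html", "fanfc/story2.html"],  # typo
    "about.html": [],
    "fanfic/story1.html": [],
    "fanfic/story2.html": [],   # exists, but only a mistyped link points here
    "hidden/gallery.html": [],  # exists, but nothing links to it
}

full = crawl(SITE, "index.html")
shallow = crawl(SITE, "index.html", max_depth=1)
print(full)
print(shallow)
```

In this model the full crawl never records hidden/gallery.html (no inbound link) or fanfic/story2.html (mistyped link), and the shallow crawl also misses fanfic/story1.html because it sits two hops from the index page.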

References

  1. Internet Archive Frequently Asked Questions (accessed June 9, 2010).
  2. For more about the looming "digital black hole", see "A Real Copyright Problem In The UK: The Difficulty Of Archiving Important Websites" on techdirt.com, dated March 8, 2010, and "Archiving Britain's web: The legal nightmare explored" on wired.co.uk, dated March 5, 2010.
  3. Wayback Machine comes to life in new home, posted April 20, 2009 (accessed May 29, 2009).