After 18 years, Internet archive site amasses 410 billion web pages
How much data has been posted on the World Wide Web since it gained mass appeal in 1996? More than 410 billion Web pages, according to a site that archives the Web.
And that doesn't even include sites that have requested exclusion from indexing, according to the Internet Archive Wayback Machine, which breached the 400-billion mark on May 9.
"The Wayback Machine, a digital archive of the World Wide Web, has reached a landmark with 400 billion webpages indexed. This makes it possible to surf the web as it looked anytime from late 1996 up until a few hours ago," the Internet Archive said in a blog post.
As of Sunday, however, the Wayback Machine's home page showed it had amassed 410 billion pages.
"Every day, three million people use our collections," said founder and digital librarian Brewster Kahle as he asked the public for donations to keep the site running.
The Wayback Machine said some of the sites it has indexed since late 1996 include Yahoo! (November 1996), Geocities (December 12, 1998), and Diaryland.com.
The Wayback Machine was launched in 2001, while Archive-It, a subscription service, was launched in 2006 to let libraries "create curated collections of valuable web content."
On March 25, 2009, the Internet Archive and Sun Microsystems launched a new datacenter that stores the whole web archive and serves the Wayback Machine.
It handles three petabytes of data and can process 500 requests per second from its home in a shipping container.
Meanwhile, on June 15, 2011, the HTTP Archive became part of the Internet Archive, adding data about the performance of websites to its collection of website content.
The Wayback Machine was not welcome in all countries, however: it was blocked in China without notice, but became available there again on May 28, 2012.
On October 26, 2012, the Internet Archive made 80 terabytes of archived web crawl data from 2011 available for researchers, "to explore how others might be able to interact with or learn from this content."
In October 2013, the Wayback Machine gained new features such as the ability to show newly crawled content an hour after it gets the data.
Other new features include a “Save Page” option that lets anyone archive a page on demand, and an effort to fix broken links on the web, starting with WordPress.com and Wikipedia.org.
The Wayback Machine also provided access to important federal government sites that went dark during the government shutdown.
A separate article on The Next Web cited the possibility of the Wayback Machine getting 500 billion pages by 2015.
"Onwards and upwards! Will The Way Back Machine have 500 billion webpages indexed by 2015? We wouldn’t be surprised if it happened sooner," it said. — Joel Locsin/JDS, GMA News