Posted: November 9 2010, 1:35pm in Random Stuff

As I was looking at the Google cache of a page, I noticed that the layout was a bit weird. The issue was that the cache date was Nov. 1 and the site had undergone updates on the 3rd or so. The updated external CSS files weren't playing well with the old cached page.

So I hit refresh and noticed that the cache date was now Nov. 4, and the page looked fine, as the page from Nov. 4 was designed for the updated external CSS files. I hit refresh a few more times and found I could randomly toggle between two different versions: the cached page from Nov. 1 and the cached page from Nov. 4. So Google evidently stores its cached copies in several different places, and those copies are not always in sync. This gave me an idea.

It has been said before that Google has kept copies of all of the different indices it has ever created. If this means what I think it means, then they should have all of the different cached copies for every URL that Google has ever crawled. OK, so maybe you know where I'm going with this, but keep reading anyway...

I'd like to introduce you to... Google "Versions" (or possibly Google "Timeport"). I'm going to use Google as the example search engine in this case. I'm sorry, Bing - this could equally apply to you, but I spend most of my day worrying about Google.

Google Versions Logo

Imagine this:

  1. A user submits a search query.
  2. The standard SERP is produced.

BUT... each result has an extra link called "Versions" beside the standard "Cached" and "Similar" links.

Google Versions Search Engine Results Page (SERP)

When you click on this "Versions" link, you are presented with a list of the dates for which Google has a cached version of the page. You click on a date and get the cached version of the page on that date.
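To make the lookup concrete, here is a minimal sketch of the kind of index a "Versions" link would need: for each URL, the set of crawl dates and the cached copy stored for each. The `VersionIndex` class, its method names and the example URL are all hypothetical, invented purely for illustration; nothing here reflects how Google actually stores its cache.

```python
from datetime import date
from typing import Dict, List, Optional


class VersionIndex:
    """Hypothetical in-memory store of cached copies, keyed by URL and crawl date."""

    def __init__(self) -> None:
        # url -> {crawl date -> cached HTML}
        self._pages: Dict[str, Dict[date, str]] = {}

    def add(self, url: str, crawled_on: date, html: str) -> None:
        """Record the copy of `url` seen on `crawled_on`."""
        self._pages.setdefault(url, {})[crawled_on] = html

    def dates(self, url: str) -> List[date]:
        """Return every date with a cached copy, oldest first (the 'Versions' list)."""
        return sorted(self._pages.get(url, {}))

    def get(self, url: str, crawled_on: date) -> Optional[str]:
        """Return the cached copy for that date, or None if there isn't one."""
        return self._pages.get(url, {}).get(crawled_on)


index = VersionIndex()
index.add("https://example.com/", date(2010, 11, 4), "<html>new layout</html>")
index.add("https://example.com/", date(2010, 11, 1), "<html>old layout</html>")

# What the "Versions" link would list for this URL.
for crawl_date in index.dates("https://example.com/"):
    print(crawl_date.isoformat())
```

Clicking a date in the list would then correspond to a call like `index.get(url, crawl_date)`.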

Google Versions Cache Dates

Yes, this is basically the concept of archive.org aka the "Wayback Machine", except that archive.org does a relatively bad job of crawling pages often enough for it to be useful. I say "relatively" because they obviously don't have the resources of a company such as Google or Microsoft. So perhaps archive.org does a fantastic job given their resources, but they're terrible when compared to either of the aforementioned companies.

There are a couple of problems with the idea of Google Versions. First, Google right now caches only the HTML. All of the inline/embedded elements in the cached code, such as CSS, JavaScript, images, etc., are relative to the original URI of the page. So, if you were to view an old cached version of a page and some object that is referenced from within that page has since been removed from the server - or even just modified on the server - then the page will likely be broken to some degree.

To work around this problem, Google would have to synchronously cache the page itself and all of the objects referenced therein. To my knowledge, the Google cache system simply does not work this way at this time and it would probably require a rewrite. We know that Google crawls images and also CSS and JavaScript files, but I don't know the extent to which any of these are cached, or whether they are cached synchronously with their parent pages.

But archive.org has synchronously cached pages and their associated objects for years, so presumably Google could do it too if they were so inclined.
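As a rough sketch of what "synchronously" caching a page with its objects might look like, the code below parses a page, resolves each stylesheet, script and image reference against the page's own URI, and fetches them all in the same pass. The function names and the injected `fetch` callable are assumptions made for illustration; this is not a description of how Google or archive.org actually crawl.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin


class _AssetCollector(HTMLParser):
    """Collects absolute URLs of stylesheets, scripts and images in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.assets: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        ref = None
        if tag in ("script", "img"):
            ref = attr.get("src")
        elif tag == "link" and "stylesheet" in (attr.get("rel") or "").lower():
            ref = attr.get("href")
        if ref:
            # References are relative to the original URI of the page.
            self.assets.append(urljoin(self.base_url, ref))


def find_assets(base_url: str, html: str) -> List[str]:
    """Return the absolute URLs of the objects referenced by the page."""
    collector = _AssetCollector(base_url)
    collector.feed(html)
    return collector.assets


def snapshot(url: str, fetch: Callable[[str], bytes]) -> Dict[str, bytes]:
    """Fetch a page and everything it references in one pass.

    `fetch` is a caller-supplied function (hypothetical here) that
    returns the body for a URL.
    """
    html = fetch(url)
    bundle = {url: html}
    for asset_url in find_assets(url, html.decode("utf-8", errors="replace")):
        if asset_url not in bundle:
            bundle[asset_url] = fetch(asset_url)
    return bundle
```

Storing the whole bundle under a single crawl date is what would let an old version render with the CSS it was designed for, rather than whatever is on the server today.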

Another problem is the potential copyright issues, but I don't see this really being a hurdle - especially in the US. The robots.txt file is the de facto standard for exclusion from search engines. A separate User Agent string could be used for Google Versions, e.g. "VersionsBot". The meta robots noarchive tag should also prevent a page from being stored in the Versions archive.
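Both opt-out mechanisms are easy to check mechanically. The sketch below uses Python's standard `urllib.robotparser` to test a robots.txt against the hypothetical "VersionsBot" user agent, and a small HTML parser to look for a noarchive directive. The sample robots.txt, the function names, and the idea that a bot-specific `<meta name="versionsbot">` tag would be honored are all assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Example robots.txt a site owner might publish (made up for this sketch).
ROBOTS_TXT = """\
User-agent: VersionsBot
Disallow: /private/

User-agent: *
Disallow:
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots.txt allows `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    """Looks for a noarchive directive in robots meta tags."""

    def __init__(self) -> None:
        super().__init__()
        self.noarchive = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        # "versionsbot" as a meta name is a hypothetical bot-specific variant.
        if name in ("robots", "versionsbot"):
            directives = {d.strip() for d in content.split(",")}
            if "noarchive" in directives:
                self.noarchive = True


def has_noarchive(html: str) -> bool:
    """Return True if the page asks not to be archived."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return parser.noarchive


def may_archive(robots_txt: str, url: str, html: str) -> bool:
    """A page is archived only if both opt-out checks pass."""
    return may_fetch(robots_txt, "VersionsBot", url) and not has_noarchive(html)
```

With the sample file above, `may_fetch(ROBOTS_TXT, "VersionsBot", "https://example.com/private/page.html")` is refused, while other paths on the site remain open to the archive unless the page itself carries a noarchive tag.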

It's an interesting idea that I would like to see Google or Bing introduce. It's certainly in line with Google's mission statement to "organize the world's information and make it universally accessible and useful."

What are your thoughts?

Reader Comments

seo reseller wrote:
July 26 2011, 9:59pm

How come google versions is not available in my SERPS? Is this only available for signed in google searches?

Darrin J. Ward wrote:
August 3 2011, 8:07am

Hi there,

Google Versions was just an idea I put out there. It doesn't exist in reality, although I really wish that it did.

Darrin

Local SEO wrote:
September 28 2011, 5:23am

This seems a very interesting topic in SEO. Will you please add some more details regarding this. Thanks!