Saturday, July 01, 2006

weird google

incidentally, in my endeavours to find the ISG stuff that i mentioned in the previous post, i stumbled across a few odd things i've never seen before re google.

Here's what happened. I googled "resigned from the iraq survey group" - and i got 55 results.
among those results were these:

Hansard is the official transcript of Australian Parliament. Anyway, I clicked on the bottom link but the term I was looking for wasn't in the document. No problem. I then clicked on the cached link and something weird happened. The site loads up normally - and you can see the document in the usual cached format - but as soon as it is fully loaded, the site redirects to http://72.14.235.104/webhp - which is a google homepage - to be precise "Google English" (which is different to google australia and google uk).
I have no idea what Google English is.

Has anyone ever seen a redirect from a cached document before?

OK. So that was kinda weird - but then I did the same search again 10 minutes later - and this time I only got 54 hits
and yep - guess which one was missing. the same one that i'd been trying to access. it's kinda odd, right?

After I did that first search and realised that the search term wasn't actually in the document, I grabbed another quote from the google excerpt ("in the hunt for weapon stockpiles, saying that resources had been") and googled that. I got two results - the Hansard pdf and the same other link.

(i dont know what that "9.43pm" timestamp means - it's probably the pacific coastal time i accessed that site)

10 minutes later, I did the exact same search - and only got one result - with the same one missing!

does anyone else think that is weird?

9 comments:

Anonymous said...

that IS weird.

have you tried your search terms on any other country's google (besides Google UK, Google Australia & Google English)?

The time stamp is just the last time you accessed the site. You don't get that feature on Google Australia?

- Jiminy Cricket

lukery said...

jc - always good to see you.

actually - i dont even go to any particular google site usually. in my browser i just type "g (search term)" - i'm not sure if google defaults to any particular TLD (the results are always 'google.com.xyz')

re the timestamp thing - i've never noticed it before - but that doesnt mean much...

lukery said...

searching via the different googles gave me the same result

Anonymous said...

Luke,

does anyone else think that is weird?

There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don't know. But there are also unknown unknowns. There are things we don't know we don't know. - Donald Rumsfeld

Anyway, this is the one you want.

lukery said...

thnx simon - yeah - i know that doc - but i dont know if that was the original version. given that google kept kicking me in the head when i tried to get that original doc - i wonder if the 'source' doc is indeed an 'original' (i cant even compare it to a cached version)

again - im not necessarily implying that there's something spooky going on - and i dont even have a particular interest in that specific document - but it was just odd that google did something weird.

(unless you think i should look more closely at that particular doc)

Anonymous said...

Problem solved.

The .gov.au document that you tried to pull from Google cache contains a redirection command to another document in .gov.au. The command doesn't take into account the possibility that it's used from a site other than .gov.au, and now that it's fired from a Google url, it redirects the user to Google, which in turn gets confused about weird search parameters (meant for .gov.au), and redirects the user to the main search page.

To simplify a bit...

What should happen:

1) user loads gov.au/doc1.html
2) site replaces doc part with doc1.abc
3) user gets gov.au/doc1.abc

What happens with Google cache:

1) user loads Google/search=gov.au/doc1.html
2) site replaces doc part with doc1.abc
3) your opera tries to get Google/doc1.abc
4) Google sees bad url and directs you to front page
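
The two lists above can be sketched in a few lines of Python. This is only an illustration of how a relative redirect resolves against whatever address the page was loaded from; it is not the actual code on the .gov.au page. The names `redirect_target`, `ORIGINAL_URL` and `CACHED_URL` are made up for the example, `gov.au/doc1.html` is the same placeholder as above, and the cache URL format is an assumption.

```python
from urllib.parse import urljoin


def redirect_target(page_url: str, relative_link: str) -> str:
    """Resolve a relative redirect against the URL the page was loaded from."""
    return urljoin(page_url, relative_link)


# Placeholder addresses: gov.au/doc1.html is a stand-in, and the shape of
# the cache URL is assumed purely for illustration.
ORIGINAL_URL = "http://gov.au/doc1.html"
CACHED_URL = "http://72.14.235.104/search?q=cache:gov.au/doc1.html"

# Loaded from its own site, the redirect stays on that site.
print(redirect_target(ORIGINAL_URL, "doc1.abc"))  # http://gov.au/doc1.abc

# Loaded from the cache host, the same redirect lands on the cache host.
print(redirect_target(CACHED_URL, "doc1.abc"))  # http://72.14.235.104/doc1.abc
```

The second address doesn't exist on the cache host, which would fit with being bounced to the front page.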


I presume that pages disappear from cache because upon your request Google checks if the pages still exist, sees that they do not and erases them from cache. This erasing doesn't immediately update the results of other search terms, though...

(I got the Google-cached pages saved, too, if you wish to see them...)

lukery said...

thnx teemu.

i presumed that there'd be a reasonable explanation - it also was kinda weird at the same time.

do you have any explanation for how the particular 'search' site disappeared from the google results?

again - i dont think there is anything particularly nefarious going on - it's more just a curious set of events from a tech perspective. (i have no doubt that im being spied on - but this particular set of circumstances just seemed really weird)

i presume that things are as you say - but i've never seen a static, cached document disappear like that.

if i was paranoid...

Anonymous said...

I have no detailed knowledge of innards of Google tech, but my guess is that it goes like this:

- Google robot indexes and caches sitesearch.aph.gov.au, including this temporary copy of Hansard.

- Later, aph.gov.au moves some of its pages and code around, rendering some links useless. (From this point on, Google references a non-existing page.)

- Even later, you find the page with Google. The direct link doesn't work (see above), so you pull the cached copy. Two things happen: (1) a badly constructed function in the original page throws you into the Google main page, and (2) Google decides to check whether the page in cache is up to date. As aph.gov.au tells Google that the page doesn't exist any more, Google marks it as outdated.

- Then, you re-search Google with the same parameters. Google displays all the pages except the one just marked as outdated.

- Then, you change search parameters a bit. As Google probably has a very large and complicated data storage structure, it often isn't in sync with itself, and therefore displays the outdated page in search results. Repeat from step 3.
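
Here is a toy model of that guess in Python. It is a sketch of the reasoning only, not of how Google actually stores anything: `ToyIndex`, `open_cached` and the short URLs are all invented for the example, and the "one result list per search phrase" layout is an assumption standing in for "a storage structure that isn't in sync with itself".

```python
class ToyIndex:
    """Hypothetical index that keeps a separate result list per search phrase."""

    def __init__(self, results_by_query):
        # Copy the lists so each phrase has its own independent results.
        self.results_by_query = {q: list(urls) for q, urls in results_by_query.items()}

    def search(self, query):
        return list(self.results_by_query.get(query, []))

    def open_cached(self, query, url, still_exists):
        # Assumed behaviour: opening the cached copy triggers a freshness
        # check, and a dead page is dropped only from the list for the
        # phrase that was just used.
        if not still_exists:
            kept = [u for u in self.results_by_query.get(query, []) if u != url]
            self.results_by_query[query] = kept


index = ToyIndex({
    "first phrase": ["hansard-copy", "other-link"],
    "second phrase": ["hansard-copy", "other-link"],
})

index.open_cached("first phrase", "hansard-copy", still_exists=False)
print(index.search("first phrase"))   # ['other-link']
print(index.search("second phrase"))  # ['hansard-copy', 'other-link']

index.open_cached("second phrase", "hansard-copy", still_exists=False)
print(index.search("second phrase"))  # ['other-link']
```

Under those assumptions each phrase loses the dead page only after you open its cached copy from that phrase's results, which is the same pattern as 55 hits dropping to 54 and 2 dropping to 1.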

lukery said...

thnx teemu - you are a wealth of information!