Revealed: Google's manual for its unseen humans who rate the web
Exclusive: It's widely believed that Google search results are produced entirely by computer algorithms, in large part because Google would like this to be widely believed. In fact, a little-known group of humans working from home plays a large part in the Google process. The way these raters go about their work has always been a mystery. Now, The Register has seen a copy of the guidelines Google issues to them.
The 160-page manual gives raters detailed advice on relevance, on spamminess and, more controversially, on the elusive "quality". For relevance, raters are advised to choose from "Vital", "Useful", "Relevant", "Slightly Relevant", "Off-Topic or Useless" or "Unratable".
Raters may also be asked to give a spam rating: "Not Spam", "Maybe Spam", "Spam", "Porn" and "Malicious".
Interestingly, raters are not advised to rate websites with out-of-date security certificates as Spam or Malicious. When the rating guide was written, the US Army portal, for instance, was using an out-of-date certificate.
Raters are asked to second-guess "user intent". "What was the user trying to accomplish when he typed this query?" asks the manual. Google classifies intentions into three categories. The first is "action intent", or "do queries": a user wanting to "accomplish a goal or engage in an activity". Then there are what the Chocolate Factory calls informational, or "know", queries and navigational, or "go", queries. The categories are not mutually exclusive, the guide stresses, and some queries are ambiguous: the search query "iPad", for example.
Raters are advised to look for websites whose content is less than four months old; if it's older, it shouldn't be rated "Vital".
Much of this part of the guideline document is intended to cope with sites attempting to game Google. One blog, for instance, is held up as an example of "gibberish". Google's PageRank system was originally devised to rank a page's authority by its popularity, as measured by the links pointing to it. This worked for academic papers, where frequently cited documents tended to be the most important. Other tweaks were added later. But the rising popularity of weblogs in 2003 caused all kinds of problems for Google, as bloggers gamed the PageRank algorithm so effectively, creating a rat's nest of links.
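Google's production ranking is not public, so what follows is only a minimal sketch of the textbook PageRank power iteration, assuming the commonly cited damping factor of 0.85. It illustrates why a cluster of pages all linking to one target can inflate that target's score. The `pagerank` function and the page names ("a", "spam", "blog0" and so on) are hypothetical.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration (illustrative sketch only).

    links maps each page to the list of pages it links to.
    """
    # Collect every page, including those that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution over pages.
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # "Random jump" share that every page receives regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            outlinks = links.get(p, [])
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[p] / len(outlinks)
                for q in outlinks:
                    new_rank[q] += share
            else:
                # Dangling page with no outlinks: spread its rank over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# A small "honest" web: three pages citing each other, plus a page nobody links to.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "spam": ["a"]}
before = pagerank(web)

# The same web after five hypothetical blogs all start linking to "spam".
farm = dict(web)
for i in range(5):
    farm[f"blog{i}"] = ["spam"]
after = pagerank(farm)

print("spam before:", round(before["spam"], 4))
print("spam after: ", round(after["spam"], 4))
```

In this toy graph the blogs receive no links themselves, yet funnelling their small baseline share into one page is enough to more than double that page's score. That is the basic mechanism behind the link tangles described above.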
Source: The Register