How does Google search?
Written by Alex Armstrong   
Thursday, 25 August 2011

Google has produced a video that gives some idea of the ever-changing algorithms it now uses instead of page rank. What does it tell us?

It is almost legend that Google was founded by two computer scientists, Larry Page and Sergey Brin of course, who invented the page rank algorithm. It is this algorithm that made Google different from every other search engine before and since, and it is a deep, clever and mathematical idea.
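To make the idea concrete, here is a minimal sketch of the textbook form of page rank, computed by repeated redistribution of rank along links. It is an illustration only, not Google's production code: the function name `pagerank`, the damping factor of 0.85, the fixed iteration count and the four-page `toy_web` are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook page rank by power iteration (illustrative sketch only).

    links maps each page to the list of pages it links to.
    damping and iterations are assumed values, not Google's settings.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline share of rank.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank evenly.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

The page that attracts the most links from well-ranked pages ends up on top, which is the sense in which page rank measures what other sites "think" of a page.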

But over time the web has grown and computing the page rank has become a huge task. In addition, the growing sophistication of the web, and of the users trying to subvert search, has made the simple use of page rank less and less effective.

Google has always supplemented its search methods with additional algorithms designed to augment the raw page rank, but over the years it has increasingly de-emphasized page rank to the point where it now strongly promotes the idea that you should basically ignore it.

You may still be quoting the page rank of your web site, and even employing SEO to try and increase it, but the message is that page rank is dead.

So what does Google use?

Of course Google isn't going to say, because in the search game secrecy is the best way not only to keep your competitors in the dark (Bing, are you listening?) but also to make it more difficult for users wanting to game the system.

In a moment of uncharacteristic openness, Google has released a video which outlines how it all works.

Well, not really. The video still doesn't give very much away, but the whole tone seems to have changed from page rank to "signals". The idea is that ranking engineers look at poorly performing searches and then come up with a hypothesis about what "signals" could be added to the selection algorithm that might improve it. Then they test the hypothesis and, if it works, the Google algorithm is tweaked.

So - no big idea, no fundamental algorithm, no mathematical theory. You guess what makes it better. You test the guess and incorporate it if it's correct. The idea that Google changes its search algorithm nearly every day (a claimed 500 improvements each year) now seems so much more believable.
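The guess-and-test loop can be sketched in a few lines. Everything here is hypothetical, because the video gives no such detail: the signal names `links` and `freshness`, the weighted-sum `score`, the choice of mean reciprocal rank as the quality measure, and the function `try_new_signal` are assumptions chosen only to show the shape of the process.

```python
def score(doc, weights):
    """Weighted sum of a document's signals (assumed scoring form)."""
    return sum(weights.get(name, 0.0) * value
               for name, value in doc["signals"].items())


def mean_reciprocal_rank(queries, weights):
    """Average of 1/position of the first relevant result per query."""
    total = 0.0
    for docs in queries:
        ranked = sorted(docs, key=lambda d: score(d, weights), reverse=True)
        for position, doc in enumerate(ranked, start=1):
            if doc["relevant"]:
                total += 1.0 / position
                break
    return total / len(queries)


def try_new_signal(queries, weights, name, weight):
    """Keep the new signal only if the test queries rank better with it."""
    candidate = {**weights, name: weight}
    before = mean_reciprocal_rank(queries, weights)
    after = mean_reciprocal_rank(queries, candidate)
    accepted = candidate if after > before else weights
    return accepted, before, after


# Hypothetical judged queries: each is a list of candidate documents.
test_queries = [
    [
        {"signals": {"links": 0.9, "freshness": 0.1}, "relevant": False},
        {"signals": {"links": 0.8, "freshness": 0.9}, "relevant": True},
    ],
    [
        {"signals": {"links": 0.7, "freshness": 0.2}, "relevant": True},
        {"signals": {"links": 0.3, "freshness": 0.6}, "relevant": False},
    ],
]

current = {"links": 1.0}  # baseline: rank on link strength alone
updated, before, after = try_new_signal(test_queries, current, "freshness", 0.5)
print(before, after, updated)
```

Repeat this hundreds of times a year and the weights dictionary stands in for the growing pile of signals. Notice that nothing in the loop explains why a signal helps, only that it did on the queries tested.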


Google's problem is essentially an AI problem. If the search engine understood the query and all the websites it examines, then it could deliver relevant results. The page rank algorithm was a way of finding out what humans thought of the content of a website, and that could be used to determine relevancy. Now that this no longer works, the technique seems to be to search for signals other than page rank that correlate with relevancy - and if you take what is being said at face value then this is a dangerous approach.

As any AI researcher who has tried this sort of approach knows, the result is an ever-increasing mess of rules that often contradict each other and slowly grow to the point where the system becomes unmanageable. Let's hope that Google has a clever system that keeps its search algorithm clean and under control, otherwise it might just vanish in a puff of complexity.

 


Related Articles

Search Engines

Failure of the Google Gold Standard

The Myth of Search

 


Last Updated ( Wednesday, 27 June 2012 )
 
 

   