Google has produced a video that gives some idea of the ever-changing algorithms it now uses instead of page rank. What does it tell us?
It is almost legend that Google was founded by two computer scientists, Larry Page and Sergey Brin, who invented the page rank algorithm. It is this algorithm that made Google different from every other search engine before and since, and it is a deep, clever and mathematical idea.
But over time the web has grown and computing the page rank has become a huge task. In addition, the growing sophistication of the web, and of the users trying to subvert search, has made the simple use of page rank less and less effective.
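For readers who have not met it, the core of the page rank idea can be sketched in a few lines. What follows is the textbook power-iteration form under stated assumptions, not Google's production code: the function name `page_rank`, the damping value of 0.85 and the tiny example link graph are all illustrative choices.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Textbook power-iteration sketch of page rank.

    links maps each page to the list of pages it links to.
    The damping value and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with the rank spread evenly over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page receives a small baseline share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its rank on equally to the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over everyone.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical four-page web: "c" is linked to by three other pages.
example_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, value in sorted(page_rank(example_links).items()):
    print(page, round(value, 3))
```

Each pass touches every link once, which is why the computation becomes such a large job as the web grows.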
Google has always supplemented its search methods with additional algorithms designed to augment the raw page rank, but over the years it has increasingly de-emphasized page rank to the point where now it strongly promotes the idea that you should basically ignore it.
You may still be quoting the page rank of your web site, and even employing SEO to try to increase it, but the message is that page rank is dead.
So what does Google use?
Of course Google isn't going to say, because in the search game secrecy is the best way not only to keep your competitors in the dark (Bing, are you listening?) but also to make things more difficult for users wanting to game the system.
In a moment of uncharacteristic openness, Google has released a video which outlines how it all works.
Well, not really. The video still doesn't give very much away, but the whole tone seems to have changed from page rank to "signals". The idea is that ranking engineers look at poorly performing searches and then come up with a hypothesis about what "signals" could be added to the selection algorithm that might improve it. Then they test the hypothesis and, if it works, the Google algorithm is tweaked.
So - no big idea, no fundamental algorithm, no mathematical theory. You guess what makes it better. You test the guess and incorporate it if it's correct. The idea that Google changes its search algorithm nearly every day (a claimed 500 improvements each year) now seems so much more believable.
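The guess-test-incorporate loop can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the video does not say how signals are combined or measured, so the weighted-sum score, the mean reciprocal rank measure and the names `score`, `rank`, `mean_reciprocal_rank` and `try_signal` are all hypothetical.

```python
def score(doc, weights):
    # Hypothetical scoring rule: a weighted sum of whatever signals are in use.
    return sum(weight * doc.get(name, 0.0) for name, weight in weights.items())


def rank(docs, weights):
    # Order documents from highest to lowest score.
    return sorted(docs, key=lambda doc: score(doc, weights), reverse=True)


def mean_reciprocal_rank(test_queries, weights):
    # Quality measure (an assumption): how near the top the first
    # human-judged relevant document lands, averaged over test queries.
    total = 0.0
    for docs in test_queries:
        for position, doc in enumerate(rank(docs, weights), start=1):
            if doc["relevant"]:
                total += 1.0 / position
                break
    return total / len(test_queries)


def try_signal(test_queries, weights, name, weight):
    # The hypothesis: adding signal `name` improves the results.
    candidate = dict(weights)
    candidate[name] = weight
    before = mean_reciprocal_rank(test_queries, weights)
    after = mean_reciprocal_rank(test_queries, candidate)
    # Keep the tweak only if the measured quality actually went up.
    return candidate if after > before else weights


# One made-up test query with two judged results.
test_queries = [[
    {"links": 0.9, "fresh": 0.1, "relevant": False},
    {"links": 0.5, "fresh": 0.9, "relevant": True},
]]
weights = try_signal(test_queries, {"links": 1.0}, "fresh", 1.0)
print(sorted(weights))
```

Nothing in such a loop ever removes a signal or checks how the accepted ones interact, so the set of weights only ever grows.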
Google's problem is essentially an AI problem. If the search engine understood the query and understood all the websites it examines, then it could deliver relevant results. The page rank algorithm was a way of finding out what humans thought of the content of a website, and that could be used to determine relevancy. Now that this no longer works, the technique seems to be to search for signals other than page rank that correlate with relevancy - and if you take what is being said at face value then this is a dangerous approach.
As any AI researcher who has tried this sort of approach knows, the result is an ever-increasing mess of rules that often contradict each other and slowly grow to the point where the system becomes unmanageable. Let's hope that Google has a clever system that keeps its search algorithm clean and under control, otherwise it might just vanish in a puff of complexity.