The Myth of Search

We all rely on search engines, believing that if information is on the Internet a search will discover it - but is it really so?





Suppose for a moment that you have solved some really important problem, one that the entire world would be interested in if only they knew about it. Eager to share your new knowledge as widely as possible, you set up a web page, not only to publicise the new information but to supply the answer for free. You include on the page "the answer to problem Y is X". Now you just sit back and wait for people to come and find the solution. Only they don't.


We assume that if an answer X exists then a search engine will find it. This simply isn't true. Search engines are more complex and temperamental than the simple model of "seek and ye shall find". Their true behaviour is more like "seek and ye shall find - if lots of other people have already found it and thought it important". In the early days this approach worked reasonably well; today it is less certain. Search engines work by building a dictionary of the key words in each web page - but they don't return results based on key words alone. They rank the results according to how important the pages are judged to be. A page that matches your search specification could be listed behind thousands of pages that match less well but are deemed more important. In the early days of the web, page importance was judged by humans - Yahoo!, for example, was a directory of pages put together by humans.
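To make the "index first, then rank by importance" behaviour concrete, here is a minimal Python sketch. It is an illustration only, not how any real search engine is implemented: the page addresses, the text and the importance scores are all invented, and the function names build_index and search are hypothetical.

```python
from collections import defaultdict

# Hypothetical pages: URL -> (text, importance score). All values are invented.
PAGES = {
    "shop.example/answer": ("buy important answer books at bargain prices", 0.90),
    "review.example/answer": ("be the first to review important answer", 0.60),
    "new.example/solution": ("the important answer to problem Y is X", 0.01),
}


def build_index(pages):
    """Map each key word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, (text, _importance) in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(query, pages, index):
    """Return the URLs containing every query word, most important first."""
    words = query.lower().split()
    if not words:
        return []
    # Key-word matching only decides which pages qualify...
    matches = set.intersection(*(index.get(w, set()) for w in words))
    # ...the order is decided purely by the importance score.
    return sorted(matches, key=lambda url: pages[url][1], reverse=True)


INDEX = build_index(PAGES)
results = search("important answer", PAGES, INDEX)
print(results)
```

In this toy data set all three pages match the query, but the only page that actually states the answer has the lowest assumed importance and so is listed last.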


The key innovation in search engine design, made by the creators of Google, was the idea of page rank. This avoided the problem of having to use a human to rate a page by… using lots of humans to rate a page. The importance of a page was judged, and still is, by the way the Internet treats it. It depends on how many links there are to it from other sites, and these links are weighted by the importance of the site linking to the page. Essentially, the page rank idea simply detects the way humans regard the page. A good idea, but you might notice that there is a problem.

Something is circular, but unlike the wheel it no longer goes around.


Page rank is a great idea as long as the web has some way of getting things started. A page that is 100% new and 100% important has no page rank. How is it going to get page rank? By being found. But without page rank it isn't going to be found. You see the problem.
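The link-weighted scoring described above, and the start-up problem it creates, can be sketched in a few lines of Python. This is a simplified textbook form of the page rank idea, not Google's actual (and secret) ranking. The four-page link graph is invented, and the damping value of 0.85 and the 50 iterations are conventional choices assumed here for illustration.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Score pages from a dict mapping each page to the pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small baseline score regardless of links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            # A page with no outgoing links spreads its score over all pages.
            targets = outlinks if outlinks else pages
            share = damping * rank[page] / len(targets)
            for target in targets:
                new_rank[target] += share  # a link passes on part of the linker's score
        rank = new_rank
    return rank


# Hypothetical web of four pages; every link target is also a key.
LINKS = {
    "portal": ["news", "shop"],
    "news": ["portal", "shop"],
    "shop": ["portal"],
    "new_answer": ["portal"],  # links out, but nobody links to it
}

ranks = page_rank(LINKS)
for page, score in sorted(ranks.items(), key=lambda item: item[1], reverse=True):
    print(f"{page:12s} {score:.4f}")
```

Because no page links to new_answer, it never receives more than the baseline score however many iterations are run, so it stays at the bottom of the ordering.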


The wonderful, world-important solution to life, the universe and everything is going to remain a secret for all time and yet be 100% on display in the public domain. What is worse, it's rumoured - remember that how a search engine works is kept secret - that Google, for example, doesn't take a web site seriously until it has reached a "certain age". That is, young web sites don't get a high page rank.


Search engines based on page rank work only after a page has been discovered by "other means". With the increasing size of the web, and with users hostile to direct advertising - it's always spam, even when relevant - there are no "other means" apart from luck.


When Google first crashed onto the scene and changed everything, page rank wasn't circular, because the "other means" were the only means and humans were doing the finding and the ranking simply by being interested. We are still doing the ranking, only not as efficiently, because we don't have the tools for the size of the job.


So the next time you search for "important answer" and read results such as

"We have important answer at bargain prices"

"For books on important answer"

"Be the first to review important answer"

"Try ebay for important answer"

and so on,

remember that this doesn't mean that there aren't millions of pages actually telling you the important answer. It just means no one has found them yet.







Last Updated ( Thursday, 11 February 2010 )