Test Of Time Award For Classic Paper On Bow-Tie Structure Of The Web
Written by Sue Gee
Friday, 14 April 2017
Researchers whose work revealed the "bow-tie" structure of the web, thereby improving search engines, have been awarded the third Seoul Test of Time Award at this month's World Wide Web Conference, held in Perth, Australia.
Dame Wendy Hall, Chair WWW2017, Perth; Professor Chin-Wan Chung, Co-Chair of WWW2014 Seoul; Andrei Broder, Ravi Kumar and Andrew Tomkins.
The Seoul Test of Time Award, set up and funded by the computer scientists who organized the 2014 World Wide Web Conference held in Seoul, is given to the authors of a previous World Wide Web conference paper that has demonstrated significant scientific, technical, or social impact over the years.
The paper that has won this year's award, “Graph Structure in the Web” by Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie Stata, Andrew Tomkins and Janet Wiener, was originally presented in 2000 at the 9th WWW conference in Amsterdam.
According to the Google Reseach blog post announcing the award:
At the time of publication, it received the Best Paper Award from the WWW conference, and in the following 17 years proved to be highly influential, accumulating over 3,500 citations.
While Google is claiming the credit as five of the authors now work for it, at the time Andrei Broder was with Altavista, where the experimental data was collected in 1999 and four of the authors including Ravi Kumar and Andrew Tomkins were with IBM.
The paper made two major contributions to the understanding of the structure of the Internet.
First, its large-scale experiments, showed that Web nodes are distributed according to a power-law. That is, the probability that the Web node has i incoming links is roughly proportional to 1 / i^2.1.
Second, unlike previous research that assumed the Web to be almost fully connected, it described a much more elaborate structure of the Web, which since then has been depicted with the iconic “bowtie” shape:
The paper described several characteristic classes of Web pages:
the strongly connected core component, where each page is reachable from any other page,
the so-called IN and OUT clusters, which only have unidirectional paths to or from the core,
tendrils dangling from the two clusters, and tubes connecting the clusters while bypassing the core, and finally
disconnected components, which are isolated from the rest of the graph.
Whereas the core component is fully connected and each node can be reached from any other node, Broder et al. discovered that as a whole the Web is much more loosely connected than previously believed, while the probability that any two given pages can be reached from one another is just under 1/4.
In less formal terms. the paper showed the the way web pages linked to each other fell into four roughly equal groups with pages at the web's core the knot in the bow-tie, were all connected to each other by a chain of links. Pages that only linked to the core, such as new pages, were one side of the bow-tie. On the other side were pages that provided no links to any other pages, like many corporate websites. The remainder of the web was not linked to the core.
Speaking to the press after the presentation, Andrew Tomkins said the results of the paper were a surprise to the researchers:
“We hadn’t realised that so much of the web was not readily accessible to web crawlers",
and went on to explain how the research revealed that web crawlers, the programs that constantly browse the web, following links from one page to another, to keep search engines up to date, that only started in one spot could never index everything.
Seventeen years after its publication, the paper is viewed as the seminal introduction to the “bow-tie” graph structure of the Web. The same structure has been found in many other large- scale graphs, ranging from Wikipedia citations to networks of interbank loans. It provides a mathematical foundation for Web crawling and indexing that is still in use, and its theoretical underpinnings are taught in Web search and data mining courses around the world.
Presenting the award, WWW conference chairwoman Dame Wendy Hall said:
“It’s impossible to overstate the real-world significance of this paper. It was the first large-scale empirical study of the dynamics of the Web; without it, the modern search engines we rely on would not be able to continuously improve their search results. It’s an honour for us to present the 2017 Seoul Test of Time Award to a team whose work has had such a great impact not only on the World Wide Web community, but on business, academia and society in general.”
It was the third occasion on which she has presented the award.
In 2015 the first one went to Sergey Brin and Larry Page, for “The Anatomy of a Large-Scale Hypertextual Web Search Engine”, originally presented at WWW1998 in Brisbane, and it was Andrei Broder who accepted in on their behalf, see Google Founders Win New Test-of-Time Award,
Last year the recipients were George Karypis, Joseph Konstan, John Riedl, and Badrul Sarwar for their paper “Item-Based Collaborative Filtering Recommendation Algorithms”, originally presented at the WWW2001 in Hong Kong.
Amazon announced a raft of improvements to its databases at its AWS Summit in San Francisco, ranging from accelerators for DynamoDB to the ability to query exabytes of unstructured data directly in S3 [ ... ]