Type anything into a search engine and it hands you the best pages first, out of billions. How does it know which are best? One of the oldest tricks is to treat every link from one page to another as a vote. A page with lots of links pointing at it must be worth something. But here is the clever bit: a vote from an important page counts for far more than a vote from a page nobody visits. So importance flows around the web like water finding its level, with the big sites passing some of their importance on to whatever they link to. Do this over and over and the scores stop changing, and the page that all the important pages point to ends up on top. That is the one you see first. In the simulator, run the votes and watch the ranks settle, then add a link from a strong page and see its target jump.
Most people think the page with the most links, or the most keywords, automatically ranks first. In fact it is the quality of the linking pages that matters: importance flows along links, so a few strong votes beat many weak ones.
What's actually happening
In the late 1990s the web was exploding and the early search engines were drowning in it. Most ranked pages mainly by how often your search words appeared on them, which was easy to game: stuff a page with repeated keywords and you could shove it to the top, relevant or not. The results were a mess. Two Stanford students, Larry Page and Sergey Brin, had a different idea, and it came from an unlikely place: the way academics cite each other's papers. A paper that many others cite is probably important, and a citation from a landmark paper means more than one from an obscure note. What if web pages worked the same way, with links playing the role of citations?
That idea became PageRank. Picture importance as a fluid that flows along links. Every page is given a starting score and then, in each round, hands its score out evenly to all the pages it links to. A page that is linked to by many others, especially by others that themselves carry a lot of score, accumulates a high rank. Crucially, this is circular in the best way: your importance depends on the importance of the pages linking to you, whose importance depends in turn on theirs, and so on across the whole web. You solve the tangle by simply repeating the flow, round after round, until the numbers stop changing and settle into a stable pattern. That settling point is the ranking.
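Written as a formula, one round of that flow might look like the line below. The symbols are our own shorthand rather than anything from the original work: r_t(p) is the score of page p after round t, the sum runs over every page q that links to p, and L(q) is the number of links leaving q.

```latex
r_{t+1}(p) = \sum_{q \to p} \frac{r_t(q)}{L(q)}
```

The ranking is the point where another round changes nothing, that is, where r_{t+1}(p) = r_t(p) for every page.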
There is one wrinkle the inventors had to handle. A naive version can get stuck going round in loops or leak all its score into dead ends, pages that link to nothing, so PageRank adds a damping factor, usually set to about 0.85. The picture is a random web surfer who, at each page, follows one of its links 85 percent of the time and, the other 15 percent, gives up and jumps to a completely random page. That occasional jump stops the flow getting trapped and guarantees the scores converge. The result was strikingly better than keyword counting, hard to fake by simply repeating words, and it powered Google past its rivals. Search today layers in hundreds of other signals, from how fresh a page is to what you have clicked before, but the founding insight still rings true: the web votes with its links, and the votes of the important count for more.
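One way to see the damping factor at work is a short Python sketch. Treat it as an illustration under assumptions of our own, not as Google's actual code: the function name `pagerank`, the `links` dictionary and the tiny four-page web are all made up for the example, and sharing a dead end's score evenly among every page is just one common way to plug the leak.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_rounds=1000):
    """Repeat the score flow until the ranks stop changing.

    links maps each page to the list of pages it links to (hypothetical data).
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}  # every page starts equal

    for _ in range(max_rounds):
        # Score sitting on dead ends (pages with no outgoing links).
        # Assumption: it is shared evenly among all pages instead of leaking away.
        dead_end_score = sum(rank[p] for p in pages if not links.get(p))

        # The 15 percent "random jump" share, plus the dead-end share.
        new_rank = {
            page: (1 - damping) / n + damping * dead_end_score / n
            for page in pages
        }

        # The 85 percent "follow a link" share, split evenly over each page's links.
        for page in pages:
            targets = links.get(page, [])
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)

        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:  # the scores have settled
            break

    return rank


# A made-up web of four pages: C is linked to by A, B and D.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)

for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page}: {score:.3f}")
```

In this toy web nothing links to D, so the only score it ever receives is its share of the random jumps, while C, which every other page points to, ends up on top.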
A search engine ranks pages by letting importance flow along links, so the page that important pages point to rises to the top.
1. Draw four boxes labelled A, B, C and D and add a few arrows between them showing which links to which.
2. Give each box a score of 1, then in each round split every box’s score evenly among the boxes its arrows point to, and add up what each receives.
3. Repeat for several rounds and watch the scores settle, with the box that the well-linked boxes point to climbing to the top, just as a search engine would rank it.
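If you would rather let a computer do the adding up, the paper exercise can be sketched in a few lines of Python. The arrows below are one arbitrary choice of ours (A links to B and C, B to C, C to A and D, D to C), picked so that every box has at least one arrow in and one arrow out; the names `arrows` and `one_round` are likewise just for this sketch. It follows the plain rule from the steps, with no damping factor.

```python
# Hypothetical arrows for the four boxes: each box lists the boxes it points to.
arrows = {"A": ["B", "C"], "B": ["C"], "C": ["A", "D"], "D": ["C"]}


def one_round(scores, arrows):
    """Split every box's score evenly among its targets and total what each receives."""
    received = {box: 0.0 for box in scores}
    for box, targets in arrows.items():
        share = scores[box] / len(targets)
        for target in targets:
            received[target] += share
    return received


scores = {box: 1.0 for box in arrows}  # step 2: every box starts with a score of 1
for _ in range(50):                    # step 3: repeat until the scores settle
    scores = one_round(scores, arrows)

for box, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{box}: {score:.3f}")
```

With these arrows the four points in play are never lost, only moved around, and C, which all three other boxes point to, settles at about 1.78, twice the score of A or D and four times that of B.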
Common questions
A link from an important page counts for more because importance flows along links. A page passes its own score to the pages it links to, so a link from a highly ranked page carries far more weight than a link from an obscure one, just like a citation from a famous paper.
The damping factor models a surfer who follows a link 85 percent of the time and otherwise jumps to a random page. That occasional 15 percent jump stops the score flow getting trapped in dead ends and guarantees the calculation settles.
PageRank is only part of how search works today. Modern search blends link analysis like PageRank with hundreds of other signals, such as page freshness, content quality and your own click history, but importance flowing along links remains a founding idea.