hunter paulson
how to build the fastest wikipedia race solver on the internet
2026-04-30
Every[1] Wikipedia page has links to other Wikipedia pages, and by clicking on one of those links we end up on that page.
Because of this, Wikipedia can be thought of as a graph where each page is a node and each link is an edge[2].
We traverse the graph by clicking on a link to another page. If we click on the right sequence of links we can get from one page to almost any other page. (Footnote: Wikipedia is not a fully connected graph, so not every page is reachable from every other page.)
There is actually a semi-official game called Wikiracing where “players compete to navigate from one Wikipedia page to another (chosen manually or randomly) using only the internal links of the page”.
There are two variants of Wikiracing. One minimizes the number of clicks and the other minimizes the time to complete the race. I thought it would be a fun challenge to see if I could build a system that is the fastest on the internet at both.
Here is how I built the fastest Wikipedia race solver on the internet[3]. You can play with my solver here and try to overload it.
To start, we need an algorithm to find the shortest path between any two randomly selected pages on Wikipedia.
The only information we start with is the page we are initially on, the start page (S), and the page we want to get to, the target page (T).
I like to visualize this process as two expanding solar systems, centered on the start and target pages, that grow until they overlap. At every step of the algorithm we add a new, further-out orbit to one of the solar systems. As soon as the two outermost orbits intersect, we know that we have found the shortest path(s).
As you can see, the algorithm is pretty straightforward and brute force.
Initially we only know about the start and target pages, and we have no idea how far apart they are.
We begin by checking either all the outgoing links from the start page or all the incoming links to the target page. To guarantee we find the shortest path, we have no choice but to check every outgoing and incoming link for every page we encounter from here on out.
Since we got all the incoming links to the target page in the previous step, we now need to get all the outgoing links from the start page. Both of our solar systems now have their first orbital ring. We call the outer ring of pages that are all exactly K links away from the center node the frontier.
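Growing a frontier by one orbit can be sketched in a few lines of Python. This is my own illustration, not the actual solver; `get_links` is a hypothetical stand-in for fetching a page's outgoing (or incoming) links:

```python
# A sketch of growing one frontier by a single orbit.
# `get_links` is a hypothetical function returning a page's links.
def expand_frontier(frontier, visited, get_links):
    """Return the set of pages exactly one link beyond `frontier`."""
    next_frontier = set()
    for page in frontier:
        for neighbor in get_links(page):
            if neighbor not in visited:  # skip pages already in an inner orbit
                visited.add(neighbor)
                next_frontier.add(neighbor)
    return next_frontier
```

The `visited` set is what keeps each orbit strictly further out than the last: a page that already appears in an inner ring is never added again.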
Now we need to decide which frontier to grow next. Our goal is to minimize the number of pages we have to check, so we expand the frontier with the fewest pages[4]. Since our target frontier has 6 pages and our start frontier has 8, we expand the target frontier. Notice that this is exactly what we just did in step 3, when the start frontier had one page.
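The selection rule itself is tiny. A sketch (`pick_side` is a name I made up for illustration):

```python
# Always grow the side with fewer pages, since every page in the
# chosen frontier must be checked on the next expansion.
def pick_side(start_frontier, target_frontier):
    """Return which frontier to expand next: 'start' or 'target'."""
    if len(start_frontier) <= len(target_frontier):
        return "start"
    return "target"
```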
Repeat the previous step until the frontiers intersect, i.e. until an expansion encounters a page that is already part of the other solar system.
Once our frontiers intersect at one or more pages, we know that we have found the shortest path between the start and target pages.
As you can see we had to check a lot of pages that were not on the shortest path. This wasted computation is unavoidable since we have no way of knowing ex ante which pages will be on the shortest path.
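Putting the steps together, a minimal version of the whole bidirectional search can be sketched as follows. This is my own illustration under stated assumptions, not the production solver: `outgoing` and `incoming` are hypothetical link-lookup functions, and ties between equal-sized frontiers are broken toward the start side.

```python
def bidirectional_bfs(start, target, outgoing, incoming):
    """Return one shortest path from `start` to `target`, or None."""
    if start == target:
        return [start]
    # Parent pointers double as the visited set for each solar system.
    parents_s = {start: None}
    parents_t = {target: None}
    frontier_s, frontier_t = {start}, {target}

    def expand(frontier, parents, links):
        nxt = set()
        for page in frontier:
            for nb in links(page):
                if nb not in parents:
                    parents[nb] = page
                    nxt.add(nb)
        return nxt

    while frontier_s and frontier_t:
        # Always grow the smaller frontier to minimize pages checked.
        if len(frontier_s) <= len(frontier_t):
            frontier_s = expand(frontier_s, parents_s, outgoing)
            meet = frontier_s & parents_t.keys()
        else:
            frontier_t = expand(frontier_t, parents_t, incoming)
            meet = frontier_t & parents_s.keys()
        if meet:
            m = next(iter(meet))
            # Walk back to the start, then forward to the target.
            path = []
            node = m
            while node is not None:
                path.append(node)
                node = parents_s[node]
            path.reverse()
            node = parents_t[m]
            while node is not None:
                path.append(node)
                node = parents_t[node]
            return path
    return None  # start and target are in disconnected components
```

Note that expanding the target side uses `incoming` links: the target's solar system grows backwards, toward pages that can reach it.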
I first learned about this approach through Six Degrees of Wikipedia. They were an inspiration and baseline for this work. One of the reasons I built this was to measure if the six degrees of separation hypothesis is empirically true.
Unfortunately it turns out that, at least for Wikipedia, it is false. The diameter of Wikipedia is 41[5].