📧 🙂 🐮 Hundreds of thousands of routes per second per core. Yandex.Routing Experience 🍲 👨🏿‍🏫 📩

A couple of weeks ago, Danya Tararukhin told on Habré how our service appeared, Yandex.Routing, and how it helps companies with logistics. By creating the platform, we solved several interesting problems, one of which is dedicated to today's post. I want to talk about route planning itself and the resources needed for this.

Finding the best route between multiple points is a classic discrete optimization problem. To solve it, you need to know the distances and travel times between all points. That is, to know the matrix of distances and times. Two years ago, a long matrix calculation was a very critical problem for us and blocked development. The search for the optimal solution with the known matrix took 10 minutes, but the calculation of all the cells of the matrix for large tasks (for several thousand orders) took hours.

To solve the problem with five thousand orders, you need to know the distances and travel times between all points. These are two matrixes of numbers with the dimension 5000x5000. We plan courier routes for the whole day, and in the morning the courier will get from point to point in one time, and in the evening - for another. So, you need to calculate the matrix of times and distances for each hour of the day. Not all hours of the day are unique, but cork time (morning and evening) needs to be covered well. Therefore, we came to a configuration with thirteen hour slices. In total, we need two cubes (times and distances) each 13x5000x5000. These are 325 million routes, calculated according to the real road graph, in which 165 million edges. Calculation of one route in the well-optimized algorithm of the Yandex.Maps team takes about 10 ms, for a total of 900 hours of calculations.Even when parallelized to 900 CPUs, you need to wait 1 hour. We could not start such a service, we needed a more suitable algorithm.

For further reading, it is useful to know Dijkstra's algorithm for finding the shortest path in a graph. It can be imagined as a “wave” emanating from the starting point of the route and going around the entire graph until the finish point is met. The running time of the algorithm is proportional to the edges of the graph, that is, the area covered by the wave:

Almost every candidate for an interview at the interview guesses to the first step of optimizing such a task: you can start the wave from two sides and end the search when the waves meet. The total area of two waves of half radius is less than one large.

The real road graph is quite structured, and this can be used. When you are looking for the shortest distance between Moscow and St. Petersburg, in classical Dijkstra you will be forced to spread the wave in a circle and sort through all the streets and alleys of Moscow, the cities and villages of Moscow Region, the streets of Tver and Novgorod. This is a huge amount of calculations, but you can prepare in advance and remember the optimal routes between cities (aka shortcuts) and not repeat them in runtime. Then, to find the route between two points in the hierarchical Dijkstra, you need to calculate the shortest distances to the desired shortcut. Since hierarchy levels may not be two, but 5-6, they dramatically reduce the search time.

The Card router team has implemented such optimizations for quite some time. It was they who made it possible to reach 10 ms to search for a route between two points. :) So for now, we have not come close to solving our problem.

Since the point-to-point search mode is already extremely optimized, we can optimize the calculation of the series in the matrix. A row is the distance from one point to all the others. While we are looking for the distance to the farthest point, we simultaneously calculate the distances to closer ones. Therefore, calculating the series is equivalent to calculating the distance to the farthest point.

We look at the time of calculating the series using this algorithm and recall that sequential calculation of 5000 routes would take about 5000 * 10 ms = 50 s:

The graph shows the calculation time of a row in a distance matrix of size 1 * N for different N (according to real data). It can be seen that the calculation of the row of size 1 * 5000 of interest to us fits into 1.3 seconds. A trend line has been added to the chart, which shows that the calculation time is growing slightly more slowly than linearly in N, the order of N ** 0.74

Already not bad! With this algorithm, we can calculate our cube in 13 * 5000 * 1.3 s = 84 500 s = almost 24 hours. It easily parallels in rows, and when using 50 CPUs, distances are calculated in half an hour. The complexity order of the cube calculation algorithm is O (N ** 1.74):

13 N*N 50 CPU ( 13*N/50). , 5000 , . 10 000, : .

In this form, two and a half years ago, we launched the first version of our API, which solves the logistics problem. Clients often complained about a long decision time, and they are easy to understand: you started a task to be solved, wait 1 hour, you get a solution and you understand that you forgot to fix the shift time with the driver, you correct it and it all starts all over again. Drivers begin to get nervous, because they risk getting into the morning rush hour, or even do not have time to deliver the order on time. It was necessary to do something. We didn’t want to “throw” the problem with iron: we were preparing for heavy loads, it would have required a lot of iron, and the purchase of servers is not happening at once.

A study of academic articles showed that, it turns out, there are algorithms with linear complexity for this task *! (In the article by reference, there is a large overview of all sorts of modern methods of Dijkstra's acceleration, including for the matrix case.) Calculating the matrix in linear time did not fit in my head. One of our developers volunteered to write a prototype, and this is what happened: The

time to calculate one matrix of size N * N on one CPU using the "fast matrix" algorithm. Complexity is obtained on the order of O (N ** 1,1). High Ns are knocked out of the trend line, since the generation of the answer and its downloading over the network are already more influencing the time.

115 seconds per 5000x5000 matrix using a single core and an almost linear dependence on N. Fiction has become a reality! The idea of the algorithm combines the two ideas described above: Dijkstra for series and hierarchical search. Obviously, starting to calculate the second row, at some point we will again go around the same area of the graph that we just went through, calculating the previous row. Therefore, let us memorize the shortest distances to all destinations at the nodes of the hierarchical graph. When we begin to calculate the next series, then, having reached such a node, we will immediately get almost all the distances to other points.

A year and a half ago, this allowed us to save half an hour of time with eLogistics and significantly reduce iron intake. Previously, for one large request, we needed 50 cores for half an hour, but now - 13 cores for 2 minutes. This is approximately 200,000 routes per second per core. That rare case when the new algorithm does not just close the class of problems, but expands our ideas about the possible.

_{^{* Article “Route Planning in Transportation Networks”, see paragraph 2.7.2 “Batched Shortest Paths”}}

Hundreds of thousands of routes per second per core. Yandex.Routing Experience

More articles: