Logistics. Introduction Just about complicated

We all love to dream, especially when it comes to visiting new places or returning to your favorite places. Nothing inspires so much as a sense of anticipation of a planned event and is overshadowed by only the presence of organizational issues, in particular, the selection and purchase of plane tickets. And for some reason, internally, we always put off this issue or shift it on tour operators, metasearch engines or other aggregators. In the practice of each, there were situations when a message was displayed on the screen: “Unfortunately, nothing was found for your request. There may be flights for other days. ”

As often happens, it is far from always in response to a request with a route search that the desired result with a flight is given, which immediately goes to checkout. We start moving from site to site to get a satisfying answer.

The occurrence of similar cases is associated with the lack of direct communication or pre-planned combined (connecting) routes, or routes with transfers were proposed that included waiting time for the next flight for more than 5-6 hours, or even a whole day.

How is it that with all the variety of airlines we get answers that are far from always satisfying us, and in general, how are the algorithms for finding route options constructed algorithmically? In this matter, applied mathematics comes to our aid using algorithms and evaluation criteria.


World Airports Map

I’ve been dealing with logistic problems for a long time and when I was faced with the task of building a search engine that will search for optimal routes, it seemed to me rather trivial. From experience, I could say in advance that it can be solved by fairly simple methods and in a relatively short time. However, if everything was so simple, then this article would not have appeared. Therefore, having joined the work, after a while I realized that it wasn’t so trivial.

The starting point, as often happens when traveling, and it was with this approach that I came up with, was the decision to build a transport network in the form of a graph. The transport network initially included data on airports, airlines, their flight schedules, and connecting time indicators.

It was taken as the basis that airports are considered as the top of the graph, and flights between them are edges. As a result, we got a skeleton, as in a skeleton, on the basis of which I began to impose a schedule that had its own characteristics: departure and arrival times. Moreover, movement along the edges is always possible only at a certain time, which is characterized by particular options for solving the problem:

  • minimum travel time;
  • minimum (or adequate in plain language) number of transplants;
  • time interval for transfer from one flight to another when choosing complex routes.

The difficulty lies in the fact that under no circumstances we have the use of only one particular case, the query always contains related search criteria, including one or two or more refinements. It follows that the application of criterion imposes certain conditions on the calculation of the desired route options. In addition to the three listed above, the calculation includes the use of categories of service classes (first, business, economy), cost, number of passengers and their categories, as well as the availability of a number of additional services that affect the result. Thus, this completely began to fit into my concept of solving the problem by means of multi-criteria optimization (multi-purpose optimization), namely, finding the shortest path by several criteria (MOSP - multi-objective shortest path).

But, as everyone knows, in order to test one hypothesis, a number of others must be cited. As another option for solving the problem, the use of nonlinear programming was considered.

Given that the arcs of the graph are weighted by vectors, the solution of the problem could be reduced to quadratic (nonlinear) programming. And everything would be perfect if it were not for the fact that the vectors, in fact, have only two characteristics: length and projection on the coordinate axes, which means the presence in the objective function of variables with a degree of no more than two. However, in my case, the components of the vectors “conflict” with each other, which contradicts non-linear (quadratic) programming. For example: “conflict” may be due to the fact that each flight has a specific aircraft capacity and it is impossible to sell more seats for it. Due to the presence of “conflicts”, the option to solve the problem by nonlinear programming had to be abandoned in favor of the same multi-criteria programming.

After analyzing the options for solving the problem, I went with the classic version of decomposition with the following parts:

  1. search based on the schedule of all “shortest” routes from the point of departure to the point of arrival;
  2. search among the "shortest" paths of the most "optimal" based on criteria.


At the same time, after the decomposition, the question logically followed, but on the basis of which approach is it better to build a transport route network? And which of the available graphs is more acceptable for solving this problem?

According to the classics of the genre, not one was found, but several models, which I began to consider.

Time-Extended Model


A time-extended graph is a directed graph. The arcs of such a graph are the directions of the departure and arrival points of flights, and the peaks are the slots (take-off and landing times) of flights. The use of the time-extended model is to expand the original graph with airports peaks and arcs, reflecting the relationship of routes by combining airports among themselves in accordance with the flight schedule:



Arcs are indicated aseBC,3. For example, an arc connects an airportB with airport $C$and directed from Bto Cspecified by comma index 3shows that this is not the only flight, but the third in a row. Moreover, each arc can be easily weighted, for example, the weight of the arceBC,3 can be calculated as wBC,3=tC,9tB,6. The vertices inside a single airport are easily arranged according to the time indicator of each flight, and also, based on the connecting time inside the airport, flights are connected by arcs that correspond to the available time of transfer from one flight to another.

The advantage of a time-expanded graph is that the fastest arrival path can be found in the shortest time, for example, using the Dijkstra algorithm. The disadvantage of using this approach is the multiple increase in vertices and edges, which, however, does not affect the sparseness of the graph, because an increase in the number of edges is proportional to an increase in the number of vertices.

Time dependent model


A time-dependent graph is a directed graph without additional transformations over the original graph. Each arc corresponds to a route with connections between airports, while the arc has a special data set that includes parameters about the time of departure and arrival of the flight along the route. The value retrieved from this dataset depends on the exact moment in time accessing this arc occurs, hence the name “time-dependent”.



The advantage of a time-dependent graph is that it allows you to find the path with the least number of transfers, the disadvantage is the excessive load on the data of each arc.

Multigraph


A multigraph is a directed graph with sets of characteristics inherent in time-enhanced and time-dependent models.



There are as many arcs between airports as during the extended model, but these arcs are “weighted” by the departure and arrival times, and the connecting time between flights at each airport will have to be constantly and repeatedly calculated. On the other hand, in a multi-graph representation there is exactly as much information about flights as during a time-dependent model, but all this information is assigned to many arcs. Finding the path with the least number of transfers will be as simple as during the time-dependent model, but passing along one arc does not take into account the information of other arcs, so after the path with the least number of transfers is found, you will have to repeat this path for calculation of information about other flights (arcs).

Representation of a transport route network in the form of a multigraph is a raw state for time-extended and time-dependent models. The search for the path with the shortest arrival will be performed by the same transformations characteristic of the time-expanded model, and the search for the path with the least number of transfers will be performed by the time-dependent model. And entails an increase in network calculation time. However, there is a nuance here, because the search for the “shortest” path according to two criteria, in the general case, already belongs to the NP class of difficult tasks and the modification of Dijkstra's algorithm performs quite complex manipulations with the marking of vertices.

Representation of the computational complexity of the problem with the multigraph of the model of the set of routes with airports and flights can be considered through the simplest modification method - Dijkstra's algorithm, which searches only for Pareto-optimal paths. The total amount of criteria for a single Pareto optimal path cannot be worse or better than other Pareto optimal paths for all criteria simultaneously. For example, a set of paths with the following sum of two criteria {(30, 10), (20, 20), (10, 30)} will be Pareto optimal.

As a more illustrative example, we consider the operation of the algorithm on a small graph whose arcs are weighted by two-dimensional vectors:



The operation of the modified Dijkstra's algorithm on such a graph, as well as the usual algorithm for a regular graph, will consist of iterations with successive striking out of the vertices, except that each vertex may have not one Pareto-optimal value, but several. In addition, it is also necessary to save information about the origin of each calculated value of the sum.

Given that the problem considers routes with airport connections for flights, it is quite appropriate to assume that there can be quite a lot of arcs between two vertices, and each of them can be weighted by at least a three-component vector. But with an increase in the components of the vectors, an increase in the Pareto optimal paths can also occur. Moreover, almost all arcs between two vertices can be Pareto-optimal. In this case, the sum of the paths will contain even more Pareto-optimal paths. This can be seen on a small multigraph weighted by two-dimensional vectors:



The use of the modified Dijkstra algorithm in such a model is permissible, since it can work on multigraphs, but the algorithm processing speed depends on the number of verticesn, and on the number of arcs m. For sparse simple graphs and the usual algorithm, the complexity is of the order ofO(nlogn+mlogn). However, it is difficult to say whether the resulting multigraph will be "sparse", since in the general case the edge density of multigraphs can be much higher than that of a full graph. If we take into account that the number of edges of a complete graphE determined by the number of its vertices Vaccording to the formula:

E=V(V1)2

And the edge density is defined as:

D=2EV(V1)

For complete graphs max(D)=1, but for multigraphs, the value of the edge density is generally unlimited.

Therefore, I came to the conclusion that the shortest path can be found using the Dijkstra algorithm, and all other shortest paths can be found using the Yen algorithm. Given that at this stage we operate with unweighted arcs and can immediately discard paths, for example with four transfers, then the search for one shortest path will be guaranteed to be performed in less thanO(n2).



Considering that this article does not cover the immediate results of implementation, I can assume in advance that the modified Dijkstra algorithm on a multigraph model will not work out taking into account the optimal time interval.

Therefore, I will move on to the immediate solution of the first subproblem, namely, to the solution of the question related to multicriteria optimization, which I selected by the method of conditional optimization. The method consists in the fact that despite the inconsistency of the criteria, the highest priority always exists, and all the others have valid values. As a result, optimization of the entire task comes down to optimization at the highest priority.

A feature of the task is the fact that the calculation includes not only direct flights, which always greatly simplifies the task, but, on the contrary, connecting flights or transfers. Therefore, the time-dependent graph model is very suitable for solving the problem, because often the number of transfers does not form more than 3-4 segments in the route. The time-dependent graph, first of all, carries information about the totality of available routes in the form of flights with connections between airports. Then, an estimate of the departure and arrival times for the docking is calculated taking into account the usual unweighted graph. All that needs to be done at this stage is to find all valid routes between airports “A” and “B” with a predetermined number of transfers, for example, three.



Considering that at this stage the graph is unweighted, it is possible to find the shortest path using the usual breadth-first search, and all other shortest paths using the Yen algorithm. Since the breadth-first search is performed in linear time, we can say that the search time for all paths, with the same 3 transfers, will be linear.

When evaluating the use of algorithms for finding optimal routes and the availability of data on the frequency of schedule updates, we can say that the idea of ​​a pre-calculated matrix with all the "shortest" paths in directions is justified. Of course, at the initial stage of creating the matrix, such a solution will turn out to be laborious calculations, but then, in the future, it will allow you to almost instantly find the “shortest” paths and promptly make adjustments to the matrix itself.

However, the question remains, and which “shortest” route routes with connections and transfers are the most “shortest”? Otherwise, we have an array of data that will not allow us to make a decision. A set of criteria is often reduced to the most optimal in this case, namely:

  • minimum travel time;
  • minimum number of transfers.


Both criteria can have custom intervals or time values, so it remains only to combine the routes and check on the fly the correspondence of the admissible values:



Considering that the maximum number of transfers varies in values ​​of 3-4, some combinations of flights will be immediately excluded , leading us to the solution of the following subproblem - to identify among the routes the most “optimal” criteria for the variety.

The criterion system is the main problem in the formation of the issuance of routes, because the concept of "optimal" will depend on the preferences of a particular passenger, and he often only upon receipt of an answer to the request can manage the received routes by setting a different set of filters. And even if you provide him with ordered routes in accordance with the “shortest” paths, the results will be an array of data that will not allow the passenger to decide and will lead to an endless search of options. Therefore, the formation of an array of route options that take into account various combinations of filter application, both collectively and individually, is an extremely important parameter that affects the entire task.

What then needs to be considered and how? Creating ranking filters is a very time-consuming task, because a set of filters, their number and the degree of influence on each other can significantly change the result of the search for optimal paths as a result of refinement of search algorithms. Then what can it consist of? We all know the key flight parameters - information on airlines, dates with departure / arrival times, departure / arrival airports, time zone offsets, connections and transfers, but we often forget about such data as the necessary and sufficient time to arrive at the airport or travel time between airports to make a connection in the case when the subsequent departure is from another point in the location. And we should not forget about the category of our passenger,as well as their quantity for the subsequent offer of not only the optimal route in time, but also in price, depending on the tariff policy.

An array of proposals should instill a certain degree of confidence in every traveler, no matter how paradoxical this may sound. Confidence based on mathematical principles and consisting in the provision of a minimum set of proposals with route options that take into account preferences and reduce the risk of force majeure. And in this case, force majeure is not only the likelihood of a flight delay or a change in time and airport of departure / arrival, but also indicators such as natural, technological and epidemiological situations that entail the complete or partial cancellation of flights.

The ability to manage data in our realities entails many questions each time: is everything taken into account, what else can be added. And in this desire to build a transport route network, a paradox arises related to the fact that the algorithm should be both optimal and simple at the same time. But you often have to sacrifice something, from here somewhere minimizing the initial criteria to the minimum amount seemed to me the most acceptable.

If we go along the path of selecting criteria, then omitting the secondary ones, we can distinguish such options that take into account not only the time of the entire journey, cost and comfort, the number of transfers and their duration of transfers, but also, say, information on the congestion of airports and data on communication between airports when changing one airport to another. This is justified by the fact that when forming connections, taking into account the presence of several airports in one city that have different communications with external airports, not only air traffic, but the time of movement when such a situation arises will be calculated in the calculation of time and route formation. And they have a place to be ...

The decision-making system in choosing the “optimal” routes in situations where we have a set of criteria sets that affect preferences is just that stumbling block that I did not think about at all at the initial stages. Even leading them to a certain systematization, there are quite a lot of differences in their application with each other, so we will consider them separately.

Comfort

The meaning of the concept of "comfort" is understood by everyone differently, but the following levels are accepted by airlines:

  • first grade;
  • Business Class;
  • Economy class.


In my approach, I admitted that the “comfort” parameter can be measured and take values ​​from 1 to 10, and formed the following condition: the lower this indicator for a particular flight, the less comfortable it will be in the ranking system. In this case, a problem immediately arises that is related to the non-additivity of such a quantity, i.e. the sum of the comfort indicators of the two flights will not reflect their overall comfort. For example, there are two route combinations:

  • the first route with two segments, the first raid with comfort 10, the second with comfort 1;
  • the second route with two segments, the first raid with comfort 6, the second with comfort 5.


As a result, the sum and arithmetic mean value of the “comfort” parameter of each path will be the same, but they will not reflect the true state of affairs. However, given the need to consider the aggregate comfort of the two routes, it is most advisable in this situation to use their average values ​​together with the standard deviation, which would allow us to estimate the “spread” of comfort levels. That is, at the initial consideration, we pay attention to average comfort, and then, by the magnitude of the standard deviation, we consider how these comforts differ greatly from each other. Thus, this criterion splits into two very “close” ones. Although, on the other hand, even without painstaking examination, it is clear that the assumption of such a “measurable” level of comfort is very far from reality,because we have well-known class levels. And secondly, it is also obvious that the comfort of the entire journey is in some kind of functional dependence on other criteria, for example, the duration of the flight, the number of transfers with their duration, the rating of the airline and its air fleet. Whether it makes sense to focus on this criterion is a question. After all, the approach is based on a system of accounting for the totality of “optimal” ways. Therefore, in the future we will use the comfort representation in the form of two indicators: the arithmetic mean of the comfort indicators of flights and their standard deviation.the number of transfers with their duration, the rating of the airline and its air fleet. Whether it makes sense to focus on this criterion is a question. After all, the approach is based on a system of accounting for the totality of “optimal” ways. Therefore, in the future we will use the comfort representation in the form of two indicators: the arithmetic mean of the comfort indicators of flights and their standard deviation.the number of transfers with their duration, the rating of the airline and its air fleet. Whether it makes sense to focus on this criterion is a question. After all, the approach is based on a system of accounting for the totality of “optimal” ways. Therefore, in the future we will use the comfort representation in the form of two indicators: the arithmetic mean of the comfort indicators of flights and their standard deviation.

Duration of transfers

There are two types of route coordination:

  • transfers (pre-agreed between airlines);
  • docking.


Of interest from this list is the concept of “connection”, which is the result of a combination of two or more airlines on the same route with agreed time periods, based on the minimum connecting time required, but not always sufficient to transfer from one flight to another. Therefore, to combine flights, in such a route, it is necessary to calculate the sufficient time required to complete the connection.

Here we have a dissonance, because often such flights can be excluded from the options for the offer, but, due to the presence of such a criterion, there are situations when connecting with a temporary wait of more than a day is “optimal” for our passenger. Therefore, it makes sense not to discard routes with too long transplant times, since, albeit rarely, they may still be in demand.

Airport

congestion The airport congestion indicator can be estimated when calculating the “optimal” routes with an assessment of the degree of their filling upon the sale of tickets for flights, taking into account the participation of an airport in the route. Obviously, in this assumption, the minimum connecting time will have some functional dependence on the congestion of the airport.

Number of transfers

The number of transplants per se is a contradictory amount. Indeed, it is often this way that you can reduce the cost of the entire route. On the other hand, there is a category of passengers for whom the presence of even one change creates a number of problems. Do not forget the situations when the transfer route is the most “optimal” due to the fact that expensive business class tickets are left on the desired route or, say, there are tickets with a fare without the possibility of exchange / return, which entails the probability of refusal the proposed option.

But in the number of transfers, another sub-criterion is laid down - this is the probability of moving from one airport to another. Therefore, the formation of connections is often a difficult task, because in calculating the route, it is necessary to lay the time for moving between airports, and this entails the need to order tickets for transport within the city for moving in subparagraphs.

The criteria “travel time” and “cost” are absolutely additive quantities, therefore they rely on direct indicators with a ranking system. And all would be fine if it were not for the need to include the total travel time with the calculation of the time for arrival at the airport or departure from it to the destination, as well as when connecting the accounting for the need for additional expenses for moving between airports. At this stage, let me neglect this clarification.

The hierarchical structure of the global goal


Suppose that the definition of “optimality” should be understood as the best always and for all, and not only for passengers, but also for airlines, airports and, possibly, other travel service providers.

In this case, dividing a single, very complex target into subgoals is the only way out. Such a breakdown may look like this:



It is quite obvious that in order to satisfy all areas of interest, one will have to achieve less “global”, but to some extent conflicting goals. How to achieve this? Perhaps this is a great topic for another article. But looking ahead, we can say that dividing a complex goal into simpler subgoals is also connected with optimization and graph theory, namely, the search for optimal hierarchical structures.

The decision-making system is rather closely connected with such methods as the criteria of Laplace, Wald, Savage and Hurwitz, therefore, having an array of “shortest” routes, I could not help knowing which of them would be the most “optimal” based on them.

One of the most universal approaches to solving multipurpose optimization problems is to search for optimal solutions according to Pareto and Slater. If we evaluate the array of “shortest” routes according to Slater’s optimality criterion, then the optimal path will be considered the one that is the best by all criteria at the same time. A path is considered Pareto optimal if all other paths are better than it in some one parameter, but worse in another, i.e. one cannot improve a path by at least one criterion without worsening it by the rest.

All Pareto and Slater optimality solutions can be easily demonstrated graphically for the minimization problem with two criteria: The



set of points with the parameters of the Pareto optimal criteria are indicated in green, and the Slater optimal ones are indicated in yellow. The set of Slater-optimal points also contains the set of Paretto-optimal points. Within the framework of the problem under consideration, all these points will be the ends of the vectors starting at the origin of the seven-dimensional space.

In conclusion, I can say that this is only the tip of the iceberg, which allows you to give an initial idea of ​​what search engines encounter when forming optimal routes.

All Articles