🚳 🐊 🔸 How we selected cargo for carriers 🐆 💌 👍🏾

Good afternoon. Our name is Ilya Bashtanov (developer, Tochka-Tochka ) and Tatyana Voronova (data analyst, 2M Center ). And we want to talk about the technical implementation of the task of selecting goods for transportation.

The essence of the problem is as follows. There are goods in the warehouse that need to be transported from city A to city B. We can assume that only the weight of the goods is taken into account, and their sizes are more or less standard (euro pallets). A carrier who wants to take a passing freight wants to carry as much as possible, but is limited by the weight and number of packages. It is necessary to form for him several variants of the lots from the goods available at the warehouse.

Solved tasks for business in this case:

Load vehicles as efficiently as possible and thereby increase transportation revenue.
To solve the delivery problem in an acceptable time for the user (including the FIFO principle).

The project immediately seemed attractive. First, from the point of view of mathematics: it was proposed to implement an algorithm for solving the classical combinatorial optimization problem. Secondly, PostgreSQL was proposed as a DBMS, the popularity of which has been constantly growing in recent years due to a large number of features, reliability, and good performance characteristics. And, of course, the importance of the practical task and the scale of the project, which involved many different participants, also became a great incentive.

Algorithm

This task is a variant of the classic backpack packing problem . The backpack problem is NP-complete . Such problems cannot be solved in polynomial time, although mathematically this has not yet been proven. This means that exact algorithms, such as exhaustive search of variants or its optimized varieties, do not work in practice, as they have complexity

O (2^{n})

$O(2^n)$ . Approximate methods provide solutions for the best time. For example, the greedy algorithm assumes that the maximum weight is put in the backpack as long as possible. Its complexity does not exceed the complexity of sorting

(O (n * l o g n))

$(O(n * log n))$ , but the result may be far from optimum. There is still a class of genetic algorithms that give good results in a limited time, but they are non-deterministic, which in our case would lead to different options for issuing during repeated calls, it would be difficult to explain to the customer and carriers. As a result, the choice fell on approximate methods that give the result with guaranteed accuracy in polynomial time.

So we have:

Weights with weights $w_1,w_2,... , w_n$
Limit on the total weight of goods $W_\max$
Cargo limit $Q_\max$

It is required to find a subset of goods with a maximum total weight that satisfies the constraints.

The idea is to reduce the variety of goods by coarsening their weights and apply the greedy algorithm. In this case, the approximate solution is found in a completely polynomial time, that is, a solution with guaranteed accuracy

1 - ε

$1- \varepsilon$ obtained with complexity being a polynomial in

n

$n$ and

ε

$\varepsilon$ .

At the input of the algorithm, we have a plate containing the rounded weights of the goods and the number of seats for each weight. Using a greedy algorithm, we try to get styling options for the total weights

W_{max},

$W_\max,$

W_{max} - 1,

$W_\max-1,$

W_{max} - 2,

$W_\max-2,$

\dots,

$…,$

0

$0$ . To do this, having set the target total weight, in the cycle we select the maximum weight cargo from the remaining loads so as not to exceed the target total weight. If the maximum allowable amount is reached

Q_{max}

$Q_\max$ , the cycle ends. The resulting combinations are stored in a temporary array.

The number of possible combinations can be large, and it is convenient for the user to choose from a small number of options, but at the same time there should still be a choice. An additional task arises of choosing preferred combinations. In order not to complicate, we decided to always give out two options, if possible. For a warehouse, it is advisable first of all to send the heaviest loads, so we rank the combinations in descending order of the largest weight of the cargo. If the combinations with the maximum maximum weight are more than two, we give out two: with the largest and least quantity of goods.

Tests

For testing, we selected two implementations of the considered algorithm, which differ in insignificant details: one in Java, the other in PL / pgSQL, the procedural language of the PostgreSQL DBMS. It should be noted right away that the final choice was influenced not only by the test results, but also by architectural considerations, usability and other factors. But first, the task was to choose one of two implementations.

Two test environments were used: a desktop for testing development and a server for testing in conditions similar to real ones.

dev: HP Probook 4740s client station + 32-bit 2 x Pentium® Dual-Core CPU E5300 @ 2.60GHz and 4 GB RAM, Ubuntu Linux 16.04, PostgreSQL 10.3.
test: 64-bit 2 x Intel Xeon CPU E5-2673 v4 @ 2.30GHz 14 , CentOS Linux 7.4, PostgreSQL 10.3

A test table was prepared in the PostgreSQL database containing 7000 loads with random weights from 20 to 800 kg. For testing, we used the standard pgBench utility from PostgreSQL, which during the test performed 500 transactions (10 connections of 50 transactions each). Each transaction made one call of the algorithm with random parameters (restriction on weight from 10 to 1000 kg and on the number of goods from 1 to 50 pieces). All random variables are distributed evenly.

For the first algorithm, each transaction started a Java machine and passed a JAR archive with the algorithm code to it for execution. The main Java class connected to the database through the JDBC driver, received test data from the table and applied the algorithm to them.

For the second algorithm, each transaction made a call to the stored function of the PostgreSQL database, which read test data from the table and applied the algorithm.

Table 1. Results of the main test

In addition to the main test, the results of which are shown in table 1, an additional test of algorithm 2 was carried out, in which different options for obtaining the initial data were compared: from a database, from a file, random array generation. It turned out that for 7000 loads, transferring data from a DBMS to a local array requires more time than the actual stacking algorithm.

Findings.

In our task, the performance depended more on the architecture of the system and the speed of interaction with the database than on the algorithm used. This is confirmed by an additional test of algorithm 1, in which the selection of data from the database took the bulk of the time.
Option 2 scales slightly better, apparently due to more modest RAM requirements (temporary database tables are used to store intermediate results).

As a result, the second algorithm was chosen, which, with minor alterations, was used for the second year. As a result, a solution is sought with acceptable accuracy with an average transaction time of less than 1 second.

Exploitation

As always, life made adjustments, and I had to tweak the implementation.

In reality, the carrier reserves the consignment in a Yankee auction with a floating price. Strictly speaking, this option of selling by auction is not, but this is the topic of a separate conversation at another site. In addition to the maximum cargo weight and maximum quantity, the carrier sets two cities (beginning and end of the route) and the desired loading time. After that, the selection procedure for each pair of warehouses (in the sending and destination cities) filters out the cargo according to several criteria, in particular, taking into account the warehouse’s working hours and the maximum transportation cost at which expenses are still covered by the sender’s payment, and transfers their weights to the stacking algorithm. At the output of the algorithm, a list of weights is obtained, according to which sets of specific goods are selected, which are issued as auction offers.

Initially, weights were rounded up to values that are multiples of 10 kg. During operation, it became clear that accuracy noticeably suffers, and the results begin to contradict common sense, for example, in the presence of goods weighing 12 and 19 kg, the system can choose the first. Warehouse scales give data with an accuracy of 1 kg, and for the current volumes and load, the performance of the algorithm for integer values of the scales turned out to be quite acceptable, so they have refused rounding. In the future, with an increase in the number of shipments, it is planned to use rounding to values that are multiples of 5 kg.

The overhead for the operation of the stacking algorithm with the current data volume and query intensity is not critical. Significantly more resources are required at the stage of cargo selection, as well as for other system operation processes.

Business findings

The mission of the Point-Point is an effective logistic process, in particular, the optimal use of the vehicle in terms of congestion: when transporting a minimum of voids.

The global goals of solving optimization problems in freight transport: saving resources contributes to economic growth, creates additional opportunities for small and medium-sized businesses, and improves the environment.

Specialists of Center 2M and Tochka-Tochki found a software-mathematical solution suitable for all participants in the transportation process. It can be used for the federal network, at each point of which 7 thousand pallets (corresponding to the size of the football field) are requested by 500 carriers and with the result obtained in less than 1 second.

Authors of the article: Ilya Bashtanov (i-bash), Tatyana Voronova (tvoronova)

How we selected cargo for carriers

Algorithm

Tests

Exploitation

Business findings

More articles: