Examining COVID-19 spread data using first-order differences, and what came of it

Hi, Habr. I had the idea of analyzing the data using differences. The method is not new; the point of the idea is to study not the absolute values of the spread data but the share of an agent (country) in the total ensemble of agents (all countries), and how this share behaves over the course of the epidemic.

Figure 1 shows all the studied points (almost 24,000; data from the European Centre for Disease Prevention and Control), so it is somewhat cluttered. For the countries whose behavior shows up clearly, I highlighted the approximation lines together with their regression equations and R^2 coefficients.


Fig. 1.


The figures under the spoiler show data for selected countries in two versions: the change in the agent's share and the actual growth in the number of infected people, with a brief analysis of the charts. In general, the difference method in this interpretation can serve as an auxiliary leading indicator of how the pandemic develops, much like the indicators used in the technical analysis of exchange rates.

Graphs

Fig. 2.


Fig. 3.


Fig. 4.


Fig. 5.


Fig. 6.

Theoretical basis


I will first introduce how the indicator works using a simple example based on real data.

Take a local group of three countries (Russia, Iran, USA) on April 22 and 23, 2020 (Figure 7).

1a) In Iran, the number of infected people as of April 22, 2020 was 84,802.
1b) In Iran, the number of infected people as of April 23, 2020 was 85,996.
2a) In Russia, the number of infected people as of April 22, 2020 was 52,763.
2b) In Russia, the number of infected people as of April 23, 2020 was 57,999.
3a) In the USA, the number of infected people as of April 22, 2020 was 825,041.
3b) In the USA, the number of infected people as of April 23, 2020 was 842,629.
4a) The total number of infected people in the ensemble of three countries as of April 22, 2020 was 962,606.
4b) The total number of infected people in the ensemble of three countries as of April 23, 2020 was 986,624.
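The numbers above are enough to try the indicator by hand. The following is a minimal Python sketch, not the code used for the article: it takes each country's share of the three-country total on both dates and then the change in that share. The variable names are illustrative.

```python
# Cumulative confirmed cases from the example above (April 22 and 23, 2020).
cases = {
    "Iran":   [84_802, 85_996],
    "Russia": [52_763, 57_999],
    "USA":    [825_041, 842_629],
}

# Ensemble totals per date: the sum over the three countries.
totals = [sum(day) for day in zip(*cases.values())]

# Share of each country in the ensemble total on each date.
shares = {c: [m / n for m, n in zip(ms, totals)] for c, ms in cases.items()}

# First-order difference of the share between the two dates.
diffs = {c: s[1] - s[0] for c, s in shares.items()}

for country, d in diffs.items():
    print(f"{country:7s} share change: {d:+.6f}")
print(f"sum of changes: {sum(diffs.values()):+.1e}")
```

With these numbers only Russia's share grows, while the shares of Iran and the USA shrink, even though the absolute count rises in all three countries. The changes cancel out up to floating-point error.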


Fig. 7.

Mathematical justification.

Denote by $N_i$ the total number of infected people in the ensemble at step (date) $i$.
Denote by $M_{j,i}$ the total number of infected people in country $j$ at date $i$.
Then the function under study has the form:
$$F_{j,i} = \frac{M_{j,i}}{N_i}$$
The increment $dF_{j,i}$ of this function has the form:
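Taking the increment to be the ordinary first-order (backward) difference of the share between consecutive dates, which is an assumption based on the definitions above, it can be written as:

$$dF_{j,i} = F_{j,i} - F_{j,i-1} = \frac{M_{j,i}}{N_i} - \frac{M_{j,i-1}}{N_{i-1}}$$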



This function has an important balance property: the sum of the differences over all countries at each step (on each date) is 0. The mathematical justification follows.
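A short derivation, assuming the ensemble consists of $k$ countries and that the total is exactly the sum of the country counts, $N_i = \sum_{j=1}^{k} M_{j,i}$, so that the shares add up to one on every date:

$$\sum_{j=1}^{k} dF_{j,i} = \sum_{j=1}^{k} \frac{M_{j,i}}{N_i} - \sum_{j=1}^{k} \frac{M_{j,i-1}}{N_{i-1}} = \frac{N_i}{N_i} - \frac{N_{i-1}}{N_{i-1}} = 1 - 1 = 0$$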



The second consequence of this balance law is that the sum of all the differences over the entire course of the epidemic is also zero. The math is below.
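Under the same assumptions, with dates numbered $i = 0, 1, \dots, n$ (the indexing is chosen here for illustration), summing the per-date balance over all dates gives:

$$\sum_{i=1}^{n} \sum_{j=1}^{k} dF_{j,i} = \sum_{i=1}^{n} 0 = 0$$

The same result follows country by country, because the differences telescope:

$$\sum_{i=1}^{n} dF_{j,i} = F_{j,n} - F_{j,0}, \qquad \sum_{j=1}^{k} \left( F_{j,n} - F_{j,0} \right) = 1 - 1 = 0$$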



These differences have three states:

A) less than zero;
B) equal to zero;
C) greater than zero.

Their interpretation follows the standard rules for analyzing functions, so I will not go into detail here.

Consider the behavior of the function's graph at infinity. Recall that, according to current thinking, we cannot eradicate the virus today; we can only try to bring the incidence down to an acceptable level. That is, somewhere in the future a country reaches an equilibrium described by the condition:
$$M_{j,i+1} = M_{j,i} + d_j$$

That is, the country's count follows an arithmetic progression. Then, assuming that the growth factor (alpha) of the total number of infected people is greater than 1, we get:
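One way to work this out, under the added assumption that the total grows geometrically, $N_{i+1} = \alpha N_i$ (the text only states that alpha is greater than 1):

$$dF_{j,i+1} = \frac{M_{j,i} + d_j}{\alpha N_i} - \frac{M_{j,i}}{N_i} = \frac{d_j - (\alpha - 1) M_{j,i}}{\alpha N_i}$$

The numerator becomes negative once $M_{j,i} > d_j / (\alpha - 1)$ and grows only linearly in magnitude, while the denominator grows geometrically, so under this assumption the difference approaches zero from the negative side:

$$\lim_{i \to \infty} dF_{j,i} = 0^{-}$$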



This is clearly visible in the graph for China.


Fig. 12.

From the foregoing follows another property: the model can remain stable in the presence of local surges in one or several agents (countries).

We reason as follows. As the pandemic develops, each country eventually enters the stage where its successive differences approach zero from the negative side. The number of such countries grows and, ideally, approaches k-1. It cannot go higher, because the balance equation must hold. With k-1 such countries, the sum of their differences at step i is less than zero, so the k-th country must have a difference greater than zero to bring the final balance to zero. That is a surge. At step i+1 the k-th country reduces its difference and moves into the negative half-plane of the graph. But this is only possible if there is a surge in one or more countries that were previously in the negative zone. This is what we all see in seasonal flu outbreaks, which must obey the same laws.
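The balance argument can be illustrated with a tiny sketch. The share changes below are hypothetical values chosen for illustration, not data from the article; the function name is also illustrative.

```python
def required_difference(other_diffs):
    """Difference the remaining country must show so that all differences sum to zero."""
    return -sum(other_diffs)

# Hypothetical share changes of k-1 = 3 countries whose shares are slowly shrinking.
calm_countries = [-0.0010, -0.0020, -0.0005]

surge = required_difference(calm_countries)
print(f"k-th country must gain {surge:+.4f} of the total share")
```

However small the individual negative differences are, the remaining country has to absorb their whole sum with the opposite sign.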

Assessing the complexity of the task, the first thing that comes to mind is the “three-body problem”, except that there are 206 bodies here. A solution is theoretically possible, but it is not clear which system of differential equations would have to be solved. On the other hand, a system of differential equations requires initial parameters, and we already have plenty of those. Bear in mind that the function values range from -1 to +1 and that such a system implies many dead zones. In the model I built, the balance sum deviated from zero by 1 * 10^-17 because of calculation errors; that is, the range of the studied values is 2 * 10^17. I suppose that such conditions make it possible to design and train a neural network, which may be faster. Fortunately, the model scales down to the cities within each country, so enough training samples can be found.

And now a small fly in the ointment for this model.

When I looked at the balance of the agents, I found that the accumulated differences behave as shown in the figure below for China.


Fig. 13.

The figure shows that China takes on all the negative mass. When I excluded China, I got a similar graph, but with Thailand taking on the negative mass. My hypothesis about this property is as follows. As long as the number of agents (countries) is unchanged, the model reflects internal processes. At the stage when a new agent is added (that is, an infected person is detected in another country), the system fixes the last state of the previous stage, and this becomes the initial parameters for the next stage.
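One property that may be relevant here, independently of the hypothesis: the accumulated differences telescope, so the running sum for a country equals its current share minus its share on the first date. A country that starts with the whole ensemble (share 1) can therefore only accumulate a negative sum. The sketch below uses made-up counts for two hypothetical countries A and B, not real data.

```python
from itertools import accumulate

# Made-up cumulative counts: country A starts alone, country B appears on day 2.
counts = {
    "A": [100, 200, 300, 350, 380],
    "B": [0, 0, 50, 200, 600],
}

totals = [sum(day) for day in zip(*counts.values())]
shares = {c: [m / n for m, n in zip(ms, totals)] for c, ms in counts.items()}

# Step-by-step differences of the share, then their running (accumulated) sum.
diffs = {c: [b - a for a, b in zip(s, s[1:])] for c, s in shares.items()}
accumulated = {c: list(accumulate(d)) for c, d in diffs.items()}

for c in counts:
    print(c, [round(x, 3) for x in accumulated[c]])
```

In this toy data the first country ends with a negative accumulated sum equal to its final share minus 1, and the newcomer carries the same amount with the opposite sign.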

Summing up, it can be assumed that this model can be used as a leading indicator of the spread of a pandemic and of similar processes, such as the spread of certain products, especially on the Internet. At an intuitive level, I have also formed the hypothesis that some technical-analysis indicators could be adjusted. I will also look into refining the method for determining volatility in option pricing: there is one unclear point there, namely how the interval of historical values used to determine volatility is chosen.
