COVID-19: predicting the number of patients with coronavirus

The coronavirus has finally captured the whole world, though not in the sense that every inhabitant of the planet has caught it. Right now it is the main, if not the only, topic in both world and Russian news. In this article we will try to stay as far away as possible from politics and from arguments about whether the virus was launched by the Chinese military or by Donald Trump. Instead, we will look at the problem from a mathematical point of view: we will see how an epidemic can be described with a single equation, and at the end of the article we will forecast the total number of people infected with COVID-19, including in Russia.





A little less than a century ago, in 1927, two scientists, Kermack and McKendrick, published an article introducing the idea that the spread of an epidemic can be described mathematically. In the simplest case, a virus spreads through a population of N people in which each person is either healthy (S) or infected with no possibility of recovery (I), and the share of the population that is infected at time t is:

$$i(t) = \frac{i_0}{i_0 + (1 - i_0)\,e^{-\beta t}},$$


where $i_0$ is the initial share of the population that is infected, and $\beta$ governs how fast the epidemic spreads. This parameter lets us adjust the probability of person-to-person transmission and so bring closer or push back the moment when the entire population becomes infected (because $i(t) \to 1$ as $t \to \infty$).
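As a quick illustration, the curve can be evaluated directly. This is only a minimal sketch: the function name `infected_share` and the values chosen for `i0` and `beta` are assumptions made for the example, not figures from the article.

```python
import math


def infected_share(t, i0, beta):
    """Share of the population infected at time t in the simplest (S, I) model."""
    return i0 / (i0 + (1.0 - i0) * math.exp(-beta * t))


# Illustrative values only: 0.1% initially infected, beta = 0.3 per day.
i0, beta = 0.001, 0.3
for day in (0, 10, 20, 30, 40, 60):
    print(f"day {day:2d}: {infected_share(day, i0, beta):.4f}")
```

Raising `beta` moves the steep part of the curve to the left, and lowering it pushes the saturation further into the future.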

Connoisseurs may think the two simply borrowed the logistic curve from the Belgian mathematician Verhulst. In this case, however, the equation is nothing more than the solution of a system of differential equations (I will not wade into the mathematical jungle, but for anyone interested, this explains the whole theory of epidemics, and here is a great visualization, as always, from Grant Sanderson).
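For readers who do want the short version, here is a sketch of the standard derivation. It assumes the usual formulation of the two-group model in shares $s = S/N$ and $i = I/N$ with $s + i = 1$, so the system collapses to a single equation:

$$\frac{di}{dt} = \beta\, s\, i = \beta\, i\,(1 - i), \qquad i(0) = i_0 .$$

Separating the variables and integrating gives

$$\ln\frac{i}{1 - i} = \beta t + \ln\frac{i_0}{1 - i_0},$$

and solving for $i$ yields

$$i(t) = \frac{i_0\, e^{\beta t}}{1 - i_0 + i_0\, e^{\beta t}} = \frac{i_0}{i_0 + (1 - i_0)\, e^{-\beta t}} .$$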

The graph of the function looks like an elongated Latin letter S (which is probably why it is sometimes called the S-curve):



Before we move on to the modeling itself, a small spoiler: this model does show up in reality, and in the case of COVID-19 specifically. Look at the chart of detected cases in Mainland China, taken from the Johns Hopkins University project:



Deriving our equation for COVID-19


The model presented above rests on a large number of assumptions, such as a constant population and the possibility of contact between any two people. Moreover, even after an additional group (R) is introduced for the recovered or dead (those who can no longer be infected), the model remains very strongly tied to the initial number of infected.

In reality, it makes no sense to build a model on a single initial condition: every country (and even every town) has its own pattern of spread, and the total number of infected can differ greatly between two regions that both start with $i_0 = 1$. A single coefficient $\beta$ alongside $i_0$ also turns out to be insufficient, because too many factors influence the spread of the virus and bend or break the logistic curve in their own way.

We therefore propose to modify the equation for the number of confirmed cases by giving it as many parameters as possible:

$$i(t) = \frac{a}{b + c \cdot e^{d - \beta t}},$$


Here we also add a constant term $d$ to the exponent. With 5 parameters instead of 2, we have much more room to fine-tune the curve.
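The five-parameter curve is easy to write down as a function. The sketch below is illustrative: the name `confirmed` and all parameter values are hypothetical and were picked only to show the shape.

```python
import math


def confirmed(t, a, b, c, d, beta):
    """Five-parameter logistic curve: a / (b + c * exp(d - beta * t))."""
    return a / (b + c * math.exp(d - beta * t))


# Hypothetical parameter values, chosen only to show the shape of the curve.
params = dict(a=6.0, b=1.0, c=1.0, d=5.0, beta=0.2)
for day in (0, 10, 25, 40, 80):
    print(f"day {day:2d}: {confirmed(day, **params):.3f}")

# For beta > 0 the exponential term vanishes as t grows, so the curve
# levels off at a / b (here 6.0).
```

One observation from this sketch, not a claim made in the article: dividing the numerator and denominator by `b` shows that the curve depends only on `a / b`, `(c / b) * exp(d)` and `beta`, so different parameter sets can produce the same curve.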

Selection of parameters


Because we have data on detected cases for the first $n$ days of the pandemic rather than only for its very beginning, i.e. a series of confirmed-case counts $(y_0, y_1, \ldots, y_n)$, we can reduce the search for the optimal parameters $(a, b, c, d, \beta)$ to minimizing the sum of squared deviations:

$$\sum_{t=1}^{n} \bigl(y_t - i(t)\bigr)^2 \to \min$$


To keep the task as simple as possible, we defined a finite set of candidate values for each parameter of the function $i(t)$ and, in effect, tuned them like hyperparameters: a grid search over the loss function above.
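A grid search of this kind can be sketched in a few lines. This is not the code behind the article's forecasts: the function names, the candidate grids and the synthetic observations are all assumptions made for illustration, since the article does not list the values it actually tried.

```python
import itertools
import math


def confirmed(t, a, b, c, d, beta):
    """Five-parameter logistic curve: a / (b + c * exp(d - beta * t))."""
    return a / (b + c * math.exp(d - beta * t))


def sse(params, ys):
    """Sum of squared deviations over t = 1..n, where ys[t] is y_t."""
    return sum((ys[t] - confirmed(t, *params)) ** 2 for t in range(1, len(ys)))


def grid_search(ys, grid):
    """Try every combination of candidate values and keep the best one."""
    names = ("a", "b", "c", "d", "beta")
    candidates = itertools.product(*(grid[name] for name in names))
    return min(candidates, key=lambda params: sse(params, ys))


# Synthetic observations generated from known parameters (not real data).
true_params = (4.0, 1.0, 1.0, 3.0, 0.25)
ys = [confirmed(t, *true_params) for t in range(40)]

# Hypothetical candidate sets for each parameter.
grid = {
    "a": [2.0, 4.0, 6.0],
    "b": [1.0],
    "c": [1.0],
    "d": [2.0, 3.0, 4.0],
    "beta": [0.15, 0.25, 0.35],
}
best = grid_search(ys, grid)
print(best, sse(best, ys))
```

The cost grows as the product of the grid sizes, which is why each parameter gets only a handful of candidates here.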

Country forecasts


I want to stress that the forecast is for the number of confirmed cases, not for deaths or recoveries: once a person is infected, the disease runs a very individual course, and any prognosis based on aggregate statistics would be extremely inaccurate.

The data for the forecast are taken from the Johns Hopkins University GitHub project. The predicted number of confirmed cases is expressed as a share of the country's population multiplied by 10,000 (the multiplication keeps the numbers from becoming very small; otherwise the algorithm would simply predict zeros). The $x$-axis shows the number of days since the first registered case of infection.
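The preprocessing described here, scaling to cases per 10,000 inhabitants and counting days from the first registered case, might look roughly like this. The helper names and the numbers are made up for illustration and do not come from the Johns Hopkins data.

```python
def per_10k(confirmed_counts, population):
    """Convert cumulative confirmed counts to cases per 10,000 inhabitants."""
    return [10_000 * count / population for count in confirmed_counts]


def since_first_case(confirmed_counts):
    """Drop leading days with zero cases so that index 0 is the first case."""
    for start, count in enumerate(confirmed_counts):
        if count > 0:
            return confirmed_counts[start:]
    return []


# Made-up daily cumulative counts and population, for illustration only.
counts = [0, 0, 2, 5, 11, 20]
population = 1_000_000
series = per_10k(since_first_case(counts), population)
print(series)
```

After this step the list index plays the role of the $x$-axis: position 0 is the day of the first registered case.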

The graphs show the confirmed cases to date (Real Current Confirmed), the model's values for the same dates (Predicted Current Confirmed), and the predicted values for the next 30 days (Predicted Future Confirmed).

China


The virus was first detected in Wuhan, the capital of Hubei Province. As it recently emerged, the first case was registered not at the end of December 2019, as previously thought, but as early as November 17. That does not change the essence: thanks to the Chinese government's decisive quarantine measures, the local outbreak was brought to a halt by the end of February. One caveat up front: the data for the model are available only from January 22, by which time 444 cases had already been registered.


Data on the first infected: 01/22/2020

Italy


Paolo Sorrentino's homeland has become Europe's breeding ground for the virus, and that is due not only to Italy's popularity among Chinese tourists (true), but also to Italians' special love of washing their hands (joke).


Data on the first infected: 01/31/2020

Germany


Chancellor Angela Merkel caught the world's attention with her statement that 70 percent of the country's population will eventually get the coronavirus.


Data on the first infected: 01/27/2020

According to the forecast, however, a little more than 0.05% will be affected.

Spain


The hot machos decided to keep up with their "climate colleagues" (the Italians, of course), and so far there is no reason to expect the spread of the virus to die down soon.
However, the Spaniards are not discouraged and come up with perhaps the most entertaining fakes about the coronavirus. Recently a story slipped into the news that a brothel in Valencia had been quarantined with 119 people inside, 86 of them clients, because the coronavirus was detected in one of the representatives of the oldest profession: apparently she coughed, and one of the clients turned out to be a doctor.


Data on the first infected: 02/01/2020

Russia


In our country it is still unclear whether all cases have been recorded. How else can one explain the sharp increase in cases of pneumonia, which cannot be distinguished from coronavirus without additional testing?

But what is much more interesting is how the virus spreads across the country. A special page created by the operations center of Moscow maintains a list of flights on which infected people arrived. In other words, the virus mostly entered the country with our compatriots who had been on vacation or working abroad. If we compare the average monthly salary in the country with the cost of an air ticket to Europe, it turns out that it was not the poorest people who brought the virus with them. Here it is time to turn to graph theory, namely to the concept of assortativity: the presence of preferences in connections (communication) within a social network (society). In other words, the rich mostly communicate with the rich, and the poor with the poor. All in all, for Russia the coronavirus turns out to be a disease of the well-off. So if you, dear reader, are in the underpass near the Kazan station right now, crushing a passing rat with your boot, you may well belong to the lowest-risk group in our country.

But do not rush to rejoice, because the theory of assortative preferences has one nuance. Imagine a school with two groups of girls, the beautiful and the ugly, each keeping to its own circle. Yet we all remember that the most beautiful girl always has an ugly friend, and through that one tie the two groups become connected.

By exactly the same principle, a wealthy businesswoman who has returned from Italy with the virus may have a retired mother whom she visits from time to time, and the mother in turn chats with other pensioners in the yard. This is how the virus flows between strata of the population.
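The effect of a single bridging tie can be shown on a toy network. The sketch below is purely illustrative: the two groups, the people in them and the helper `reachable` are invented for the example, and reachability stands in for "the virus can eventually get there".

```python
from collections import deque
from itertools import combinations


def reachable(edges, start):
    """Return every node that can be reached from start along the edges."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# Two tightly knit groups (hypothetical people), each fully connected inside.
group_a = ["a1", "a2", "a3", "a4"]
group_b = ["b1", "b2", "b3", "b4"]
edges = list(combinations(group_a, 2)) + list(combinations(group_b, 2))

print(sorted(reachable(edges, "a1")))                   # only group A
print(sorted(reachable(edges + [("a4", "b1")], "a1")))  # both groups
```

With perfectly assortative ties the infection stays inside the first group, while one cross-group edge is enough to expose everyone.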


Data on the first infected: 01/31/2020

According to the model's forecast, Russia is still far from the inflection point, i.e. the moment after which each day's increase in cases becomes smaller than the previous day's.
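For the five-parameter curve $a / (b + c \cdot e^{d - \beta t})$ the inflection point has a closed form, $t^* = (d + \ln(c/b)) / \beta$, at which the curve reaches half of its plateau, $a / (2b)$. The sketch below checks this numerically. It assumes $a$, $b$, $c$ and $\beta$ are positive, and the parameter values are hypothetical rather than fitted to any country.

```python
import math


def confirmed(t, a, b, c, d, beta):
    """Five-parameter logistic curve: a / (b + c * exp(d - beta * t))."""
    return a / (b + c * math.exp(d - beta * t))


def inflection_day(b, c, d, beta):
    """Time at which growth is fastest; assumes b, c and beta are positive."""
    return (d + math.log(c / b)) / beta


# Hypothetical parameters, not fitted to any country.
a, b, c, d, beta = 6.0, 1.0, 2.0, 5.0, 0.2
t_star = inflection_day(b, c, d, beta)

# Cross-check numerically: find the day with the largest one-day increase.
daily = [confirmed(t + 1, a, b, c, d, beta) - confirmed(t, a, b, c, d, beta)
         for t in range(100)]
peak_day = max(range(100), key=daily.__getitem__)
print(round(t_star, 2), peak_day)
```

A country is "before the inflection point" as long as the current day is smaller than `t_star` for its fitted parameters.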

Summary


Of course, the presented model is very basic:

  • it does not account for the change in temperature from winter to spring, after which the spread of the virus should slow down
  • it ignores the closure of state borders and the tightening of quarantine measures within the country, and the resulting drop in the intensity of contact between people
  • : , ; grid search ; , EM- ..

However, if this topic interests you, we advise you to join the red-hot COVID-19 Open Research Dataset Challenge (CORD-19) and tackle its problems, from identifying risk factors to creating a vaccine!

Also, starting today we are launching our Telegram bot (@CoronavirusMonitorBot), where we track the latest information about the coronavirus situation. We recommend subscribing to keep up with how the situation develops.

The main thing I want to say is that there is no need to panic. In situations like this, observing basic hygiene rules and avoiding crowded places will help prevent the virus from spreading explosively. For the rest, rely on math :)
