Why is it so damn difficult to build a good model of the spread of COVID-19?



And here we are, in the middle of a pandemic, looking out of our windows like aquarium fish. Everyone is thinking the same thing: how badly will this end? And right behind it comes the second thought: seriously, how much longer do I have to live in such a cramped space?

We all need answers. Given the amount of research and data collected about the new coronavirus, it seems that the answers just have to appear.

And there really are answers. The problem is that they diverge widely. For example, the US Centers for Disease Control and Prevention uses models whose best-case projections have 200,000 Americans dying from the virus. Meanwhile, a report from Imperial College London hit the headlines with its terrifying scenario, in which 2.2 million Americans die if no one changes their daily behavior.

That is, to put it mildly, one hell of a spread - roughly the difference between the number of people who die from injuries and violence every year and the number of people who died when the Chinese Communists suppressed the counter-revolutionary uprising from 1950 to 1953 [the author has apparently confused the Korean War with the Chinese Civil War / translator's note]. In other words, it is the difference between everyday life and events that change it forever.

So where does such a wide gap come from? Such, my dears, is the nature of the beast. Using a mathematical model to predict the future is a useful tool for experts, even when the possible results are far apart. However, it is not always easy to understand the results and how they change over time, and this confusion can be hard on both your mind and your emotions. That is why we need to talk about what goes into a pandemic model. Perhaps understanding the uncertainty will help you make sense of all these numbers.

Imagine a simple mathematical model that predicts the outcome of the coronavirus's spread. It is fairly easy to construct - this is the kind of thing our staff do during conference calls. The number of people who die from the virus is a function of the number of people who can be infected, how fast the virus spreads, and the percentage of infected people that it kills. Or, in other (mathematical) words:





N(deaths) = N(susceptible population) * infection rate * fatality rate

Pretty simple. Until you try to fill in the missing values. Then it turns out that you cannot put a specific figure anywhere. Each value depends on a series of choices and gaps in knowledge. And if every element of the model wobbles, then the whole model will have as much trouble standing steady as a journalist writing about data after one teleconference too many in self-isolation.
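The three-factor product described above can be sketched in a few lines of Python. This is only an illustration: the function name and every input value are placeholders assumed for the example, not estimates from any study or from this article.

```python
def expected_deaths(susceptible, infection_rate, fatality_rate):
    """Deaths = susceptible people * share who get infected * share of infected who die."""
    return susceptible * infection_rate * fatality_rate


# Placeholder inputs chosen only for illustration (assumptions, not real estimates).
population = 1_000_000

optimistic = expected_deaths(population, infection_rate=0.2, fatality_rate=0.005)
pessimistic = expected_deaths(population, infection_rate=0.6, fatality_rate=0.02)

print(f"optimistic:  {optimistic:,.0f} deaths")
print(f"pessimistic: {pessimistic:,.0f} deaths")
print(f"spread:      {pessimistic / optimistic:.0f}x")
```

Tripling one input and quadrupling another is enough to produce a twelvefold spread in the result, which is why a modest disagreement about each factor turns into a large disagreement about the total.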

Consider something as basic as data entry. Different countries and regions collect data in different ways. There is no single spreadsheet that everyone fills in and that would let us easily compare the number of cases and deaths around the world. Even in the U.S., doctors say the number of deaths from COVID-19 is being undercounted.

The same inconsistencies apply to virus tests. Some countries test everyone who wants a test. Others do not. This affects how much we know about how many people actually have COVID-19 compared with how many have been diagnosed with it.

In addition, the virus itself acts unpredictably, harming some groups more than others - local demographics and the state of public health will largely determine how the virus affects a community.

“We in health care often work with a lack of information, trying to make the best estimates we can from very incomplete data,” said Bill Miller, professor of epidemiology at Ohio State University.

Mortality rate




Some people die from COVID-19. And that will probably be our last unqualified statement. However, “some” is not a number, and you cannot build mathematics on it.

The problem is that any calculation of the percentage of people who die from the virus is imprecise from the very start. It can differ greatly between groups. "Age is a very important factor, so we have to recalculate deaths taking into account the demographic makeup of the United States and the prevalence of chronic diseases," said Ray Wannier, a biostatistician at the University of California, San Francisco. Chronic diseases can worsen the effects of COVID-19.

In other words, there is no single mortality rate - there are many of them. The US mortality rate will differ from the rate in a country with, say, fewer diabetes patients. The same applies to rates within the United States - if the virus spreads in a city whose suburbs are home to many elderly people, the mortality rate calculated there will be higher than if the outbreak were centered in a city with a young population.

But let's turn to international statistics. Will the mortality rate from COVID-19 in China or Italy let us estimate the mortality rate in the USA? This information will certainly be useful - but it will only reduce the uncertainty, not eliminate it.

Of course, we still do not know the exact mortality rates in those regions either. There are many reasons for that, starting with how the basic case data is assembled. Numbers are not facts. They are the result of many subjective decisions, which need to be documented in detail and transparently before the numbers can be treated as facts. What matters is how the data is collected and whether the collection process changes over time.

There is also the problem of uncollected or inaccurate data. To determine the mortality rate, you divide the number of people who died from the disease by the number of people who had it. But we do not have an exact count of the sick - mathematically speaking, we do not know the denominator. And frankly, we do not know the first number, the numerator, exactly either - although we assume that it is close to reality.
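The denominator problem is easy to see in a small calculation. The sketch below uses made-up counts and an assumed detection share (both hypothetical, introduced only for this example) to show how the same number of deaths yields very different rates depending on how many infections are counted.

```python
def fatality_rate(deaths, cases):
    """Share of cases that end in death: numerator / denominator."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return deaths / cases


# Hypothetical counts, for illustration only.
deaths = 50
confirmed_cases = 1_000

# Assumption: only one in four infections is ever confirmed by a test.
detection_share = 0.25
estimated_infections = confirmed_cases / detection_share

naive_rate = fatality_rate(deaths, confirmed_cases)
adjusted_rate = fatality_rate(deaths, estimated_infections)

print(f"rate among confirmed cases: {naive_rate:.2%}")
print(f"rate among all infections:  {adjusted_rate:.2%}")
```

The numerator is identical in both calls; only the guess about undetected infections changes, and the rate moves by a factor of four.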



In an ideal world, we would test everyone for infection with the new coronavirus, so as to know exactly how many people have it and how many have died from it. However, we have come close to that situation in only a couple of cases. Take, for example, the Diamond Princess, one of the cruise ships quarantined after an outbreak of COVID-19. Almost all passengers were tested (3,063 tests for 3,711 people). The Diamond Princess became a living laboratory, with data collection conditions that rarely arise in the real world. Researchers were able to find out not only how many people were sick, but also how many had no symptoms - and therefore how many would not have been tested, diagnosed, or counted had they been on land.

The results of this unusual experiment point to a large number of people who carry the virus without knowing it - and, therefore, to a mortality rate that is actually lower than the reported data suggest. Among the people on the Diamond Princess, the mortality rate for those who were diagnosed and had symptoms was 2.3%, but if you count everyone who was diagnosed, including those without symptoms, the rate is 1.2%. In Iceland, on March 13, deCODE Genetics began offering free testing to everyone, even people without symptoms. As of March 29, deCODE had detected 71 infected people in 8,694 tests, including people without symptoms.

Meanwhile, the symptom ratio - the number of people with symptoms relative to the number of people without them - also matters a great deal, yet we can only guess at it. A report from Imperial College London assumes that two-thirds of cases are symptomatic enough for the infected person to notice and self-isolate. The Diamond Princess data showed that half of the people had symptoms at the time of diagnosis. The actual symptom ratio affects the calculation of the mortality rate.
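The two Diamond Princess rates quoted above (2.3% and 1.2%) can be cross-checked against the observation that about half of the diagnosed people had symptoms. The sketch below assumes that both rates share the same numerator, that is, the same deaths; that assumption is ours and is not stated in the source.

```python
# Figures quoted above for the Diamond Princess.
rate_symptomatic = 0.023    # deaths / diagnosed cases with symptoms
rate_all_diagnosed = 0.012  # deaths / all diagnosed cases

# Assumption: both rates are computed from the same deaths, so dividing one
# by the other cancels the numerator and leaves the ratio of the denominators.
symptomatic_share = rate_all_diagnosed / rate_symptomatic

print(f"implied share of diagnosed cases with symptoms: {symptomatic_share:.0%}")
```

Under that assumption the implied share comes out at roughly one half, in line with the figure reported for the ship.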

However, the Diamond Princess data is also imperfect - not everyone was tested, the demographic makeup of cruise ship passengers is not representative of the wider population, and some of the patients may still die, which would raise the mortality rate. Even so, no more realistic data can be found on land. The data from Iceland has not been published with the same methodological detail. In the United States, large-scale testing is only just beginning. If only sick people are tested, as is the case in most states, the mortality rate will not reflect the actual behavior of the virus - the denominator problem rears its head again. On top of that, testing in the US faces further challenges - a shortage of tests and the fact that some private laboratories do not publish the number of negative results.

The true mortality rate also depends on our ability to keep a sick person from dying. And that depends on the capacity of hospitals. With unlimited access to intensive care beds and mechanical ventilation, many people with serious symptoms could survive the infection. But resources in the USA are limited, and if demand exceeds supply - as is already happening in some parts of the country - then people who would have survived with access to a ventilator will die. This can set off a domino effect. People who need emergency care unrelated to the virus will also suffer from the lack of hospital resources, and their deaths will add to overall mortality even though they could have been prevented and even though they should not be counted in the COVID-19 statistics.

“Mortality will depend greatly on whether we run into a shortage of supplies and personnel, and it is not yet clear how flexible our healthcare system will be,” said Wannier.

And then there is the infection rate




Almost everything we said about the mortality rate also applies to the infection rate: all estimates depend on data collection, sampling, and the symptom ratio. But to work out the infection rate, you also need to understand how often the virus passes from one person to another. You may have heard the term basic reproductive number (abbreviated R0) - the average number of secondary infections caused by one infected individual in a population in which everyone is susceptible to the disease.

Here's the thing: the transmission of the virus will certainly vary enormously, depending on social behavior, the local environment and political decisions. All of this will differ from country to country, and even between US states. These parameters will also change over time depending on the measures we take to fight the virus. For malaria, for example, R0 is higher in places with a lot of stagnant water.

Because of this, modeling the possible outcomes of the spread of COVID-19 has to include many different virus transmission scenarios. And they will not be exact; each will be a range of estimates. These scenarios rest on several estimates, each of which can in turn change as well (seriously, it is just an infinite regress).

The first variable is the contact rate - essentially, how many people an infected person interacts with over a given period of time. This is the only parameter that people can control, which is why everyone is locked down and talking about social distancing. The contact rate is not uniform - it varies from person to person depending on factors such as living and working conditions, and it also varies with how the health care system responds and with where everything is happening. “Imagine the difference between the highlands of a rural state and the business district of a big city,” Miller said.

Then comes the transmission rate. This is a way of expressing how many people become infected after contact with an infected person. It, too, is a moving target. Viruses do not spread in a uniform pattern such as “two new cases per person”. The process moves in irregular bursts, like a crowd of suburbanites descending on shelves of toilet paper. Sam Scarpino, a professor at Northeastern University who models infectious diseases, calls these “super-spreading events”: situations in which some factor, usually tied more to the setting than to the people, suddenly drives up the number of cases. Recall the Biogen conference, which at one point was responsible for 77 of the 95 cases diagnosed in Massachusetts. Or the woman who single-handedly broke an effective containment strategy in South Korea.

Remember the symptom ratio? Some suggest that carriers with symptoms infect fewer people than those without symptoms, so this ratio also affects the transmission rate.

Virology also matters when estimating how often contact leads to transmission. Here you need to consider how long the virus can survive on surfaces (and on which surfaces), and how far it can travel through the air. For the new coronavirus, estimates differ for both factors. Then there are differences in people's bodies and behavior. For example, smokers may be at greater risk of infection and complications. And although this is largely due to the effect of smoking on the lungs and what the virus does inside the body, it also has to do with the fact that smokers often bring their hands to their mouths, increasing the risk of transmission.

Finally, there is the duration of infectiousness - how long can a person spread the virus, and during which stage of the disease are they contagious? That depends on the biology of the virus and on individual immune systems, said Mark Weir, director of the Ohio State University Environmental, Epidemiology and Health Program.

All of these parameters are used to estimate R0, the basic reproductive number.
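One common textbook way to combine the three parameters discussed above is a simple product: contacts per day, times the chance of transmission per contact, times the number of days a person is infectious. The article itself gives no formula, so the decomposition, the function name and all input values below are assumptions made for illustration.

```python
def basic_reproduction_number(contacts_per_day, transmission_probability, infectious_days):
    """R0 as contacts per day * chance of transmission per contact * days infectious."""
    return contacts_per_day * transmission_probability * infectious_days


# Placeholder values for illustration only; none of them comes from the article.
dense_city = basic_reproduction_number(
    contacts_per_day=12, transmission_probability=0.03, infectious_days=7
)
rural_area = basic_reproduction_number(
    contacts_per_day=4, transmission_probability=0.03, infectious_days=7
)

print(f"dense city: R0 = {dense_city:.2f}")
print(f"rural area: R0 = {rural_area:.2f}")
```

With the same virus and the same duration of infectiousness, changing only the contact rate moves the result from above 1 to below 1, which is the kind of difference between settings that Miller describes.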

And while the basic reproductive number assumes that the entire population is vulnerable, there is also the effective reproductive number, which depends on what percentage of the population is vulnerable to the virus. One of the reasons the population is so vulnerable to the new coronavirus is precisely that the virus is new. Nobody has had it before.

A good model also has to consider re-infection: if people who caught the virus and recovered acquire immunity to it, then the vulnerable share of the population shrinks. But so far, we do not know much about immunity after infection.
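The link between the basic and the effective reproductive number can be sketched as a scaling by the vulnerable share of the population. This is a simplified illustration: it assumes that people mix evenly and that immunity after recovery is complete and lasting, neither of which is established in the text, and the R0 value is a placeholder.

```python
def effective_reproduction_number(r0, susceptible_fraction):
    """Scale R0 by the share of the population that can still be infected."""
    if not 0.0 <= susceptible_fraction <= 1.0:
        raise ValueError("susceptible_fraction must be between 0 and 1")
    return r0 * susceptible_fraction


r0 = 2.5  # placeholder value, assumed only for this example

for immune_share in (0.0, 0.2, 0.5):
    r_eff = effective_reproduction_number(r0, 1.0 - immune_share)
    print(f"immune share {immune_share:.0%}: effective R = {r_eff:.2f}")
```

When nobody is immune, the effective number equals R0, which is the situation with a virus that nobody has had before.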

And we have not even mentioned how vulnerability would change once a vaccine is developed. But we already have enough details.

Mix it all into a model


To create a model, you need to gather all these variables (and others that our editor did not let us go into), and take into account their uncertainty, the correlations between them, and many other things. It can turn out to be a rather complicated thing.

And all of these factors can be affected by attempts to slow the spread of the virus - social distancing, hand washing, school closures, reducing the number of non-urgent surgeries, and so on. This is a big unknown, capable of radically changing the shape of the outbreak - and it, too, varies by country, state and even city.
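How uncertainty in each input compounds in the output can be sketched with a small Monte Carlo simulation. The example below reuses the simple three-factor product from earlier in the article and draws each uncertain input independently from a range. The ranges, the uniform distributions and the independence of the inputs are all assumptions made for illustration; a real model would also handle the correlations mentioned above.

```python
import random
import statistics


def simulate_deaths(population, n_draws=10_000, seed=0):
    """Draw each uncertain input from a range and collect the resulting death counts."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    results = []
    for _ in range(n_draws):
        # Placeholder ranges, assumed only for illustration.
        infection_rate = rng.uniform(0.2, 0.6)
        fatality_rate = rng.uniform(0.005, 0.02)
        results.append(population * infection_rate * fatality_rate)
    return results


draws = simulate_deaths(population=1_000_000)

# statistics.quantiles with n=20 returns 19 cut points; the first and last
# are the 5th and 95th percentiles of the simulated outcomes.
cuts = statistics.quantiles(draws, n=20)
low, median, high = cuts[0], statistics.median(draws), cuts[-1]

print(f"5th percentile:  {low:,.0f}")
print(f"median:          {median:,.0f}")
print(f"95th percentile: {high:,.0f}")
```

Reporting a range of outcomes rather than a single number is the design choice here: the output is only as narrow as the assumptions fed into it.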



It's like baking a pie. With a normal recipe, it is fairly simple, and you can expect a sensible and predictable result. But if the recipe contains instructions like “add from three to 15 apples, or steaks, or Brussels sprouts, depending on what you have on hand” ... that will definitely affect the taste of the pie, right? You can make assumptions about the right ingredients and their quantities. But those are only assumptions, not hard facts. And if you make too many assumptions while cooking, you may not end up with what you set out to make. And you will not necessarily know that you got it wrong.

Over the coming months, you will come across many different predictions regarding the outcome of the COVID-19 pandemic. Not all of them will be the same. But just because they are based on assumptions does not mean that they are useless.

“All models are wrong, we just strive to make them less wrong and useful for the moment,” Weir said.

We want to eat, so someone will have to do the cooking. Just be sure to ask which ingredients went into the pie, and how much of each.
