Updated: Apr 9, 2020
If we were to pick the most famous graph of 2020, our money would be on the “Flattening the Curve” graph we’ve seen in every publication for the past four weeks. Who would have thought a simple bell curve would do such a compelling job of explaining why we’ve all had to learn how to videoconference?
It feels like we’ve all been tasked with becoming epidemiologists, or at least data scientists, on the fly. As seasoned data scientists (and an M.D.), even we find ourselves overwhelmed by the amount of data, metrics, indicators, and models we have to keep up with just to read the news. But hidden in all of these analyses is the fact that we don’t actually have enough data on Covid-19. After all, it’s a completely new disease. We don’t know basic statistics about Covid-19, such as how many people currently have it, how accurate and widely available testing is, and how many people in the U.S. are likely to be infected.
What do data scientists do when they don’t have enough data? They simulate it.
A few models have risen to the top of public attention. The first is the Imperial College Covid-19 Response Team Model, which was widely cited as the reason the U.S. and U.K. began their nationwide social distancing measures. Only a few days later, however, it became apparent that the model’s assumptions (the basic data it used to generate its predictions) were too dire. As Bill Gates said in critiquing the model, “models are only as good as the assumptions put into them.”
The New York Times picked up a recently published Columbia University model that breaks down the infection rate by state. The Centers for Disease Control and Prevention (CDC) came out with a prediction that we could expect 100,000 to 200,000 deaths in the U.S. from Covid-19. The CDC’s model is not easily accessible, though its estimates match those of other public health models that have been released.
Which models and numbers should we trust?
The Covid-19 models will get more accurate as we collect more data. The models agile enough to update their assumptions based not only on U.S. data but also on data from around the world will be the ones that stay the most relevant. Every country in the world is collecting data that will be used to better understand the intricacies of both the spread and the treatment of Covid-19. We are in the midst of a massive data collection effort that will inform important decisions, from how long to maintain social distancing to how to better treat the disease.
By: The Sorenson Impact Center Data Team