It took me about 30 minutes to notice a spectacular correlation between two core metrics related to the virus, allowing me to make better predictions about the evolution of this pandemic in the USA and to offer what may be the best advice on how to reduce your risk of exposure, or at least how to buy some time in the war against this virus.
First, many metrics are useless; you need to pick the most reliable ones. For instance, the number of people who officially tested positive is meaningless: for each person who tested positive, at least 10 were positive at some point but not sick enough to get tested or require medical treatment, and are thus unaccounted for. See my previous article here for details. Based on that metric alone (the death rate among those who tested positive), projected deaths in the US would be well above 3 million. This is because 5.6 million people have officially tested positive and there have been 174,000 deaths so far, which is a 3% death rate (see here). While the number of people who were once positive is grossly underestimated, the number of deaths is not.
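The arithmetic behind the 3% figure, and the effect of the undercounted infections, can be checked with a short sketch. The case and death counts are the ones quoted above; the factor of 10 is the lower bound given in the text, used here as an assumption rather than a measured value.

```python
# Figures quoted in the text.
confirmed_cases = 5_600_000
deaths = 174_000

# Assumption taken from the text: at least 10 actual infections
# for every officially confirmed case.
undercount_factor = 10


def death_rate(deaths, cases):
    """Return the fraction of cases that ended in death."""
    return deaths / cases


# Death rate among people who officially tested positive.
naive_rate = death_rate(deaths, confirmed_cases)

# Death rate if the true number of infections is 10 times larger.
adjusted_rate = death_rate(deaths, confirmed_cases * undercount_factor)

print(f"Death rate among confirmed cases: {naive_rate:.1%}")      # 3.1%
print(f"Death rate with 10x more infections: {adjusted_rate:.2%}")  # 0.31%
```

Under that assumption the rate per infected person is roughly ten times smaller than the rate per confirmed case, which is why the confirmed-case metric overstates projected deaths.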
My projection is below 600,000, and I will explain shortly how I came up with that much lower upper bound. A much more reliable statistic is the death rate per 100,000 inhabitants, per state, ranked from lowest to highest. If you combine that data with the population density, also broken down by state, you will find a remarkably high correlation. The data sources that are used are as follows:
Let us denote the death rate as R, and the population density as D. The correlation between log(R) and log(D) is 0.75. The figure below illustrates how the two variables are related:
Below is the full table, broken down by state:
Of course, even the number of deaths is not a perfect metric. Death attribution may vary from state to state. Demographics and lockdown rules also have a big impact. Yet despite all the noise in the data, a strong pattern emerges: states with lower population density fare better on average, at least for now. So, perhaps even better than wearing a mask or social distancing, moving from a high population density area to a low one may be the safest thing to do. Maybe this remains true even within the same state.
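For readers who want to reproduce the correlation between log(R) and log(D) reported above, here is a minimal sketch using only the standard library. The function names are mine, and the numbers in the usage example are made-up toy values, not the state data from the table.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def log_log_correlation(rates, densities):
    """Correlation between log(R) and log(D); all values must be positive."""
    log_r = [math.log(r) for r in rates]
    log_d = [math.log(d) for d in densities]
    return pearson(log_d, log_r)


# Toy values for illustration only (NOT the article's data):
# rates follow an exact power law of density, so the result is close to 1.
toy_density = [10.0, 50.0, 200.0, 1000.0]
toy_rate = [2.0 * d ** 0.5 for d in toy_density]
print(log_log_correlation(toy_rate, toy_density))
```

Taking logarithms first means the coefficient measures how well a power law of the form R = a × D^b fits the data, rather than a straight line in the raw values.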
Since the highest death rate per 100,000 inhabitants is below 200, and is found in states that are now significantly improving (New York), it is reasonable to assume that, in a worst-case scenario, all states will reach that threshold over time, resulting in about 600,000 deaths in the US.
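The worst-case figure is a simple scaling of a per-100,000 rate to the whole population, sketched below. The US population of about 330 million is my assumption (it is not stated in the article), and the function names are illustrative.

```python
# Assumption (not from the article): approximate 2020 US population.
US_POPULATION = 330_000_000


def projected_deaths(rate_per_100k, population):
    """Total deaths if every state reached the given rate per 100,000."""
    return rate_per_100k * population / 100_000


def implied_rate_per_100k(total_deaths, population):
    """Rate per 100,000 that corresponds to a given national death total."""
    return total_deaths * 100_000 / population


# A ceiling of exactly 200 per 100,000 would give 660,000 deaths.
print(projected_deaths(200, US_POPULATION))

# The 600,000 bound corresponds to a ceiling of roughly 182 per 100,000.
print(round(implied_rate_per_100k(600_000, US_POPULATION), 1))
```

With this population figure, the 600,000 bound matches a ceiling a little above 180 per 100,000, which is consistent with the highest observed rate being below 200.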
Numbers highlighted in red in the column “deaths per 100,000 (logarithm)” are alarming. They represent states that are still far from having felt the full impact (given their density) and could significantly worsen: Ohio, California, Virginia, North Carolina, Tennessee, Wisconsin, West Virginia, Vermont, Maine, Oregon. By contrast, numbers highlighted in red in the column “deaths in the last 7 days” but not in the column “deaths per 100,000 (logarithm)” appear alarming but are in fact somewhat reassuring. They represent states that are getting closer to winning the war: Florida, Georgia, Texas.
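One way to make "given their density" concrete is to fit a straight line of log(R) against log(D) and flag the states that sit well below it. This is only a sketch of that idea: the article does not say how the highlighted cells were chosen, so the least-squares fit, the threshold of 0.5, and the function names are all my assumptions, and the usage example uses made-up values rather than the state data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def below_trend(states, threshold=0.5):
    """Names whose log death rate is more than `threshold` below the fit.

    `states` maps a name to a (death rate per 100,000, density) pair.
    The threshold is an arbitrary illustrative choice.
    """
    names = list(states)
    log_d = [math.log(states[name][1]) for name in names]
    log_r = [math.log(states[name][0]) for name in names]
    slope, intercept = fit_line(log_d, log_r)
    return [
        name
        for name, x, y in zip(names, log_d, log_r)
        if y - (slope * x + intercept) < -threshold
    ]


# Toy values for illustration only (NOT the article's data).
toy_states = {
    "A": (10.0, 10.0),
    "B": (20.0, 40.0),
    "C": (40.0, 160.0),
    "D": (5.0, 80.0),   # low death rate for its density
    "E": (80.0, 640.0),
}
print(below_trend(toy_states))  # ['D']
```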
Credit: Data Science Central By: Vincent Granville