Incidence of COVID-19 in Texas after adjusting for test positivity
At a glance:
Even when you adjust for test-positivity rates, the number of new COVID-19 cases per day in Texas is going up, although not as rapidly as the unadjusted numbers imply.
17 May 2020
Amidst controversy in several (perhaps many?) countries about the timing and pace of lifting COVID-19 control measures, one small corner of today’s argument concerned why Texas is seeing record numbers of new COVID-19 cases in the days after a range of reopening measures. In a thread on Twitter, @SeanTrende argued that the worrying trend is due to the big increase in the number of COVID-19 tests, and that the Texas authorities should not be punished when running more tests turns up more cases.
This is an ideal use case for the test-positivity adjustment I proposed in last week’s post.
Here’s a chart of the trends in COVID-19 cases in Texas, with and without adjustment by a multiplier equal to the square root of the test positivity rate. The vertical scale has been removed because we don’t have a way of translating the red adjusted line into actual numbers of cases. For this chart, I’ve converted both lines to indexes that meet at the end of the period by design. A good estimate of absolute case numbers, which would let me put the scale back on the vertical axis, would certainly involve scaling the red line upwards by some additional, unknown multiplier. So let’s just focus on trends.
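As a rough illustration of the two steps just described, the sketch below multiplies daily cases by the square root of positivity and then rescales both series so they equal 100 on the last day. It is written in Python, not the R used for the post. The function names and the three days of figures are hypothetical and are not Texas data.

```python
import math

def adjust_cases(cases, positivity):
    """Scale each day's confirmed cases by the square root of that day's test positivity."""
    return [c * math.sqrt(p) for c, p in zip(cases, positivity)]

def index_to_last(series, base=100.0):
    """Rescale a series so its final value equals `base`; only the trend survives."""
    last = series[-1]
    return [base * x / last for x in series]

# Hypothetical daily figures: cases double while positivity falls from 9% to 4%
cases = [1000, 1500, 2000]
positivity = [0.09, 0.0625, 0.04]

unadjusted_index = index_to_last(cases)
adjusted_index = index_to_last(adjust_cases(cases, positivity))
```

Both indexes are pinned to 100 on the final day, so only their shapes can be compared. Here the unadjusted index runs 50, 75, 100 and the adjusted one runs 75, 93.75, 100. Both are rising, but the adjusted series rises less steeply.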
I’ve used a smoothed version of the test positivity rate, modelled with a generalized additive model to handle data problems relating to test numbers, and seven-day moving averages of both series to deal with the weekly ‘seasonality’ of the data. Code is at the bottom of the post.
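The following Python sketch shows why a seven-day moving average handles weekly ‘seasonality’. Each seven-day window contains every weekday exactly once, so a repeating weekly pattern that sums to zero cancels out. Two assumptions apply here: the window is trailing rather than centred, and the input is a made-up series. The GAM smoothing of positivity is not reproduced.

```python
def moving_average(values, window=7):
    """Trailing moving average; the first window-1 days have no full window and are dropped."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]

# Hypothetical series: a flat level of 100 plus a weekly pattern that sums to zero
weekly_pattern = [-20, -10, 0, 5, 10, 10, 5]
daily = [100 + weekly_pattern[d % 7] for d in range(21)]

# Every window covers each weekday once, so the weekly pattern cancels
smoothed = moving_average(daily)
```

The raw series swings between 80 and 110, but every smoothed value comes out at 100. Real weekly effects are not this tidy, yet the same cancellation is what removes most of the day-of-week reporting pattern.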
We can see that @SeanTrende is at least partly justified. If you adjust the confirmed cases per day this way, the latest values, while worrying, are not ‘records’ exceeding the high point in mid-April.
But they are still going up, which means that COVID-19 cases do seem to be increasing in Texas even when we take into account the higher number of tests being undertaken.
To get that red line to level out, you need the most aggressive adjustment possible: multiplying the number of cases by the test positivity rate itself rather than by its square root. This would be equivalent to treating the people being tested as a random sample representative of the overall Texas population (not self-selecting for sicker people at all), which is not plausible.
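One way to see the difference between the square-root and full adjustments is to treat both as cases × positivity^k. Here k = 0 is unadjusted, k = 0.5 is the square-root version and k = 1 is the maximalist version. The exponent k, the helper names and the figures below are hypothetical and exist only for illustration, in Python rather than the post’s R. In this constructed example, tests grow fourfold while confirmed cases double.

```python
def adjust(cases, positivity, k):
    """Multiply cases by positivity**k: k=0 unadjusted, 0.5 square root, 1 maximalist."""
    return [c * p ** k for c, p in zip(cases, positivity)]

def growth(series):
    """Ratio of the last value to the first."""
    return series[-1] / series[0]

# Hypothetical: tests grow fourfold while confirmed cases double
tests = [10000, 40000]
cases = [1000, 2000]
positivity = [c / t for c, t in zip(cases, tests)]  # 0.1, then 0.05

growth_by_k = {k: growth(adjust(cases, positivity, k)) for k in (0, 0.5, 1)}
```

The unadjusted series doubles and the square-root version still grows by a factor of about 1.41. Only the k = 1 version is flat here, because cases × positivity equals cases²/tests, and doubling cases while quadrupling tests leaves that unchanged.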
Here’s a similar chart for the 12 US states with the most COVID-19 cases:
There are some interesting patterns there. I’m fairly satisfied that the adjusted values give a more accurate picture of incidence trends in these states than the original case numbers.
The R code for doing this is below. Comments welcome.
Here’s the Twitter thread that prompted me to write this post:
My day job is Chief Data Scientist at Nous Group, an international management consultancy with over 400 people working across Australia, the UK and Canada. Contact me if you are interested in working with us on a grand challenge or broad agenda.
I'm pleased to be aggregated at R-bloggers, the one-stop shop for blog posts featuring R.