by Mitch Kokai
Senior Political Analyst, John Locke Foundation
There’s been a lot of armchair analysis about various models being used to predict outcomes of COVID-19. For those of us who have built spatial and statistical models, all of this discussion brings to mind George Box’s dictum, “All models are wrong, but some are useful”—or useless, as the case may be.
The problem with data-driven models, especially when data is lacking, can be easily explained. First of all, in terms of helping decision makers make quality decisions, statistical hypothesis testing and data analysis is just one tool in a large toolbox.
It’s based on what we generally call reductionist theory. In short, the tool examines parts of a system (usually by estimating an average or mean) and then makes inferences to the whole system. The tool is usually quite good at testing hypotheses under carefully controlled experimental conditions. …
… For the COVID-19 models, most of the data appears to come from large population centers like New York. This means the data sample is biased, which makes the entire analysis invalid for making inferences anywhere other than New York or, at best, areas with similar population density.
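The sampling-bias point can be made concrete with a little arithmetic. The sketch below is only an illustration: the region names, populations, and rates are made-up numbers, not COVID-19 data and not taken from any real model. It shows how a rate estimated only in a dense city can badly overstate the outcome when it is applied to a rural area.

```python
# Hypothetical, made-up numbers chosen only to illustrate sampling bias.
# They are NOT COVID-19 data and do not come from any real model.
regions = {
    "dense_city": {"population": 8_000_000, "rate_per_1000": 20.0},
    "rural_area": {"population": 500_000, "rate_per_1000": 2.0},
}


def projected_cases(population, rate_per_1000):
    """Scale a per-1,000 rate up to a whole population."""
    return population * rate_per_1000 / 1000


# Biased approach: estimate the rate from the dense city only,
# then apply that same rate to the rural area.
city_rate = regions["dense_city"]["rate_per_1000"]
rural = regions["rural_area"]
biased_estimate = projected_cases(rural["population"], city_rate)

# What the rural area's own (assumed) rate would imply.
actual = projected_cases(rural["population"], rural["rate_per_1000"])

# How far off the extrapolation is under these assumed numbers.
overestimate_factor = biased_estimate / actual

print(f"Extrapolated from city sample: {biased_estimate:,.0f}")
print(f"Using the rural area's own rate: {actual:,.0f}")
print(f"Overestimate factor: {overestimate_factor:.1f}x")
```

With these assumed inputs, the gap between the two figures comes entirely from where the sample was drawn, not from any flaw in the arithmetic.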
It would be antithetical to the scientific method if such data were used to make decisions in, for example, Wyoming or rural Virginia. While these models can sometimes provide decision makers useful information, the decisions that are being made during this crisis are far too important and complex to be based on such imprecise data. …
… Considering the limitations of this tool under controlled laboratory conditions, imagine what happens within more complex systems that encompass large areas, contain millions of people, and vary with time (such as seasonal or annual changes).
Follow Carolina Journal Online’s continuing coverage of the COVID-19 pandemic here.