As the world comes to grips with a pandemic, how should we look at the data?

COVID-19 data has a lot of noise. Jurisdictions vary widely in how they report the number of cases, deaths and tests. For instance, if a patient is tested twice because the first test returned a false negative, are both tests reported? And, if so, how?

There are many efforts dedicated to aggregating and cleaning as much of the data as possible, but many data sets are still reported as absolute counts across jurisdictions that share the same type (counties, for instance) but have wildly different populations.

One way to sort through the noise is to look at the data across similar geography and by population.

Unless a geography is defined by the number of people who live there, it is impossible to compare any two places in raw numbers. It would be like comparing apples to kiwi fruit. Take zip codes, for instance. Their precursor, postal zones, came into use in 1943, when the postal service needed to increase mail delivery efficiency during World War II, and the zip code system itself was introduced in 1963. But zip codes are defined by location, not by population.

There are 10 zip codes in the U.S. with fewer than 10 people, while the most populous one is in Texas and has 114,905 people. This means that a jump from two to four cases in 81227, which is Garfield, CO, and has only six people, is highly significant. The same jump means something very different when the zip code has nearly 115,000 people.
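To make the arithmetic concrete, here is a minimal sketch that converts the two-to-four-case jump into a rate per 100,000 residents for both populations mentioned above. The function name case_rate_per_100k and the per-100,000 scaling are illustrative choices, not necessarily the convention Metopio uses.

```python
def case_rate_per_100k(cases: int, population: int) -> float:
    """Return cases per 100,000 residents (illustrative helper)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


# Populations quoted in the article.
garfield_population = 6            # zip code 81227
largest_zip_population = 114_905   # most populous zip code

for cases in (2, 4):
    small = case_rate_per_100k(cases, garfield_population)
    large = case_rate_per_100k(cases, largest_zip_population)
    print(f"{cases} cases: {small:,.1f} vs {large:.2f} per 100,000")
```

In the six-person zip code the rate moves from roughly 33,333 to roughly 66,667 per 100,000, while in the largest zip code it moves from about 1.7 to about 3.5.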

And while New York City has been in the news for its outbreak of COVID-19 cases, it is also the most populous city in the United States. When you look at case rates across the country, places like Southwest Georgia and some large prisons actually have higher case rates than New York City.

This is why rates matter.

Metopio is updating COVID-19 data as frequently as possible. Every time the data is updated, we also update the case rate using the most recent population data available.

As with any epidemiological data, we add a margin of error to the case rate to reflect uncertainty both in the number of cases and in the population of the locality. Once you have the case rates and their margins of error, you can compare places and understand the relative impact. The case rate gives you an apples-to-apples comparison.
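As a rough illustration of what a margin of error on a case rate can look like, the sketch below uses a Wilson score interval for a proportion. This is an assumption made for the example only: the article does not say which method Metopio uses, and this simplified version treats the population as exact rather than modeling its uncertainty. The function name wilson_rate_interval is hypothetical.

```python
import math


def wilson_rate_interval(cases: int, population: int,
                         z: float = 1.96, per: int = 100_000):
    """Approximate 95% Wilson score interval for a case rate.

    Assumption: population is treated as exact, so only the
    uncertainty in the case count is reflected here.
    """
    if population <= 0 or not 0 <= cases <= population:
        raise ValueError("need 0 <= cases <= population and population > 0")
    p = cases / population
    denom = 1 + z**2 / population
    center = (p + z**2 / (2 * population)) / denom
    half_width = (z / denom) * math.sqrt(
        p * (1 - p) / population + z**2 / (4 * population**2)
    )
    low = max(0.0, center - half_width)
    high = min(1.0, center + half_width)
    return low * per, high * per


# Four cases in a six-person zip code vs. a 114,905-person zip code.
print(wilson_rate_interval(4, 6))
print(wilson_rate_interval(4, 114_905))
```

The interval for the six-person zip code spans tens of thousands of cases per 100,000, which shows why a rate from a very small population should be read with caution.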


Using Chicago as an example, we’ll look at two zip codes. Zip code 60629 is the third most populous zip code in the country, with 114,129 people. Zip code 60633 is in Chicago’s South Deering neighborhood and has only 12,871 people. The side-by-side maps below show you the difference between comparing the raw number of cases in those two zip codes and comparing the case rate.


Now using the case rate, regardless of the population in the zip code, you can compare places with more accuracy. And as the data gets richer with more demographic details, it can be further stratified so policymakers and healthcare providers can target interventions. Raw numbers play a critical role, but by converting them to rates we are able to understand the scope of the pandemic.
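The comparison can be sketched in a few lines. The populations below are the ones quoted for the two Chicago zip codes, but the case counts are hypothetical placeholders invented for this example, not reported data. They are chosen only to show how a ranking by raw count can differ from a ranking by rate.

```python
# Populations quoted in the article.
populations = {"60629": 114_129, "60633": 12_871}

# HYPOTHETICAL case counts, for illustration only (not real data).
hypothetical_cases = {"60629": 600, "60633": 150}


def rate_per_100k(cases: int, population: int) -> float:
    """Cases per 100,000 residents."""
    return cases / population * 100_000


rates = {
    zip_code: rate_per_100k(hypothetical_cases[zip_code], populations[zip_code])
    for zip_code in populations
}

highest_by_count = max(hypothetical_cases, key=hypothetical_cases.get)
highest_by_rate = max(rates, key=rates.get)

print("Highest raw count:", highest_by_count)
print("Highest case rate:", highest_by_rate)
```

With these made-up numbers, the larger zip code leads on raw count while the smaller one leads on rate, which is the kind of difference a raw-count map can hide.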

Check back as we continue to update our curated data and dig deeper to provide you with these valuable insights.

Do you have data that would enhance this analysis?

Metopio aggregates high-quality, verified data that has some geographic component – think address or any part thereof – to develop insights that inform your business and policy decisions.

Contact us for more information