June 22 [article] -- COVID-19 Pandemic Leads to Innovative Census Bureau Methods of Estimating U.S. Population
The Census Bureau’s recently released 2020 Census Demographic and Housing Characteristics File (DHC) provided a more comprehensive picture of the U.S. population on April 1, 2020.
It also helped gauge the impact of the so-called blended base method used to create the estimates base, or starting point, for our annual time series of population estimates.
Today’s release of Vintage 2022 estimates by age, sex, race and Hispanic origin allows the most detailed comparisons yet between the April 1, 2020, blended base and the DHC, and those comparisons reveal some notable differences, especially by race.
To determine the strengths and limitations of both data sources, it’s important to understand the basics of the blended base method.
The 2020 Census, which set out to count the total U.S. resident population as of April 1, 2020, coincided with the onset of the COVID-19 pandemic. As a result, the once-a-decade operation faced numerous unforeseen challenges: The enumeration was delayed, and data collection methods such as in-person interviews were complicated by social distancing and stay-at-home orders.
It became clear that these delays meant the Population Estimates Program would not receive all the necessary data in time to create the estimates base for last year’s Vintage 2021 population estimates and publish them by the legislative deadline.
Yet instead of viewing these challenges as obstacles, the Census Bureau approached them as a unique opportunity to reimagine — and even improve — the estimates base.
Under normal circumstances, the Vintage 2021 estimates base for April 1, 2020, would have been drawn entirely from the 2020 Census.
From that base, we produce an estimate of the population as of July 1 for each year in the time series by adding births, subtracting deaths and accounting for migration. “Vintage” refers to a time series of estimates created from the same base using a consistent methodology.
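The step described above is the demographic balancing equation: each period's estimate equals the previous estimate plus births, minus deaths, plus net migration. The Python sketch below illustrates only that arithmetic, under simplifying assumptions. The `Components` class and `roll_forward` function are hypothetical, and a single net-migration figure stands in for the migration components. The sketch also ignores the age, sex, race, Hispanic origin and geographic detail that the actual estimates carry.

```python
from dataclasses import dataclass


@dataclass
class Components:
    """Hypothetical components of change for one estimation period."""
    births: int
    deaths: int
    net_migration: int  # simplification: one net figure for all migration


def roll_forward(base: int, periods: list[Components]) -> list[int]:
    """Apply P(next) = P(prev) + births - deaths + net migration, period by period.

    Returns the estimate at the end of each period, starting from `base`
    (for example, an April 1 estimates base).
    """
    estimates: list[int] = []
    population = base
    for c in periods:
        population = population + c.births - c.deaths + c.net_migration
        estimates.append(population)
    return estimates


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    print(roll_forward(1000, [Components(10, 8, 3), Components(12, 9, -1)]))  # [1005, 1007]
```

Because each estimate builds on the one before it, any error in the base carries into every later year of the vintage.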
However, 2020 Census data were not available at the level of demographic detail necessary to develop the base for the Vintage 2021 estimates. So Census Bureau demographers came up with a creative solution that not only enabled us to produce the estimates but also introduced new possibilities for further methodological developments.
This solution, which has come to be known as the “blended base,” combines data from three sources, drawing on the strengths of each, to create a high-quality estimate of the U.S. population on April 1, 2020. For the Vintage 2022 estimates, these sources were:
-- 2020 Census total population counts for households and group quarters (GQs) by major facility type, with confidentiality protections applied by the 2020 Census disclosure avoidance system.
-- 2020 Demographic Analysis (DA) estimates, which measured the April 1, 2020, population using administrative records and were used to gauge the quality and accuracy of the census. DA provided the national age and sex distribution.
-- The Vintage 2020 population estimates for April 1, 2020, which used the 2010 Census as their estimates base. These data filled in the remaining detail, including race and Hispanic origin at all levels of geography and age and sex distributions at the state and county levels.
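To make the idea of blending concrete, the Python sketch below combines three toy inputs at the national level. A census total sets the size of the population, DA counts supply the age-sex distribution, and Vintage 2020 race shares split each age-sex cell. This is a hypothetical simplification, not the Census Bureau's production method. The function name `blend_national`, the proportional scaling and the input structures are all assumptions, and the sketch ignores the household/GQ split and the state and county detail described above.

```python
from collections.abc import Mapping

AgeSex = tuple[str, str]      # (age group, sex)
Cell = tuple[str, str, str]   # (age group, sex, race)


def blend_national(
    census_total: float,
    da_age_sex: Mapping[AgeSex, float],
    v2020_race_shares: Mapping[AgeSex, Mapping[str, float]],
) -> dict[Cell, float]:
    """Hypothetical blend: census total x DA age-sex share x Vintage 2020 race share."""
    da_sum = sum(da_age_sex.values())
    if da_sum <= 0:
        raise ValueError("DA counts must sum to a positive number")
    blended: dict[Cell, float] = {}
    for (age, sex), da_count in da_age_sex.items():
        # Scale the DA age-sex distribution so the national total matches the census.
        cell_total = census_total * da_count / da_sum
        shares = v2020_race_shares[(age, sex)]
        share_sum = sum(shares.values())
        for race, share in shares.items():
            # Split each age-sex cell by the (normalized) Vintage 2020 race distribution.
            blended[(age, sex, race)] = cell_total * share / share_sum
    return blended


if __name__ == "__main__":
    # Toy inputs only; "A" and "B" are placeholder race categories.
    da = {("0-4", "F"): 100, ("0-4", "M"): 100, ("5+", "F"): 400, ("5+", "M"): 400}
    race = {cell: {"A": 0.5, "B": 0.5} for cell in da}
    result = blend_national(1000, da, race)
    print(result[("0-4", "F", "A")])  # 50.0
```

The design keeps each source responsible for one dimension: the total comes from the census, the age-sex shape comes from DA and the race detail comes from Vintage 2020.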
This way of combining the sources is what produces differences between the blended base and the DHC: Age, sex, race and Hispanic origin detail from the 2020 Census is not yet included in the estimates base.
These differences are especially notable for race because, in addition to drawing race detail from the Vintage 2020 estimates, the blended base uses different race categories than the DHC.
Specifically, the census includes a “Some Other Race” category while the population estimates do not. As a result, we used a bridging system known as “modified race” to distribute this group across the race categories used in the estimates, which are defined by the Office of Management and Budget.
Research to develop a file featuring the modified race variable is underway. Until that file is available, we do not recommend drawing inferences from comparisons by race.
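Bridging can be illustrated with the simplest possible rule: reassign a residual category's count to the other categories in proportion to their sizes. The Python sketch below does exactly that. The article does not describe how modified race actually assigns “Some Other Race” responses, so this proportional rule, the function name `allocate_residual` and the toy category labels are assumptions for illustration only.

```python
from collections.abc import Mapping


def allocate_residual(counts: Mapping[str, float], residual: str) -> dict[str, float]:
    """Redistribute counts[residual] across the remaining categories in
    proportion to their existing counts (a hypothetical bridging rule)."""
    residual_count = counts[residual]
    others = {k: v for k, v in counts.items() if k != residual}
    total_other = sum(others.values())
    if total_other <= 0:
        raise ValueError("need at least one non-residual category with a positive count")
    return {k: v + residual_count * v / total_other for k, v in others.items()}


if __name__ == "__main__":
    # Toy counts; "SOR" stands in for a residual category such as Some Other Race.
    print(allocate_residual({"A": 60, "B": 30, "C": 10, "SOR": 20}, "SOR"))
    # {'A': 72.0, 'B': 36.0, 'C': 12.0}
```

The total is preserved (120 before and after). Any such rule, however, imposes an assumption about how the residual group would have answered, which is one reason comparisons by race call for caution.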
Some of the differences between the blended base and DHC are considered beneficial innovations to the estimates methodology. The blended base’s use of the national age and sex distribution from 2020 DA was especially groundbreaking.
Built in large part from highly comprehensive and reliable vital records, these data avoided some of the shortcomings of a census base. Because our cohort-based estimates methodology ages the population forward each year, any errors in the base are carried across the decade.
For example, the 0-to-4-year-olds undercounted in the 2010 Census were aged forward across the decade and, as 10-to-14-year-olds in the Vintage 2020 estimates for April 1, 2020, still reflected that undercount.
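The carry-forward effect can be seen in a toy cohort model. The Python sketch below ages five-year age groups forward by one group per five-year step. It deliberately ignores deaths and migration, an assumption made only to isolate the effect, and shows that a shortfall in the 0-to-4 group reappears in the 10-to-14 group a decade later. The function `age_forward` and all numbers are hypothetical.

```python
def age_forward(groups: list[float], steps: int, births_per_step: float) -> list[float]:
    """Age five-year groups forward `steps` times.

    groups[i] holds ages 5*i to 5*i+4; the last group is open-ended (e.g. 15+).
    Deaths and migration are ignored in this toy model.
    """
    if len(groups) < 2:
        raise ValueError("need at least two age groups")
    current = list(groups)
    for _ in range(steps):
        # New births fill the youngest group; everyone else moves up one group,
        # and the open-ended group keeps its members plus those aging into it.
        current = [births_per_step] + current[:-2] + [current[-2] + current[-1]]
    return current


if __name__ == "__main__":
    true_pop = [100, 100, 100, 100]   # ages 0-4, 5-9, 10-14, 15+
    counted = [95, 100, 100, 100]     # hypothetical 5% undercount of ages 0-4
    print(age_forward(true_pop, 2, 100))  # [100, 100, 100, 300]
    print(age_forward(counted, 2, 100))   # [100, 100, 95, 300]
```

After two five-year steps the missing 5 people show up in the 10-to-14 group. Only an outside source for the age distribution, such as DA, can correct the shortfall.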
Using the DA age distribution helped mitigate the well-known, persistent undercount of young children, evident in both the 2020 and 2010 Censuses (and carried forward in the Vintage 2020 estimates, as noted above).
This type of adjustment was not possible with a base drawn solely from the decennial census, making it an unprecedented enhancement to the estimates base.
Reducing the undercount of young children also shifted the Hispanic population, another group historically undercounted in the decennial census. Specifically, using the DA age distribution increased the Hispanic population ages 17 and under compared with what it would have been had the base used only the 2020 Census or the Vintage 2020 estimates.
Finally, using DA’s age distribution allowed us to circumvent the age heaping present in the 2020 Census. Age heaping occurs when population counts at ages ending in preferred digits such as 0 and 5 (e.g., 20, 25, 30) are higher than would be expected from known birth, death and migration patterns. It often results from proxy interviews, rounded or guessed ages, and reporting an age rather than a birth date.
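Age heaping is commonly quantified with Whipple's index, a standard demographic measure. The article does not say which measure, if any, the Census Bureau applied. The index compares the population at ages ending in 0 or 5 within ages 23 to 62 with one-fifth of the total population in that range. A value of 100 indicates no heaping on those digits, and 500 indicates that every age in the range was reported as ending in 0 or 5. The Python sketch below computes it from single-year-of-age counts; the function name `whipple_index` and the sample data are hypothetical.

```python
from collections.abc import Mapping


def whipple_index(pop_by_age: Mapping[int, float]) -> float:
    """Whipple's index over ages 23-62: 100 means no heaping on digits 0 and 5."""
    window = range(23, 63)  # ages 23 through 62 inclusive
    total = sum(pop_by_age.get(age, 0) for age in window)
    heaped = sum(pop_by_age.get(age, 0) for age in window if age % 5 == 0)
    if total <= 0:
        raise ValueError("no population in ages 23-62")
    # Without heaping, ages ending in 0 or 5 hold about one-fifth of the window.
    return 100 * heaped / (total / 5)


if __name__ == "__main__":
    smooth = {age: 100 for age in range(101)}
    lumpy = {age: count + (50 if age % 5 == 0 else 0) for age, count in smooth.items()}
    print(whipple_index(smooth))            # 100.0
    print(round(whipple_index(lumpy), 1))   # 136.4
```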
As a result, comparisons between the April 1, 2020, blended base and the 2020 DHC show that the blended base has a more demographically reasonable national age structure than the 2020 Census. This is because age is reported more accurately in administrative records than in census or survey responses, which are susceptible to rounding, guessing or even misstatement of age.
The blended base innovations prompted the Census Bureau to create the Base Evaluation and Research Team (BERT), consisting of internal subject matter experts focused on exploring ways to further improve the estimates base.
BERT’s research will inform decisions about what 2020 Census data are used in the blended base and whether adjustments to the census data used in the base could result in methodologically sound and demographically reasonable population estimates.
Until these research findings are available, the estimates base will not incorporate age, sex, race or Hispanic origin detail from the 2020 Census.
Continuing to improve the base will remain a high priority in coming years. In particular, BERT will determine whether we can use 2020 DA, the 2020 Post-Enumeration Survey and other high-quality data to develop the estimates in a way that improves their quality and produces valid and equitable results.
The Census Bureau plans to begin incorporating BERT recommendations, possibly as early as the Vintage 2023 estimates base. As progress is made, we will continue to engage stakeholders in this work and keep the public informed.
Point of contact: Christine Hartley, Assistant Division Chief, Population Estimates & Projections christine.hartley@census.gov
https://www.census.gov/library/stories/2023/06/blended-base-methodology.html