Full Citation
Title: Assessing the Impact of Differential Privacy on Measures of Population and Racial Residential Segregation
Citation Type: Journal Article
Publication Year: 2022
ISBN:
ISSN:
DOI: 10.1162/99608F92.5CD8024E
NSFID:
PMCID:
PMID:
Abstract: The U.S. Census Bureau plans to use a new disclosure avoidance technique based on differential privacy to protect respondent confidentiality for the 2020 Decennial Census of Population and Housing. Their new technique injects noise based on a number of parameters into published statistics. While the noise injection does protect respondent confidentiality, it achieves the protection at the cost of less accurate data. To better understand the impact that differential privacy has on accuracy, we compare data from the complete-count 1940 Census with multiple differentially private versions of the same data set. We examine the absolute and relative accuracy of population counts in total and by race for multiple geographic levels, and we compare commonly used measures of residential segregation computed from these data sets. We find that accuracy varies by the global privacy-loss budget and the allocation of the privacy-loss budget to geographic levels (e.g., states, counties, enumeration districts) and queries. For measures of segregation, we observe situations where the differentially private data indicate less segregation than the original data and situations where the differentially private data indicate more segregation than the original data. The sensitivity of accuracy to the global privacy-loss budget and its allocation highlights the fundamental importance of these policy decisions. Data producers like the U.S. Census Bureau must collaborate with users not only to determine the most useful set of parameters to receive allocations of the privacy-loss budget, but also to provide documentation and tools for users to gauge the reliability and validity of statistics from publicly released data products. If they do not, producers may create statistics that are unusable or misleading for the wide variety of use cases that rely on those statistics.
Url: https://hdsr.mitpress.mit.edu/pub/1rsg867y/release/1
User Submitted?: No
Authors: Asquith, Brian; Hershbein, Brad; Kugler, Tracy; Reed, Shane; Ruggles, Steven; Schroeder, Jonathan; Yesiltepe, Steve; Van Riper, David
Periodical (Full): Harvard Data Science Review
Issue: Special Issue 2
Volume:
Pages:
Countries: