Total Results: 135
Nelson, Matt A.; Magnuson, Diana L.; Hacker, J. David; Sobek, Matthew; Huynh, Lap; Roberts, Evan; Ruggles, Steven
2025.
New data sources for research on the nineteenth-century United States: IPUMS full count datasets of the censuses of population 1850–1880.
In October 2001, IPUMS released a preliminary population database for all individuals recorded in the 1880 population census of the United States (Goeken et al. 2003). Containing 50 million person ...
Nelson, Matt A; Magnuson, Diana L; Ruggles, Steven; Sobek, Matthew; Huynh, Lap
2024.
Working Papers Historical Context and Creation of the IPUMS Ancestry Full Count U.S. Population Census Data 1900-1930.
IPUMS recently released final versions of full count census data for the United States 1900-1930. The information contained in these files is the product of three broad work stages: historical census enumeration, digitization, and IPUMS processing. The data were produced within an evolving institutional context and subjected to subsequent processes that had important ramifications on the final product. This paper documents these histories and processes and their implications for research. Because of the datasets' sheer size and scale, the development of these files necessitated applying different methods and approaches to assess data quality and correct the data. We document cases where data quality was affected not only by choices made by the Census historically, but also by data transcription errors in the modern day. Finally, we describe our approaches to processing the data, and we note some of the implications for research these various decisions have. As with any dataset, researchers should use this resource critically for their particular research questions and consider the data creation process from respondent to digital dataset. Despite some limitations and liabilities, the IPUMS full count data provide a powerful and valuable resource to study demographic effects on a variety of health and socioeconomic questions.
Nelson, Matt A; Magnuson, Diana L; Hacker, J David; Sobek, Matthew; Huynh, Lap; Roberts, Evan; Ruggles, Steven
2024.
Working Papers IPUMS Full Count Datasets of the U.S. Censuses of Population 1850-1880.
IPUMS has finalized databases for each of the United States population censuses from 1850 to 1880. These data are the result of collaborations between FamilySearch and Ancestry.com, which provided the raw data, and IPUMS, which enhanced the data with editing, standardized coding, inter-census harmonization, and documentation. We discuss the data capture process conducted by the nineteenth-century United States Census Office, construction of the modern datasets, and variable availability. We conclude by briefly discussing the potential and limitations of these data for social science research. The public data are distributed by IPUMS and available for researchers to use free of charge.
Ruggles, Steven; Rivera Drew, Julia A; Fitch, Catherine A; Hacker, J David; Helgertz, Jonas; Nelson, Matt A; Sobek, Matthew; Warren, John Robert; Ozder, Nesile
2024.
Working Papers The IPUMS Multigenerational Longitudinal Panel: Progress and Prospects.
The IPUMS Multigenerational Longitudinal Panel (MLP) is a longitudinal population panel that links American censuses, surveys, administrative sources, and vital records spanning the period from 1850 to the present. This article explains the rationale for IPUMS MLP, outlines the design of the infrastructure, and describes the linking methods used to construct the panel. We then detail our plans for expansion and improvement of MLP over the next five years, including the incorporation of additional data sources, the development of a "linkage hub" to connect MLP with other major record linkage efforts, and the refinement of our technology and dissemination efforts. We conclude by describing a few early examples of MLP-based research.
Muralidhar, Krishnamurty; Ruggles, Steven; Domingo-Ferrer, Josep; Sánchez, David
2024.
The counterfactual framework in Jarmin et al. is not a measure of disclosure risk of respondents.
Jarmin et al. (1) suggest assessing disclosure risk by using a counterfactual method to compare the posterior-to-posterior probability of an inference with and without the target record. They argue that this methodology is superior to the absolute and relative risk assessment methodologies. The counterfactual method, as originally proposed in ref. 2, explicitly rejects the "with and without" target record comparison. Since the counterfactual formulation in ref. 1 uses this inappropriate comparison, their methodology is inextricably linked to differential privacy (DP) but without rigorous formalization, making it impossible to establish a privacy guarantee or prove it satisfies the desiderata. What is clear is that, unlike the other two methodologies being considered in ref. 1, the counterfactual method does not measure risk to individual respondents. Rather, it assesses whether a protection algorithm satisfies a DP-like requirement. If it does, no one is at risk; if it does not, everyone is at risk. To illustrate this fallacy, consider a population where every respondent is identical and therefore protected against disclosure. The counterfactual methodology in ref. 1 would deem every respondent to be at risk (whereas the preferred option in ref. 2 would not). Having already adopted this measure for the controversial 2020 US Decennial Census (3), Jarmin et al. are attempting to impose a questionable standard by diktat. The appendix in ref. 1 criticizes four studies based on methodological issues. We now address this criticism. (1) Ruggles and Van Riper (4) used a simple Monte Carlo simulation to estimate a baseline for evaluating the effectiveness of the Census Bureau's database reconstruction experiment. Jarmin et al. argue that the simulation is invalid because the Census experiment included a previously undocumented rule that "a record in the reconstructed data can be assigned to at most one record in the confidential data." 
However, it does not make much difference; if the simulation is modified to use each record no more than once, it remains the case that most of the reconstructed individuals have no match in the real population, and most of the matches that do occur would be expected purely by chance. (2) Muralidhar (5) was trying to show that the reconstruction approach used by the Census was unnecessarily complicated. Criticizing him for using a simpler schema than the one used by the Census is missing the very point of the paper. (3) Criticizing Francis (6) for not accurately predicting non-modal race/ethnicity is also missing the point. The very purpose of his paper was to show that race/ethnicity for individuals can be predicted accurately with knowledge only of the modal block value. (4) The criticism of Muralidhar and Domingo-Ferrer (7) rests on the false claim that they assume that suppression methods were used in the Summary File 1 in 2010 tabular data release. Nowhere in their paper do they make that assumption. In summary, at no point does ref. 1 refute the key conclusion of these studies that in the 2010 Census reconstruction a) most of the matches were random and b) the reconstruction is primarily due to generalizable inference rather than privacy-violating inference.
Ruggles, Steven; Magnuson, Diana L.
2023.
"It's None of Their Damn Business": Privacy and Disclosure Control in the U.S. Census, 1790-2020.
The U.S. Census has grappled with public concerns about privacy since the first enumeration in 1790. Beginning in the mid-nineteenth century, census officials began responding to concerns about privacy with promises of confidentiality. In recent years, escalating concerns about confidentiality have threatened to reduce the usability of publicly accessible population data. This paper traces the history of privacy and disclosure control since 1790. We argue that controlling public access to census information has never been an effective response to public concerns about government intrusion. We conclude that the Census Bureau should weigh the costs of curtailing access to reliable data against realistic measures of the benefit of new approaches to disclosure control.
Ruggles, Steven
2023.
Collaborations Between IPUMS and Genealogical Organizations, 1999-2022.
From 1999 to 2019, IPUMS collaborated with genealogical organizations to develop massive individual-level census datasets spanning the 1790 through 1940 period, and we are currently working on the 1950 census. This research note describes how our genealogical collaborations came about. We focus on our collaborations with the Church of Jesus Christ of Latter-Day Saints Family and Church History Department (later known as FamilySearch) and the private genealogical companies HeritageQuest and Ancestry.com.
Asquith, Brian; Hershbein, Brad; Kugler, Tracy; Reed, Shane; Ruggles, Steven; Schroeder, Jonathan; Yesiltepe, Steve; Van Riper, David
2022.
Assessing the Impact of Differential Privacy on Measures of Population and Racial Residential Segregation.
The U.S. Census Bureau plans to use a new disclosure avoidance technique based on differential privacy to protect respondent confidentiality for the 2020 Decennial Census of Population and Housing. Their new technique injects noise based on a number of parameters into published statistics. While the noise injection does protect respondent confidentiality, it achieves the protection at the cost of less accurate data. To better understand the impact that differential privacy has on accuracy, we compare data from the complete-count 1940 Census with multiple differentially private versions of the same data set. We examine the absolute and relative accuracy of population counts in total and by race for multiple geographic levels, and we compare commonly used measures of residential segregation computed from these data sets. We find that accuracy varies by the global privacy-loss budget and the allocation of the privacy-loss budget to geographic levels (e.g., states, counties, enumeration districts) and queries. For measures of segregation, we observe situations where the differentially private data indicate less segregation than the original data and situations where the differentially private data indicate more segregation than the original data. The sensitivity of accuracy to the overall global privacy-loss budget and its allocation highlights the fundamental importance of these policy decisions. Data producers like the U.S. Census Bureau must collaborate with users not only to determine the most useful set of parameters to receive allocations of the privacy-loss budget, but also to provide documentation and tools for users to gauge the reliability and validity of statistics from publicly released data products. If they do not, producers may create statistics that are unusable or misleading for the wide variety of use cases that rely on those statistics.
Ruggles, Steven J; Van Riper, David
2021.
The Role of Chance in the Census Bureau Database Reconstruction Experiment.
The Census Bureau plans a new approach to disclosure control for the 2020 census that will add noise to every statistic the agency produces for places below the state level. The Bureau argues the new approach is needed because the confidentiality of census responses is threatened by "database reconstruction," a technique for inferring individual-level responses from tabular data. The Census Bureau constructed hypothetical individual-level census responses from public 2010 tabular data and matched them to internal census records and to outside sources. The Census Bureau did not compare these results to a null model to demonstrate that their success in matching would not be expected by chance. This is analogous to conducting a clinical trial without a control group. We implement a simple simulation to assess how many matches would be expected by chance. We demonstrate that most matches reported by the Census Bureau experiment would be expected randomly. To extend the metaphor of the clinical trial, the treatment and the placebo produced similar outcomes. The database reconstruction experiment therefore fails to demonstrate a credible threat to confidentiality.
Helgertz, Jonas; Price, Joseph R; Wellington, Jacob; Thompson, Kelly; Ruggles, Steven J; Fitch, Catherine A; Sobek, Matthew; Hacker, David J; Roberts, Evan W; Warren, John Robert; Nelson, Matt; Boustan, Leah; Abramitzky, Ran; Feigenbaum, James J
2020.
Working Papers A New Strategy for Linking Historical Censuses: A Case Study for the IPUMS Multigenerational Longitudinal Panel.
This paper presents a new probabilistic method of record linkage, developed using the U.S. full count censuses of 1900 and 1910 but applicable to a range of different sources of historical records. The method was designed to exploit a more comprehensive set of individual and contextual characteristics present in historical census data, aiming to obtain a machine learning algorithm that better distinguishes between multiple potential matches. Our results demonstrate that the method achieves a match rate that is twice as high as other currently popular methods in the literature while at the same time also achieving greater accuracy. In addition, the method performs only negligibly worse than other algorithms in resembling the target population.
Ruggles, Steven J; Magnuson, Diana L
2020.
Census Technology, Politics, and Institutional Change, 1790–2020.
A census is a political construct that reflects the ideological orientation of its creators. Legislators, intellectuals, and the public have contested the content and purposes of the U.S. census for 230 years. In each period, the meaning and uses of the census reflected the politics and priorities of the moment. In the 1850s, census planners suppressed information about slavery at the behest of southern legislators; in the 1880s, the census director promoted nativist theories of race suicide; and in the 1940s, census officials helped plan Japanese internment. The census is inherently political: its original purpose was reapportionment of political representation, and in virtually every decade, winners and losers of the demographic contest have debated the legitimacy of the results. In one case, the census of 1920, the results were ignored altogether and no reapportionment took place, as rural legislators feared losing power to the cities. Political considerations shaped not only the content and applications of the census but also the mechanics of census taking. This essay traces the history of U.S. census data capture and processing, which we define as the methods and technologies used to transform raw census responses into statistical tables. By focusing on federal responses to specific technical challenges over a very long span, our narrative illuminates the long-run effects of shifting societal preoccupations on bureaucratic decision making. More broadly, the case study of the census reveals the critical and shifting role of state and political forces in the development of technology.
Van Riper, David; Kugler, Tracy A; Ruggles, Steven J
2020.
Disclosure Avoidance in the Census Bureau’s 2010 Demonstration Data Product.
Ruggles, Steven J; Fitch, Catherine A; Magnuson, Diana L; Schroeder, Jonathan P
2019.
Differential Privacy and Census Data: Implications for Social and Economic Research.
Ruggles, Steven J; Magnuson, Diana L
2019.
The History of Quantification in History: The JIH as a Case Study.
The use of quantitative methods in leading historical journals increased dramatically in the 1960s and declined sharply after the mid-1980s. The JIH is an invaluable source for analysis of the boom and bust in the use of quantitative methods in history; the journal remained under the same editors for almost fifty years and made no attempt to change editorial policies during that period. Shifting patterns of content and authorship in the JIH from the 1980s to the early 2000s reveal how the journal responded to a dramatic decline in quantitative submissions by U.S.-based historians. Recent years have seen a revival of quantification both in the JIH and in mainstream historical journals, especially among historians located at institutions outside the United States.
Ruggles, Steven J; Fitch, Catherine A; Roberts, Evan W
2018.
Historical Census Record Linkage.
For the past 80 years, social scientists have been linking historical censuses across time to study economic and geographic mobility. In recent decades, the quantity of historical census record linkage has exploded, owing largely to the advent of new machine-readable data created by genealogical organizations. Investigators are examining economic and geographic mobility across multiple generations and also engaging many new topics. Several analysts are exploring the effects of early-life socioeconomic conditions, environmental exposures, or natural disasters on family, health, and economic outcomes in later life. Other studies exploit natural experiments to gauge the impact of policy interventions such as social welfare programs and educational reforms. The new data sources have led to a proliferation of record linkage methodologies, and some widespread approaches inadvertently introduce errors that can lead to false inferences. A new generation of large-scale shared data infrastructure now in preparation will ameliorate weaknesses of current linkage methods.
Ruggles, Steven J; McCaa, Robert; Sobek, Matthew; Cleveland, Lara L
2015.
The IPUMS Collaboration: Integrating and Disseminating the World's Population Microdata.
The Integrated Public Use Microdata Series (IPUMS)-International partnership is a project of the Minnesota Population Center and national statistical agencies, dedicated to collecting and distributing census data from around the world. IPUMS is currently disseminating data on over a half-billion persons enumerated in more than 250 census samples from 79 countries. The data series includes information on a broad range of population characteristics, including fertility, nuptiality, life-course transitions, migration, labor-force participation, occupational structure, education, ethnicity, and household composition. This paper describes sample characteristics and data structure; the data integration process including the creation of constructed family interrelationship variables; the flexible dissemination system that enables researchers to build customized extracts of pooled census samples across time and place; and some of the most significant findings that have emerged from the database.
Ruggles, Steven J
2015.
Patriarchy, Power, and Pay: The Transformation of American Families, 1800–2015.
This article proposes explanations for the transformation of American families over the past two centuries. I describe the impact on families of the rise of male wage labor beginning in the nineteenth century and the rise of female wage labor in the twentieth century. I then examine the effects of decline in wage labor opportunities for young men and women during the past four decades. I present new estimates of a precipitous decline in the relative income of young men and assess its implications for the decline of marriage. Finally, I discuss explanations for the deterioration of economic opportunity and speculate on the impact of technological change on the future of work and families.
McCaa, Robert; Cleveland, Lara L; Kelly Hall, Patricia; Ruggles, Steven J; Sobek, Matthew
2015.
Statistical coherence of primary schooling in IPUMS-International integrated population samples for China, India, Vietnam, and ten other Asia-Pacific countries.
IPUMS-International (www.ipums.org/international) disseminates harmonized census microdata for more than 80 countries at no cost, although access is restricted to bona-fide researchers and students who agree to the stringent conditions of use license. Currently over 270 samples are available, totalling more than 600 million person records. Each year 15-20 additional samples are released, as more countries cooperate with the IPUMS initiative and the integration of 2010 round census samples is completed. With so much microdata so readily available, questions of data quality naturally arise. This paper focusses on the concept of statistical coherence over time for a single concept, primary schooling completed. From an analysis of the percentage completing primary schooling by birth year for pairs of samples for thirteen Asia-Pacific countries, we find outstanding coherence for four (China, Mongolia, Vietnam, and Indonesia), with mean differences of less than 0.5 percentage points, regression coefficient (b) ranging from 0.93 to 1.07 and R² = .99. For the thirteen countries as a group there is considerable variation overall, with mean absolute difference as high as 16 percentage points, b ranging from 0.62 to 1.44 and R² = .65 to .99. As a whole, statistical coherence of primary schooling is outstanding. Nonetheless, to make expert use of the harmonized microdata, researchers are cautioned to carefully study the IPUMS integrated metadata as well as the original source documentation. National Statistical Offices not currently cooperating or that have not yet entrusted 2010 round census microdata are invited to do so.