Total Results: 58
Nelson, Matt A.; Magnuson, Diana L.; Hacker, J. David; Sobek, Matthew; Huynh, Lap; Roberts, Evan; Ruggles, Steven
2025.
New data sources for research on the nineteenth-century United States: IPUMS full count datasets of the censuses of population 1850–1880.
In October 2001, IPUMS released a preliminary population database for all individuals recorded in the 1880 population census of the United States (Goeken et al. 2003). Containing 50 million person ...
Nelson, Matt A; Magnuson, Diana L; Ruggles, Steven; Sobek, Matthew; Huynh, Lap
2024.
Working Papers Historical Context and Creation of the IPUMS Ancestry Full Count U.S. Population Census Data 1900-1930.
IPUMS recently released final versions of full count census data for the United States 1900-1930. The information contained in these files is the product of three broad work stages: historical census enumeration, digitization, and IPUMS processing. The data were produced within an evolving institutional context and subjected to subsequent processes that had important ramifications on the final product. This paper documents these histories and processes and their implications for research. Because of the datasets' sheer size and scale, the development of these files necessitated applying different methods and approaches to assess data quality and correct the data. We document cases where data quality was affected not only by choices made by the Census historically, but also by data transcription errors in the modern day. Finally, we describe our approaches to processing the data, and we note some of the implications for research these various decisions have. As with any dataset, researchers should use this resource critically for their particular research questions and consider the data creation process from respondent to digital dataset. Despite some limitations and liabilities, the IPUMS full count data provide a powerful and valuable resource to study demographic effects on a variety of health and socioeconomic questions.
Nelson, Matt A; Magnuson, Diana L; Hacker, J David; Sobek, Matthew; Huynh, Lap; Roberts, Evan; Ruggles, Steven
2024.
Working Papers IPUMS Full Count Datasets of the U.S. Censuses of Population 1850-1880.
IPUMS has finalized databases for each of the United States population censuses from 1850 to 1880. These data are the result of collaborations between FamilySearch and Ancestry.com, which provided the raw data, and IPUMS, which enhanced the data with editing, standardized coding, inter-census harmonization, and documentation. We discuss the data capture process conducted by the nineteenth-century United States Census Office, construction of the modern datasets, and variable availability. We conclude by briefly discussing the potential and limitations of these data for social science research. The public data are distributed by IPUMS and available for researchers to use free of charge.
Hacker, J David; Huynh, Lap; Nelson, Matt A; Sobek, Matthew
2024.
Working Papers IPUMS Full Count Datasets of Slaves and Slaveholders in the United States in 1850 and 1860.
This article describes the development of IPUMS full count datasets of the censuses of slave inhabitants of the United States in 1850 and 1860. These data are a result of two collaborations. The 1850 slave dataset stems from a collaboration between the Church of Jesus Christ of Latter-day Saints, whose volunteers transcribed the original manuscript forms, and IPUMS, which enhanced the raw data with editing, standardized coding procedures, constructed variables, and documentation. The 1860 dataset was the result of a similar collaboration between the genealogical company Ancestry and IPUMS. The article discusses the features of these datasets, their limitations, and suggests possible research uses.
Ruggles, Steven; Rivera Drew, Julia A; Fitch, Catherine A; Hacker, J David; Helgertz, Jonas; Nelson, Matt A; Sobek, Matthew; Warren, John Robert; Ozder, Nesile
2024.
Working Papers The IPUMS Multigenerational Longitudinal Panel: Progress and Prospects.
The IPUMS Multigenerational Longitudinal Panel (MLP) is a longitudinal population panel that links American censuses, surveys, administrative sources, and vital records spanning the period from 1850 to the present. This article explains the rationale for IPUMS MLP, outlines the design of the infrastructure, and describes the linking methods used to construct the panel. We then detail our plans for expansion and improvement of MLP over the next five years, including the incorporation of additional data sources, the development of a "linkage hub" to connect MLP with other major record linkage efforts, and the refinement of our technology and dissemination efforts. We conclude by describing a few early examples of MLP-based research.
Helgertz, Jonas; Price, Joseph R; Wellington, Jacob; Thompson, Kelly; Ruggles, Steven J; Fitch, Catherine A; Sobek, Matthew; Hacker, David J; Roberts, Evan W; Warren, John Robert; Nelson, Matt; Boustan, Leah; Abramitzky, Ran; Feigenbaum, James J
2020.
Working Papers A New Strategy for Linking Historical Censuses: A Case Study for the IPUMS Multigenerational Longitudinal Panel.
This paper presents a new probabilistic method of record linkage, developed using the U.S. full count censuses of 1900 and 1910 but applicable to a range of different sources of historical records. The method was designed to exploit a more comprehensive set of individual and contextual characteristics present in historical census data, aiming to obtain a machine learning algorithm that better distinguishes between multiple potential matches. Our results demonstrate that the method achieves a match rate that is twice as high as other currently popular methods in the literature while at the same time also achieving greater accuracy. In addition, the method performs only negligibly worse than other algorithms in resembling the target population.
Sarkar, Sula; Cleveland, Lara; Silisyene, Majory; Sobek, Matthew
2016.
Harmonized census geography and spatio-temporal analysis: Gender equality and empowerment of women in Africa.
Changes in administrative boundaries pose major challenges for spatio-temporal population research. Researchers interested in change over time need to hold space constant to study contextual or spatial effects on behaviors and outcomes. Boundary changes risk polluting their analyses with artifacts that obscure real changes that may have occurred. This paper describes the method by which spatially consistent geographic units have been constructed in the IPUMS-International census data collection for several countries over a fifty-year period. We illustrate the utility of spatially consistent units by exploring progress toward UN Millennium Development Goals in a number of African countries at low levels of geography: specifically the goals to "promote gender equality and empower women." The analysis shows progress towards goals, but the pattern of growth differs markedly both across and within countries. We show how the use of harmonized geographic units facilitates comparative metrics.
Ruggles, Steven J; McCaa, Robert; Sobek, Matthew; Cleveland, Lara L
2015.
The IPUMS Collaboration: Integrating and Disseminating the World's Population Microdata.
The Integrated Public Use Microdata Series (IPUMS)-International partnership is a project of the Minnesota Population Center and national statistical agencies, dedicated to collecting and distributing census data from around the world. IPUMS is currently disseminating data on over a half-billion persons enumerated in more than 250 census samples from 79 countries. The data series includes information on a broad range of population characteristics, including fertility, nuptiality, life-course transitions, migration, labor-force participation, occupational structure, education, ethnicity, and household composition. This paper describes sample characteristics and data structure; the data integration process including the creation of constructed family interrelationship variables; the flexible dissemination system that enables researchers to build customized extracts of pooled census samples across time and place; and some of the most significant findings that have emerged from the database.
McCaa, Robert; Cleveland, Lara L; Kelly Hall, Patricia; Ruggles, Steven J; Sobek, Matthew
2015.
Statistical coherence of primary schooling in IPUMS-International integrated population samples for China, India, Vietnam, and ten other Asia-Pacific countries.
IPUMS-International (www.ipums.org/international) disseminates harmonized census microdata for more than 80 countries at no cost, although access is restricted to bona-fide researchers and students who agree to the stringent conditions of use license. Currently over 270 samples are available, totalling more than 600 million person records. Each year 15-20 additional samples are released, as more countries cooperate with the IPUMS initiative and the integration of 2010 round census samples is completed. With so much microdata so readily available, questions of data quality naturally arise. This paper focusses on the concept of statistical coherence over time for a single concept, primary schooling completed. From an analysis of the percentage completing primary schooling by birth year for pairs of samples for thirteen Asia-Pacific countries, we find outstanding coherence for four (China, Mongolia, Vietnam, and Indonesia), with mean differences of less than 0.5 percentage points, regression coefficient (b) ranging from 0.93 to 1.07, and R² = .99. For the thirteen countries as a group there is considerable variation overall, with mean absolute difference as high as 16 percentage points, b ranging from 0.62 to 1.44, and R² = .65-.99. As a whole, statistical coherence of primary schooling is outstanding. Nonetheless, to make expert use of the harmonized microdata, researchers are cautioned to carefully study the IPUMS integrated metadata as well as the original source documentation. National Statistical Offices not currently cooperating or that have not yet entrusted 2010 round census microdata are invited to do so.
McCaa, Robert; Hall, Patricia Kelly; Cleveland, Lara L; Ruggles, Steven J; Sobek, Matthew
2014.
The IPUMS-International partnership enhances the value of census microdata for both producers and users.
The IPUMS-International partnership is led by the University of Minnesota Population Center in consortium, at present, with 101 official statistical agencies that have endorsed uniform protocols for world-wide access to microdata free-of-cost (https://international.ipums.org/international/international_partners.shtml). In 2014, the partnership is celebrating its fifteenth year with the launch of 259 integrated, confidentialized samples representing 79 countries (82% of the world's population) and totaling 561,622,889 person records. More than 9,000 researchers have made over 50,000 extracts from the IPUMS-International database and published nearly a thousand papers, reports, dissertations and books using the microdata. Research output will continue to grow as 2010 round census samples and other types of microdata (surveys on demography, health, labor force, etc.) are integrated into the IPUMS dissemination system. Through the IPUMS partnership the value of census microdata is enhanced for both producers (official statisticians) and users (researchers and policy makers) in three important ways: 1) Dissemination: one portal with a single set of rules and internet browser tools provides access to microdata for all researchers, regardless of country of residence; 2) Security: privacy and confidentiality are protected by rigorous technical, administrative and legal protocols; and 3) Usefulness: both metadata and microdata are integrated to preserve the definitions and concepts in each census to facilitate good use.
Hall, Patricia Kelly; Cleveland, Lara L; Sobek, Matthew
2014.
IPUMS International: A Data Resource for Statistics Education.
IPUMS-International is the world's largest collection of high-precision census data samples containing individual-level information on 544 million people in 74 countries spanning five decades. These data are available for download at no cost to educators, students and researchers for scholarly, educational, and policy-related analysis. The database, built in cooperation with national statistical offices, provides remarkable access to data for educators wishing to expose students to real-world governmental statistics. Variables with distinct census responses for each person are coded consistently across time and place; documentation is thorough, harmonized and easily accessible; and the web delivery system allows registered users to create and download customized data sets pooled across time and place. Individual level responses mean that data can be used in analyses that range from simple descriptive tables to advanced statistical modeling.
Cleveland, Lara L; Ruggles, Steven J; Sobek, Matthew; McCaa, Robert
2013.
The IPUMS big data revolution: liberating, integrating, and disseminating the globe's census microdata free of cost.
Fifty years ago, census microdata were available for only a handful of countries and trans-border access was difficult for all but a few. Now, from www.ipums.org, many decades of census microdata for much of the globe are readily accessible anywhere, free of cost to researchers and students, regardless of country of birth, residence, or citizenship. As of late 2013, 238 samples representing 74 countries, totaling more than one half billion person records and encompassing more than four fifths of the world's population, are available to more than 7,000 registered researchers worldwide. Pioneers of demography developed national and even regional databases for analyzing census microdata (Ruggles 2013), but it is IPUMS that developed the first integrated system for global access. The IPUMS big data revolution, foretold a decade ago (McCaa and Ruggles 2002), has arrived, but is not yet complete. 50 years hence it is likely that almost all census microdata around the globe will be integrated and accessible via multiple systems of access using application programming interfaces (API). The revolution has already sparked much new research. According to a former president of the Population Association of America, students of the Big Census Data Revolution, specifically those with analytical experience using integrated census microdata, enjoy advantages for internships and employment at the World Bank and similar agencies (Meier, Lam, and McCaa 2011). Likewise, Dot-Coms beckon as a new jobs frontier opens for savvy Big Data users (Lohr, 2012:B2).
Fitch, Catherine A; Manson, Steven M; Sobek, Matthew; Ruggles, Steven J; Foley, Johnathan
2012.
Terra Populus: A Global Population/Environment Data Network.
Terra Populus, part of NSF's new DataNet initiative, will develop organizational and technical infrastructure to integrate, preserve, and disseminate data describing changes in the human population and environment over time. A plethora of high-quality environmental and population datasets are available, but they are widely dispersed, have incompatible or inadequate metadata, and have incompatible geographic identifiers. The new infrastructure will enable researchers to identify and merge data from heterogeneous sources to study the relationships between human behavior and the natural world. Terra Populus will partner with data archives, data producers, and data users to create a sustainable international organization that will guarantee preservation and access over multiple decades.
McCaa, Robert; Ruggles, Steven J; Esteve Pals, Albert; Cleveland, Lara L; Sobek, Matthew
2012.
Ten Ways IPUMS-International Adds Value to Census Microdata.
Many statistical offices recognize the need for enhancing access to census microdata. High costs, challenging risks, and low rewards are substantial obstacles to going-it-alone. An economical, essentially cost-free solution, endorsed by more than 90 National Statistical Offices, is offered by the IPUMS-International project. This paper discusses ten ways the project enhances access and adds value to census microdata, grouped into four categories: 1. statistical confidentiality (security, disclosure protections, managing access), 2. integration (comprehensive source documentation, integrated metadata, integrated, pooled microdata, IPUMS-I constructed variables), 3. dissemination (free, trans-border access, custom-tailored extracts), and 4. ethics (statistical transparency, academic freedom, reduction of risks of fraud/mis-representation, and sharing of research findings). Statistical agencies not yet participating in the IPUMS-International initiative are invited to do so. Those already participating are encouraged to entrust 2010 round census microdata to the project in a timely manner.
Sobek, Matthew; Ruggles, Steven J; Cleveland, Lara L; McCaa, Robert
2012.
When Excessive Perturbation Goes Wrong and Why IPUMS-International Relies Instead on Sampling, Suppression, Swapping, and Other Minimally Harmful Methods to Protect Privacy of Census Microdata.
IPUMS-International disseminates population census microdata at no cost for 69 countries. Currently, a series of 212 samples totaling almost a half billion person records are available to researchers. Registration is required for researchers to gain access to the microdata. Statistics from Google Analytics show that IPUMS-International's lengthy, probing registration form is an effective deterrent for unqualified applicants. To protect data privacy, we rely principally on sampling, suppression of geographic detail, swapping of records across geographic boundaries, and other minimally harmful methods such as top and bottom coding. We do not use excessively perturbative methods. A recent case of perturbation gone wrong (the household samples of the 2000 census of the USA [PUMS], the 2003-2006 American Community Survey, and the 2004-2009 Current Population Survey), an empirical study of the impact of perturbation on the usability of UK census microdata (the Individual SARs of the 1991 census of the UK), and a mathematical demonstration in a timely compendium of statistical confidentiality practices confirm the wisdom of IPUMS microdata management protocols and statistical disclosure controls.
Cleveland, Lara L; Kennedy, Sheela; Sobek, Matthew; McCaa, Robert
2011.
The quality of constructed family and household relationships in African Census Samples.
In African countries, census data provide critical information on current and historical trends in households and family relationships. We use data from IPUMS-International and the African Integrated Census Microdata Series (AICMD), a freely available database of 52 million person records from 13 African countries, from the 1980s through the 2000s. Our paper assesses the quality of the data available in each of these censuses for constructing measures of spouse and parent-child relationship, household structure, and estimates of fertility. We consider the quality of age and sex reporting, as well as missing data. We assess the quality of the links created within African censuses and compare estimates of own child fertility in South Africa with other published estimates. We show that the IPUMS pointers perform well and are especially valuable given the complex family and household structure found in many African countries.
Sobek, Matthew; King, Miriam L; Ruggles, Steven J; Flood, Sarah M; Cleveland, Lara L; Schroeder, Matthew B
2011.
Big Data: Large-Scale Historical Infrastructure from the Minnesota Population Center.
The Minnesota Population Center (MPC) provides aggregate data and microdata that have been integrated and harmonized to maximize cross-temporal and cross-spatial comparability. All MPC data products are distributed free of charge through an interactive Web interface that enables users to limit the data and metadata being analyzed to samples and variables of interest to their research. In this article, the authors describe the integrated databases available from the MPC, report on recent additions and enhancements to these data sets, and summarize new online tools and resources that help users to analyze the data over time. They conclude with a description of the MPC's newest and largest infrastructure project to date: a global population and environment data network.