Total Results: 10
Asquith, Brian; Hershbein, Brad; Kugler, Tracy; Reed, Shane; Ruggles, Steven; Schroeder, Jonathan; Yesiltepe, Steve; Van Riper, David
2022.
Assessing the Impact of Differential Privacy on Measures of Population and Racial Residential Segregation.
The U.S. Census Bureau plans to use a new disclosure avoidance technique based on differential privacy to protect respondent confidentiality for the 2020 Decennial Census of Population and Housing. Their new technique injects noise based on a number of parameters into published statistics. While the noise injection does protect respondent confidentiality, it achieves the protection at the cost of less accurate data. To better understand the impact that differential privacy has on accuracy, we compare data from the complete-count 1940 Census with multiple differentially private versions of the same data set. We examine the absolute and relative accuracy of population counts in total and by race for multiple geographic levels, and we compare commonly used measures of residential segregation computed from these data sets. We find that accuracy varies by the global privacy-loss budget and the allocation of the privacy-loss budget to geographic levels (e.g., states, counties, enumeration districts) and queries. For measures of segregation, we observe situations where the differentially private data indicate less segregation than the original data and situations where the differentially private data indicate more segregation than the original data. The sensitivity of accuracy to the overall global privacy-loss budget and its allocation highlights the fundamental importance of these policy decisions. Data producers like the U.S. Census Bureau must collaborate with users not only to determine the most useful set of parameters to receive allocations of the privacy-loss budget, but also to provide documentation and tools for users to gauge the reliability and validity of statistics from publicly released data products. If they do not, producers may create statistics that are unusable or misleading for the wide variety of use cases that rely on those statistics.
Van Riper Ma, David; Kugler, Tracy A; Ruggles, Steven J
2020.
Disclosure Avoidance in the Census Bureau’s 2010 Demonstration Data Product.
Kugler, Tracy A; Grace, Kathryn; Wrathall, David J.; de Sherbinin, Alex; Van Riper Ma, David; Aubrecht, Christoph; Comer, Douglas; Adamo, Susana B.; Cervone, Guido; Engstrom, Ryan; Hultquist, Carolynne; Gaughan, Andrea E.; Linard, Catherine; Moran, Emilio; Stevens, Forrest; Tatem, Andrew J; Tellman, Beth; Van Den Hoek, Jamon
2019.
People and Pixels 20 years later: the current data landscape and research trends blending population and environmental data.
Kugler, Tracy A; Fitch, Catherine A
2018.
Interoperable and accessible census and survey data from IPUMS.
Kugler, Tracy A; Manson, Steven M; Donato, Joshua R
2017.
Spatiotemporal aggregation for temporally extensive international microdata.
We describe a strategy for regionalizing subnational administrative units in conjunction with harmonizing changes in unit boundaries over time that can be applied to provide small-area geographic identifiers for census microdata. The availability of small-area identifiers blends the flexibility of individual microdata with the spatial specificity of aggregate data. Regionalizing microdata by administrative units poses a number of challenges, such as the need to aggregate individual-scale data in a way that ensures confidentiality and issues arising from changing spatial boundaries over time. We describe a regionalization and harmonization strategy that creates units that satisfy spatial and other constraints while maximizing the number of units in a way that supports policy and research use. We describe this regionalization strategy for three test cases: Malawi, Brazil, and the United States. We test different algorithms and develop a semi-automated strategy for regionalization that meets data restrictions, computational constraints, and data demands from end users.
Manson, Steven M; Kugler, Tracy A; Haynes, David
2016.
Deserts in the Deluge: TerraPopulus and Big Human-Environment Data.
Terra Populus, or TerraPop, is a cyberinfrastructure project that integrates, preserves, and disseminates massive data collections describing characteristics of the human population and environment over the last six decades. TerraPop has made a number of GIScience advances in the handling of big spatial data to make information interoperable between formats and across scientific communities. In this paper, we describe challenges of these data, or 'deserts in the deluge' of data, that are common to spatial big data more broadly, and explore computational solutions specific to microdata, raster, and vector data models.
Kugler, Tracy A
2015.
TerraPop: Constructing Boundary Files for Location-Based Integration of Population and Environmental Data.
TerraPop enables research, learning, and policy analysis by providing integrated spatiotemporal data describing people and their environment. The TerraPop data collection currently includes census microdata from more than 80 countries participating in IPUMS-International, aggregate census data published by an additional 80+ countries, and raster data covering land cover, land use, and climate. The TerraPop data access system (https://data.terrapop.org) enables users to combine data from datasets in different data structures into customized datasets in the structure of their choice. Integration across data structures involves transformations that hinge on boundary files linked to administrative unit codes present in census data. TerraPop has successfully developed boundary files aligned with administrative units present in census data for over 130 countries. Starting with freely available boundary data sources, including the Global Administrative Unit Layers (GAUL), Global Administrative database (GADM), and UN Second Level Administrative Boundaries (SALB), we have developed first- and second-level administrative unit boundaries matching both recent and historical census data. Boundary production has been automated to the extent possible and standardized to promote efficiency and consistency. The process entails identifying source data and information, potentially digitizing boundaries from print or image sources, aligning boundary vertices across multiple sources, and editing shapefiles to match units and codes present in census data. TerraPop also produces boundaries harmonized over time as well as units regionalized to meet minimum population thresholds to protect confidentiality when linking to microdata. The boundaries are used to perform on-the-fly transformations among area-level data, microdata, and rasters as requested through the TerraPop data access system. 
Users may request raster data summarized to geographic unit boundaries to create area-level data, microdata tabulated by geographic unit to create area-level data, area-level data (including tabulated microdata) distributed to grid cells to create raster data, or area-level data (including summarized raster data) attached to microdata records as contextual variables. The boundary files themselves are also available through the data access system.
Kugler, Tracy A; Van Riper Ma, David; Manson, Steven M; Haynes, David; Donato, Joshua R; Stinebaugh, Katie
2015.
Terra Populus: Workflows for integrating and harmonizing geospatial population and environmental data.
The Terra Populus project (TerraPop) addresses a variety of data management, curation, and preservation challenges with respect to spatiotemporal population and environmental data. In this article, we describe our approaches to these challenges, with a particular focus on geospatial data workflows and associated provenance metadata. The goal of TerraPop is to enable research, learning, and policy analysis by providing integrated spatiotemporal data describing people and their environment. To do so, TerraPop is assembling a globe-spanning and temporally extensive collection of high-quality population and environmental data, ensuring good documentation, and developing a Web-based data access system that enables users to assemble customized integrated data sets drawing on a variety of data sources and formats. We describe TerraPop's collection strategies, detail the geospatial workflows involved in preparing data for ingest into the project database and those used to transform data across formats for dissemination, and discuss the system used to capture and manage provenance metadata throughout the project. A key aspect of the project is the development of global current and historical administrative unit boundaries that can be linked to census data. These boundaries serve as the linchpin of TerraPop's data integration strategy, and constitute an important data set in their own right.
Adamo, Susana B.; Fitch, Catherine A; Kugler, Tracy A
2014.
Climate variability and demographic and socio-economic vulnerability in southern Brazil, 1980-2010: A TerraPop Case Study.
Climate variability affects human society in different ways, depending on the underlying socioeconomic and demographic vulnerability of specific places, social groups, households and individuals. This differential vulnerability varies across space and time, and is rooted in historical patterns of development and relations between human and ecological systems. This paper aims to (a) identify and map critical areas or hotspots of vulnerability to climate variability and its evolution over time (1980-2010), and (b) identify internal variation or differential vulnerability within these areas, using newly available integrated data from the Terra Populus project. These data include geo-referenced climate data, and data describing demography and socioeconomic characteristics of individuals, households and places. This study focuses on Southern Brazil (Parana, Santa Catarina and Rio Grande do Sul) and assesses the impact of climate variability on livelihoods and well-being, and their changes over time and across space, for rural and urban populations.
MacManus, Kytt; Kugler, Tracy A
2013.
The influence of statistical inputs on global gridded geospatial datasets.
The fidelity and utility of global gridded datasets are a function of the input statistics and geographic data used in their construction. The quality and resolution of census data and the robustness of microdata vary greatly from country to country, but in order to complete regional and global scale analyses of geographic data, it is necessary to integrate these inputs into a common schema and grid resolution. This paper and talk will draw on examples from the production of SEDAC's Gridded Population of the World Version 4 to illustrate the need for high-resolution census and cartographic data in order to produce accurate top-down population allocations. It will also explore challenges of tying microdata to place to make connections with other spatial data, through coordinates and/or fine-grain administrative units with accompanying boundary data. It will use examples from the Terra Populus project to display the utility of the variable richness of microdata (population characteristics beyond counts and basic demographics), particularly in connection with other spatial data.