Total Results: 36
Agrawal, Saurabh; Steinbach, Michael; Boley, Daniel; Chatterjee, Snigdhansu; Atluri, Gowtham; Dang, Anh The; Liess, Stefan; Kumar, Vipin
2020.
Mining novel multivariate relationships in time series data using correlation networks.
In many domains, there is significant interest in capturing novel relationships between time series that represent activities recorded at different nodes of a highly complex system. In this paper, we introduce multipoles, a novel class of linear relationships between more than two time series. A multipole is a set of time series that have strong linear dependence among themselves, with the requirement that each time series makes a significant contribution to the linear dependence. We demonstrate that most interesting multipoles can be identified as cliques of negative correlations in a correlation network. Such cliques are typically rare in a real-world correlation network, which allows us to find almost all multipoles efficiently using a clique-enumeration approach. Using our proposed framework, we demonstrate the utility of multipoles in discovering new physical phenomena in two scientific domains: climate science and neuroscience. In particular, we discovered several multipole relationships that are reproducible in multiple other independent datasets and lead to novel domain insights.
Park, Jun Young; Polzehl, Joerg; Chatterjee, Snigdhansu; Brechmann, André; Fiecas, Mark
2020.
Semiparametric modeling of time-varying activation and connectivity in task-based fMRI data.
In functional magnetic resonance imaging (fMRI), there is a rise in evidence that time-varying functional connectivity, or dynamic functional connectivity (dFC), which measures changes in the synchronization of brain activity, provides additional information on brain networks not captured by time-invariant (i.e., static) functional connectivity. While there have been many developments for statistical models of dFC in resting-state fMRI, there remains a gap in the literature on how to simultaneously model both dFC and time-varying activation when the study participants are undergoing experimental tasks designed to probe at a cognitive process of interest. A method is proposed to estimate dFC between two regions of interest (ROIs) in task-based fMRI where the activation effects are also allowed to vary over time. The proposed method, called TVAAC (time-varying activation and connectivity), uses penalized splines to model both time-varying activation effects and time-varying functional connectivity and uses the bootstrap for statistical inference. Simulation studies show that TVAAC can estimate both static and time-varying activation and functional connectivity, while ignoring time-varying activation effects would lead to poor estimation of dFC. An empirical illustration is provided by applying TVAAC to analyze two subjects from an event-related fMRI learning experiment.
Kodra, Evan; Bhatia, Udit; Chatterjee, Snigdhansu; Chen, Stone; Ganguly, Auroop Ratan
2020.
Physics-guided probabilistic modeling of extreme precipitation under climate change.
Earth System Models (ESMs) are the state of the art for projecting the effects of climate change. However, longstanding uncertainties in their ability to simulate regional and local precipitation extremes and related processes inhibit decision making. Existing state-of-the-art approaches for uncertainty quantification use Bayesian methods to weight ESMs based on a balance of historical skills and future consensus. Here we propose an empirical Bayesian model that extends an existing skill- and consensus-based weighting framework and examine the hypothesis that nontrivial, physics-guided measures of ESM skill can help produce reliable probabilistic characterization of climate extremes. Specifically, the model leverages knowledge of physical relationships between temperature, atmospheric moisture capacity, and extreme precipitation intensity to iteratively weight and combine ESMs and estimate probability distributions of return levels. Out-of-sample validation suggests that the proposed Bayesian method, which incorporates physics guidance, has the potential to derive reliable precipitation projections, although caveats remain and the gain is not uniform across all cases.
Bera, Sabyasachi; Chatterjee, Snigdhansu
2020.
High dimensional, robust, unsupervised record linkage.
We develop a technique for record linkage on high dimensional data, where the two datasets may not have any common variable, and there may be no training set available. Our methodology is based on sparse, high dimensional principal components. Since large and high dimensional datasets are often prone to outliers and aberrant observations, we propose a technique for estimating robust, high dimensional principal components. We present theoretical results validating the robust, high dimensional principal component estimation steps, and justifying their use for record linkage. Some numeric results and remarks are also presented.
Chatterjee, Snigdhansu
2019.
The scale enhanced wild bootstrap method for evaluating climate models using wavelets.
A novel resampling-based method is presented for testing whether a physics-based model or reanalysis object successfully emulates climate signals. We establish asymptotic consistency and present simulations and a real-data application to illustrate the performance of the proposed method.
Majumdar, Subhabrata; Chatterjee, Snigdhansu
2019.
On Weighted Multivariate Sign Functions.
Multivariate sign functions are often used for robust estimation and inference. We propose using data dependent weights in association with such functions. The proposed weighted sign functions retain desirable robustness properties, while significantly improving efficiency in estimation and inference compared to unweighted multivariate sign-based methods. Using weighted signs, we demonstrate methods of robust location estimation and robust principal component analysis. We extend the scope of using robust multivariate methods to include robust sufficient dimension reduction and functional outlier detection. Several numerical studies and real data applications demonstrate the efficacy of the proposed methodology.
Chatterjee, Snigdhansu
2019.
A resampling approach to estimation of the linking variance in the Fay–Herriot model.
In the Fay–Herriot model, we consider estimators of the linking variance obtained using different types of resampling schemes. The usefulness of this approach is that even when the estimator from the original data falls below zero or any other specified threshold, several of the resamples can potentially yield values above the threshold. We establish asymptotic consistency of the resampling-based estimator of the linking variance for a wide variety of resampling schemes and show the efficacy of using the proposed approach in numeric examples.
Huang, Whitney K.; Cooley, Daniel S.; Ebert-Uphoff, Imme; Chen, Chen; Chatterjee, Snigdhansu
2019.
New Exploratory Tools for Extremal Dependence: χ Networks and Annual Extremal Networks.
Understanding dependence structure among extreme values plays an important role in risk assessment in environmental studies. In this work, we propose the χ network and the annual extremal network for exploring the extremal dependence structure of environmental processes. A χ network is constructed by connecting pairs whose estimated upper tail dependence coefficient, χ̂, exceeds a prescribed threshold. We develop an initial χ network estimator, and we use a spatial block bootstrap to assess both the bias and variance of our estimator. We then develop a method to correct the bias of the initial estimator by incorporating the spatial structure in χ. In addition to the χ network, which assesses spatial extremal dependence over an extended period of time, we further introduce an annual extremal network to explore the year-to-year temporal variation of extremal connections. We illustrate the χ and the annual extremal networks by analyzing the hurricane season maximum precipitation at the US Gulf Coast and surrounding area. Analysis suggests there exists long distance extremal dependence for precipitation extremes in the study region and the strength of the extremal dependence may depend on some regional scale meteorological conditions, for example, sea surface temperature.
Majumdar, Subhabrata; Chatterjee, Snigdhansu
2018.
Non-convex penalized multitask regression using data depth-based penalties.
Braverman, Amy; Chatterjee, Snigdhansu; Heyman, Megan; Cressie, Noel
2017.
Probabilistic evaluation of competing climate models.
Climate models produce output over decades or longer at high spatial and temporal resolution. Starting values, boundary conditions, greenhouse gas emissions, and so forth make the climate model an uncertain representation of the climate system. A standard paradigm for assessing the quality of climate model simulations is to compare what these models produce for past and present time periods to observations of the past and present. Many of these comparisons are based on simple summary statistics called metrics. In this article, we propose an alternative: evaluation of competing climate models through probabilities derived from tests of the hypothesis that climate-model-simulated and observed time sequences share common climate-scale signals. The probabilities are based on the behavior of summary statistics of climate model output and observational data over ensembles of pseudo-realizations. These are obtained by partitioning the original time sequences into signal and noise components, and using a parametric bootstrap to create pseudo-realizations of the noise sequences. The statistics we choose come from working in the space of decorrelated and dimension-reduced wavelet coefficients. Here, we compare monthly sequences of CMIP5 model output of average global near-surface temperature anomalies to similar sequences obtained from the well-known HadCRUT4 data set as an illustration.
Arbet, Jaron; McGue, Matthew; Chatterjee, Snigdhansu; Basu, Saonli
2017.
Resampling-based tests for Lasso in genome-wide association studies.
Genome-wide association studies involve detecting association between millions of genetic variants and a trait, and typically use univariate regression to test association between each single variant and the phenotype. Alternatively, Lasso penalized regression allows one to jointly model the relationship between all genetic variants and the phenotype. However, it is unclear how to best conduct inference on the individual Lasso coefficients, especially in high-dimensional settings. We consider six methods for testing the Lasso coefficients: two permutation (Lasso-Ayers, Lasso-PL) and one analytic approach (Lasso-AL) to select the penalty parameter for type-1-error control, residual bootstrap (Lasso-RB), modified residual bootstrap (Lasso-MRB), and a permutation test (Lasso-PT). Methods are compared via simulations and application to the Minnesota Center for Twins and Family Study. We show that for finite sample sizes with increasing number of null predictors, Lasso-RB, Lasso-MRB, and Lasso-PT fail to be viable methods of inference. However, Lasso-PL and Lasso-AL remain fast and powerful tools for conducting inference with the Lasso, even in high dimensions. Our results suggest that the proposed permutation selection procedure (Lasso-PL) and the analytic selection method (Lasso-AL) are fast and powerful alternatives to the standard univariate analysis in genome-wide association studies.
Ermagun, Alireza; Chatterjee, Snigdhansu; Levinson, David Matthew
2017.
Using temporal detrending to observe the spatial correlation of traffic.
This empirical study sheds light on the spatial correlation of traffic links under different traffic regimes. We mimic the behavior of real traffic by pinpointing the spatial correlation between 140 freeway traffic links in a major sub-network of the Minneapolis–St. Paul freeway system with a grid-like network topology. This topology enables us to juxtapose the positive and negative correlation between links, which has been overlooked in short-term traffic forecasting models. To accurately and reliably measure the correlation between traffic links, we develop an algorithm that eliminates temporal trends in three dimensions: (1) hourly dimension, (2) weekly dimension, and (3) system dimension for each link. The spatial correlation of traffic links exhibits a stronger negative correlation in rush hours, when congestion affects route choice. Although this correlation occurs mostly in parallel links, it is also observed upstream, where travelers receive information and are able to switch to substitute paths. Irrespective of the time-of-day and day-of-week, a strong positive correlation is witnessed between upstream and downstream links. This correlation is stronger in uncongested regimes, as traffic flow passes through consecutive links more quickly and there is no congestion effect to shift or stall traffic. The extracted spatial correlation structure can augment the accuracy of short-term traffic forecasting models.
Monsen, Karen A.; Chatterjee, Snigdhansu; Timm, Jill E.; Kay Poulsen, J.; McNaughton, Diane B.
2015.
Factors Explaining Variability in Health Literacy Outcomes of Public Health Nursing Clients.
Lu, Y; Chatterjee, Snigdhansu
2014.
Instability and Change Detection in Exponential Families and Generalized Linear Models.
Verghese, J; Annweiler, C; Ayers, E; Barzilai, N; Beauchet, O; Bennett, David; Bridenbaugh, SA; Buchman, AS; Callisaya, ML; Camicioli, R; Capistrant, Benjamin D; Chatterjee, Soumyadeep; De Cock, AM; Ferrucci, L; Giladi, N; Guralnik, JM; Hausdorff, JM; Holtzer, R; Kim, KW; Kowal, P; Kressig, RW; Lim, JY; Lord, S; Meguro, K; Montero-Odasso, M; Muir-Hunter, SW; Noone, ML; Rochester, L; Srikanth, V; Wang, C
2014.
Motoric cognitive risk syndrome: multicountry prevalence and dementia risk.
OBJECTIVES: Our objective is to report prevalence of motoric cognitive risk syndrome (MCR), a newly described predementia syndrome characterized by slow gait and cognitive complaints, in multiple countries, and its association with dementia risk. METHODS: Pooled MCR prevalence analysis of individual data from 26,802 adults without dementia and disability aged 60 years and older from 22 cohorts from 17 countries. We also examined risk of incident cognitive impairment (Mini-Mental State Examination decline ≥4 points) and dementia associated with MCR in 4,812 individuals without dementia with baseline Mini-Mental State Examination scores ≥25 from 4 prospective cohort studies using Cox models adjusted for potential confounders. RESULTS: At baseline, 2,808 of the 26,802 participants met MCR criteria. Pooled MCR prevalence was 9.7% (95% confidence interval [CI] 8.2%-11.2%). MCR prevalence was higher with older age but there were no sex differences. MCR predicted risk of developing incident cognitive impairment in the pooled sample (adjusted hazard ratio [aHR] 2.0, 95% CI 1.7-2.4); aHRs were 1.5 to 2.7 in the individual cohorts. MCR also predicted dementia in the pooled sample (aHR 1.9, 95% CI 1.5-2.3). The results persisted even after excluding participants with possible cognitive impairment, accounting for early dementia, and diagnostic overlap with other predementia syndromes. CONCLUSION: MCR is common in older adults, and is a strong and early risk factor for cognitive decline. This clinical approach can be easily applied to identify high-risk seniors in a wide variety of settings.
Mukherjee, Ujjal Kumar; Chatterjee, Snigdhansu
2014.
Fast algorithm for computing weighted projection quantiles and data depth for high-dimensional large data clouds.
Mukherjee, Ujjal Kumar; Majumdar, Sumit R; Chatterjee, Snigdhansu
2014.
Fast and Robust Supervised Learning in High Dimensions Using the Geometry of the Data.
Li, Zhonghua; Qiu, Peihua; Chatterjee, Snigdhansu; Wang, Zhaojun
2013.
Using p values to design statistical process control charts.
Berndt, Sonja I.; Skibola, Christine; Vijai, Joseph; Camp, Nicola J.; Nieters, Alexandra; Wang, Zhaoming; Cozen, Wendy; Monnereau, Alain; Wang, Sophia; Kelly, Rachel; Lan, Qing; Teras, L R; Chatterjee, N; Chung, Charles C.; Yeager, Meredith; Brooks-Wilson, Angela; Hartge, Patricia; Purdue, Mark P.; Birmann, B M; Slager, Susan L.
2013.
Genome-wide association study identifies multiple risk loci for chronic lymphocytic leukemia.
Genome-wide association studies (GWAS) have previously identified 13 loci associated with risk of chronic lymphocytic leukemia or small lymphocytic lymphoma (CLL). To identify additional CLL susceptibility loci, we conducted the largest meta-analysis for CLL thus far, including four GWAS with a total of 3,100 individuals with CLL (cases) and 7,667 controls. In the meta-analysis, we identified ten independent associated SNPs in nine new loci at 10q23.31 (ACTA2 or FAS (ACTA2/FAS), P = 1.22 × 10^−14), 18q21.33 (BCL2, P = 7.76 × 10^−11), 11p15.5 (C11orf21, P = 2.15 × 10^−10), 4q25 (LEF1, P = 4.24 × 10^−10), 2q33.1 (CASP10 or CASP8 (CASP10/CASP8), P = 2.50 × 10^−9), 9p21.3 (CDKN2B-AS1, P = 1.27 × 10^−8), 18q21.32 (PMAIP1, P = 2.51 × 10^−8), 15q15.1 (BMF, P = 2.71 × 10^−10) and 2p22.2 (QPCT, P = 1.68 × 10^−8), as well as an independent signal at an established locus (2q13, ACOXL, P = 2.08 × 10^−18). We also found evidence for two additional promising loci below genome-wide significance at 8q22.3 (ODF1, P = 5.40 × 10^−8) and 5p15.33 (TERT, P = 1.92 × 10^−7). Although further studies are required, the proximity of several of these loci to genes involved in apoptosis suggests a plausible underlying biological mechanism.
Faghmous, James H; Le, Matthew; Uluyol, Muhammed; Kumar, Vipin; Chatterjee, Snigdhansu
2013.
A parameter-free spatio-temporal pattern mining model to catalog global ocean dynamics.