Project 1 proposes the development of statistical methods that deal with real-world complexities that commonly arise when mapping disease count data aggregated over administrative areas. The specific aims are motivated by problems encountered in epidemiological studies designed to study and monitor health disparities, though they are also relevant to area-based studies of environmental effects. The methods address administrative boundaries that change over time, sparse disease counts, spatial confounding, and the heavy computational burden of analyzing large data sets. Specific aims of the project are to develop, evaluate, and implement
- methods for handling boundary misalignment over time in disease mapping settings, (first project below to be filled in and aligned with this aim)
- spatial regression models for area-specific disease count data exhibiting complex distribution patterns,
- a theoretical framework and practical diagnostic strategies for assessing and minimizing bias from spatial confounding,
- fast, memory-efficient algorithms for fitting standard spatio-temporal regression models, and
- efficient user-friendly algorithms and statistical software that implement these methods with the goal of disseminating them to health science researchers.
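The first aim concerns boundary misalignment over time. The project's own methods are not described here, so as a point of reference only, the sketch below shows a common baseline for the problem: areal-weighted reallocation, which moves counts from old areas to new areas in proportion to the overlap area and therefore assumes cases are spread uniformly within each old area. The function name, the overlap matrix, and the example counts are hypothetical.

```python
import numpy as np

def areal_weighted_reallocation(counts_old, overlap_area):
    """Reallocate counts from old areas to new areas by overlap area (baseline only).

    overlap_area[i, j] is the area of the intersection of old area i with new area j.
    Assumes the new areas together cover each old area completely and that cases
    are uniformly distributed within each old area (a strong assumption).
    """
    counts_old = np.asarray(counts_old, dtype=float)
    overlap_area = np.asarray(overlap_area, dtype=float)
    area_old = overlap_area.sum(axis=1)          # total area of each old unit
    weights = overlap_area / area_old[:, None]   # share of old unit i falling in new unit j
    return counts_old @ weights                  # reallocated counts per new unit

# Hypothetical example: 3 old tracts redrawn into 2 new tracts.
counts_old = [10, 20, 30]
overlap = [[1.0, 0.0],   # old tract 0 lies entirely in new tract 0
           [0.5, 0.5],   # old tract 1 is split evenly
           [0.0, 2.0]]   # old tract 2 lies entirely in new tract 1
counts_new = areal_weighted_reallocation(counts_old, overlap)
print(counts_new)  # [20. 40.]
```

The total count is preserved by construction, but the uniform-density assumption is exactly the kind of simplification that model-based approaches to misalignment try to relax.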
The methods proposed will be applied to area-specific disease count data on U.S. breast cancer incidence, Boston-area premature mortality, Australian ischemic heart disease rates, and incidence and mortality data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database. The methods will allow researchers to better estimate how rates of cancer and other outcomes vary geographically and over time, thereby aiding in the documentation, analysis, and ultimate reduction of health disparities in the United States, which Healthy People 2010 (US Department of Health and Human Services 2000) defines as one of its overarching goals. This project integrates closely with the spatial surveillance work in Project 2: whereas Project 1 focuses on spatio-temporal modeling to characterize the impact of area-based measures of socioeconomic status and other demographic characteristics on cancer and other diseases, Project 2 focuses on identifying areas where disease rates are unusually high. Analysis of SEER data features prominently in both Projects 1 and 2. Projects 1 and 3 share the common theme of analyzing high-dimensional observational data on cancer. This project relies heavily on the Statistical Computing Core and will benefit from the organizational infrastructure, team-building strategies, short courses, and visitor program provided through the Administrative Core.
- Project title related to aim 1 above (fill in) Authors Project description.
- Developing statistical models for spatio-temporal data Yeonseung Chung and Francesca Dominici This project combines information from heterogeneous data sets to estimate spatially varying long-term effects of fine particulate matter (PM2.5) on mortality and cancer-specific mortality, and to identify the chemical components of PM2.5 mass that can increase cancer risk.
- Comprehensive smoking bans and acute myocardial infarction among Medicare patients Christopher D Barr, David M Diez, Yun Wang, Francesca Dominici, and Jon Samet We are conducting a multi-county, multi-state analysis to estimate the association between the implementation of comprehensive smoking bans and cardiovascular disease. The study includes nine states and 387 counties.
- The Importance of Scale for Spatial-Confounding Bias and Precision of Spatial Regression Estimators Christopher J Paciorek Residuals in regression models are often spatially correlated. Prominent examples include studies in environmental epidemiology to understand the chronic health effects of pollutants. I consider the effects of residual spatial structure on the bias and precision of regression coefficients, developing a simple framework in which to understand the key issues and derive informative analytic results...
- glmmGS: Computationally-efficient fitting of GLMMs via PQL (Is this correctly in Project 1?) Michele Morara, Christopher Sroka, Subharup Guha, Christopher Paciorek, and Louise Ryan A package for fitting GLMMs (with an emphasis on models with spatial structure) using Penalized Quasi-Likelihood (PQL). Spatial dependence can be specified through the covariance or through the precision, and can be represented in sparse matrix format. The package uses computationally-efficient Gauss-Seidel optimization to update blocks of parameters, exploiting the blocked matrix structure present in some GLMMs. In addition, the package can exploit sparse structure in both the random effects design matrix and the spatial precision matrix of the random effects, if present. The package allows for spatially-correlated random intercepts and spatially-correlated random slopes.
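To make the spatial-confounding issue in the Paciorek abstract concrete, here is a toy simulation under our own assumptions; it is not the framework or the analytic results from that paper. A covariate `x` has a large-scale spatial component `z` plus independent fine-scale noise, and `z` also affects the outcome but is omitted from the naive regression. Adjusting for large-scale structure with a low-frequency Fourier basis (the basis and `K = 3` are illustrative choices) recovers the true slope here because `x` still varies at a finer scale than the confounder.

```python
import numpy as np

def ols(X, y):
    """Ordinary least-squares coefficients."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(1)
n = 1000
t = np.linspace(0.0, 1.0, n)                          # hypothetical 1-D locations
z = np.sin(2 * np.pi * t) + np.cos(4 * np.pi * t)     # large-scale unmeasured confounder
u = rng.normal(size=n)                                # fine-scale variation, independent of z
x = z + u                                             # observed covariate
beta_true = 1.0
y = beta_true * x + 1.0 * z + rng.normal(scale=0.5, size=n)

ones = np.ones(n)
beta_naive = ols(np.column_stack([ones, x]), y)[1]    # omits z: biased

# Adjust for large-scale spatial structure with sin/cos terms up to frequency K.
K = 3
basis = [np.sin(2 * np.pi * k * t) for k in range(1, K + 1)] + \
        [np.cos(2 * np.pi * k * t) for k in range(1, K + 1)]
beta_adj = ols(np.column_stack([ones, x] + basis), y)[1]

print(f"naive: {beta_naive:.3f}  adjusted: {beta_adj:.3f}  truth: {beta_true}")
```

The naive slope lands near 1.5 because roughly half of the variance of `x` is shared with `z`. If `x` varied only at the same large scale as `z`, the spatial adjustment would absorb almost all of its information and the adjusted estimate would become very imprecise, which is one way to see why the scales of covariate and confounder matter.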
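The glmmGS entry describes PQL fitting with block Gauss-Seidel updates and a random-effects precision matrix. The following Python sketch illustrates that general idea for a Poisson GLMM with log link and spatial random intercepts `b ~ N(0, Q^{-1})`; it is not the package's interface or code. Assumptions: `Q` is a proper CAR-style precision with fixed, known `tau` and `rho` (PQL as usually applied also updates variance components), the map is a hypothetical chain of areas, and dense NumPy solves stand in for the sparse solvers a real implementation would use.

```python
import numpy as np

def chain_adjacency(n):
    """Adjacency matrix for n areas arranged in a line (hypothetical toy map)."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = 1.0
    W[idx + 1, idx] = 1.0
    return W

def pql_block_gauss_seidel(y, X, offset, Q, max_iter=500, tol=1e-6):
    """PQL for y ~ Poisson(exp(offset + X beta + b)), b ~ N(0, Q^{-1}), Q known.

    Each iteration forms the IRLS working response and weights, then does one
    block Gauss-Seidel sweep: solve for beta given b, then for b given beta.
    """
    n, p = X.shape
    beta, b = np.zeros(p), np.zeros(n)
    for it in range(1, max_iter + 1):
        eta = offset + X @ beta + b
        mu = np.exp(eta)
        z = eta - offset + (y - mu) / mu      # working response, offset removed
        w = mu                                # IRLS weights for Poisson with log link
        # Block 1: fixed effects given the current random effects.
        XtW = X.T * w
        beta_new = np.linalg.solve(XtW @ X, XtW @ (z - b))
        # Block 2: random effects given the updated fixed effects.
        b_new = np.linalg.solve(np.diag(w) + Q, w * (z - X @ beta_new))
        change = max(np.max(np.abs(beta_new - beta)), np.max(np.abs(b_new - b)))
        beta, b = beta_new, b_new
        if change < tol:
            return beta, b, it, True
    return beta, b, max_iter, False

# Simulated example with hypothetical parameter values.
rng = np.random.default_rng(0)
n = 200
W_adj = chain_adjacency(n)
D = np.diag(W_adj.sum(axis=1))
tau, rho = 10.0, 0.5
Q = tau * (D - rho * W_adj)                 # proper CAR precision (diagonally dominant)
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
b_true = 0.3 * np.sin(np.linspace(0.0, 4 * np.pi, n))
E = np.full(n, 50.0)                        # expected counts per area
y = rng.poisson(E * np.exp(-0.2 + 0.3 * x + b_true)).astype(float)

beta_hat, b_hat, n_iter, converged = pql_block_gauss_seidel(y, X, np.log(E), Q)
print(beta_hat, n_iter, converged)
```

At a fixed point the updates satisfy the penalized score equations `X'(y - mu) = 0` and `(y - mu) - Q b = 0`, so the block sweep reaches the same solution as a joint solve. Splitting the system into blocks is what lets each solve exploit sparse structure in the design or in `Q` when it exists.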