Project 1


Project 1 proposes to develop statistical methods that address real-world complexities that commonly arise when mapping aggregated disease count data collected for administrative areas. The specific aims are motivated by problems encountered in epidemiological studies designed to study and monitor health disparities, though they are also relevant to area-based studies of environmental effects. The methods address administrative boundaries that change over time, sparse disease counts, spatial confounding, and the heavy computational burden of analyzing large data sets. The specific aims of the project are to develop, evaluate, and implement:

  1. methods for handling boundary misalignment over time in disease mapping settings;
  2. spatial regression models for area-specific disease count data exhibiting complex distributions;
  3. a theoretical framework and practical diagnostic strategies for assessing and minimizing bias from spatial confounding;
  4. fast, memory-efficient algorithms for fitting standard spatio-temporal regression models; and
  5. efficient, user-friendly algorithms and statistical software that implement these methods, with the goal of disseminating them to health science researchers.
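
Aim 1 concerns counts reported under administrative boundaries that change over time. As a point of reference only, the sketch below shows area-weighted interpolation, a simple standard way to move counts from old units onto new ones; it is not the method this project proposes. It assumes events are spread uniformly over each old unit's area, an assumption that often fails for population-based disease counts. The function name `areal_weighted_interpolation` and the example areas and counts are hypothetical.

```python
def areal_weighted_interpolation(counts, source_area, overlap_area):
    """Reallocate counts from old (source) areas to new (target) areas.

    counts:       {source_id: event count under the old boundaries}
    source_area:  {source_id: total area of the source unit}
    overlap_area: {(source_id, target_id): area of their intersection}

    Assumption: events are spread uniformly over each source unit, so a target
    receives a share of a source's count proportional to the overlapping area.
    """
    target_counts = {}
    for (source_id, target_id), area in overlap_area.items():
        share = counts[source_id] * area / source_area[source_id]
        target_counts[target_id] = target_counts.get(target_id, 0.0) + share
    return target_counts


# Usage with made-up units: old tracts A (area 10) and B (area 5); new tracts X and Y.
counts = {"A": 20, "B": 10}
source_area = {"A": 10.0, "B": 5.0}
overlap_area = {("A", "X"): 6.0, ("A", "Y"): 4.0, ("B", "Y"): 5.0}
print(areal_weighted_interpolation(counts, source_area, overlap_area))
# → {'X': 12.0, 'Y': 18.0}
```

Because each old unit's count is split in proportion to area, the total (30 events here) is preserved whenever the new units completely cover every old unit.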

The proposed methods will be applied to area-specific disease count data on U.S. breast cancer incidence, Boston-area premature mortality, and Australian ischemic heart disease rates, and to incidence and mortality data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database. The methods will allow researchers to better estimate how rates of cancer and other outcomes vary geographically and over time, thereby aiding in the documentation, analysis, and ultimate reduction of health disparities in the United States, which Healthy People 2010 defines as one of its overarching goals (US Department of Health and Human Services 2000). This project is closely integrated with the spatial surveillance work in Project 2: whereas Project 1 focuses on spatio-temporal modeling to characterize the impact of area-based measures of socioeconomic status or other demographic characteristics on cancer and other diseases, Project 2 focuses on identifying areas where disease rates are unusually high. Analysis of SEER data features prominently in both Projects 1 and 2. Projects 1 and 3 share the common theme of analyzing high-dimensional observational data on cancer. This project relies heavily on the Statistical Computing Core and will benefit from the organizational infrastructure, team-building strategies, short courses, and visitor program provided through the Administrative Core.

Project examples

  1. spglmm: Spatial Generalized Linear Mixed Models (R package)
    Lauren Hund
    This package contains functions for fitting area-level spatial and spatio-temporal models in R, using the spatial correlation structure described in papers submitted for publication as part of a dissertation. The package's regression function operates on a new class of areal data object, which can be created using functions in the package. Plotting functions are also provided for mapping the area-level relative risks and the continuous underlying spatial residuals.
    Package release in summer 2011.
  2. Developing statistical models for spatio-temporal data
    Yeonseung Chung and Francesca Dominici
    This project combines information from heterogeneous data sets to estimate spatially varying long-term effects of fine particulate matter (PM2.5) on mortality and cancer-specific mortality, and to identify the chemical components of PM2.5 mass that can increase cancer risk.
  3. Comprehensive smoking bans and acute myocardial infarction among Medicare enrollees in 387 U.S. counties: 1999 to 2008
    Christopher D Barr, David M Diez, Yun Wang, Jon Samet, and Francesca Dominici
    We are conducting a multi-county, multi-state analysis to estimate the association between the implementation of comprehensive smoking bans and cardiovascular diseases. The study includes nine states and 387 counties.
  4. The Importance of Scale for Spatial-Confounding Bias and Precision of Spatial Regression Estimators
    Christopher J Paciorek
    Residuals in regression models are often spatially correlated. Prominent examples include studies in environmental epidemiology to understand the chronic health effects of pollutants. I consider the effects of residual spatial structure on the bias and precision of regression coefficients, developing a simple framework in which to understand the key issues and derive informative analytic results...
  5. glmmGS: Computationally-efficient fitting of GLMMs via PQL (R package)
    Michele Morara, Christopher Sroka, Subharup Guha, Christopher Paciorek, and Louise Ryan
    A package for fitting GLMMs (with an emphasis on models with spatial structure) using Penalized Quasi-Likelihood (PQL). Spatial dependence can be specified through the covariance or through the precision, and can be represented in sparse matrix format. The package uses computationally-efficient Gauss-Seidel optimization to update blocks of parameters, exploiting the blocked matrix structure present in some GLMMs. In addition, the package can exploit sparse structure in both the random effects design matrix and the spatial precision matrix of the random effects, if present. The package allows for spatially-correlated random intercepts and spatially-correlated random slopes.
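
The block Gauss-Seidel idea described for glmmGS can be illustrated generically. The sketch below is not glmmGS code and does not use its interface: it solves the mixed-model equations of a Gaussian linear mixed model with unit weights (the kind of linear system a PQL iteration solves for its working response) by alternating exact solves for the fixed-effect block `beta` and the area-level random-effect block `u`. The proper CAR-style precision `Q = D - rho * W`, the helper names `car_precision` and `block_gauss_seidel`, and the toy data are assumptions for illustration, and dense NumPy arrays stand in for the sparse storage the package description mentions.

```python
import numpy as np


def car_precision(adjacency, rho=0.9):
    """Proper CAR-style precision Q = D - rho * W (hypothetical helper).

    W is a symmetric 0/1 adjacency matrix and D holds each area's number of
    neighbours on its diagonal. With |rho| < 1 and every area having at least
    one neighbour, Q is strictly diagonally dominant and hence positive definite.
    """
    W = np.asarray(adjacency, dtype=float)
    D = np.diag(W.sum(axis=1))
    return D - rho * W


def block_gauss_seidel(X, Z, y, Q, max_iter=5000, tol=1e-10):
    """Solve the mixed-model equations by block Gauss-Seidel (hypothetical helper).

        [X'X    X'Z    ] [beta]   [X'y]
        [Z'X    Z'Z + Q] [u   ] = [Z'y]

    Each sweep solves exactly for the fixed-effect block beta given u, then for
    the random-effect block u given the new beta. Q is assumed to already
    include any variance-ratio scaling.
    """
    XtX = X.T @ X
    ZtZ_Q = Z.T @ Z + Q
    beta = np.zeros(X.shape[1])
    u = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        beta_new = np.linalg.solve(XtX, X.T @ (y - Z @ u))
        u_new = np.linalg.solve(ZtZ_Q, Z.T @ (y - X @ beta_new))
        # Stop when neither block changes appreciably between sweeps.
        step = max(np.abs(beta_new - beta).max(), np.abs(u_new - u).max())
        beta, u = beta_new, u_new
        if step < tol:
            break
    return beta, u


# Usage: four areas along a line, two observations per area (toy data).
rng = np.random.default_rng(0)
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
Q = car_precision(W, rho=0.9)
area = np.repeat(np.arange(4), 2)                      # area index of each observation
Z = np.eye(4)[area]                                    # one random-effect column per area
X = np.column_stack([np.ones(8), rng.normal(size=8)])  # intercept + covariate
y = rng.normal(size=8)
beta_hat, u_hat = block_gauss_seidel(X, Z, y, Q)
```

Each sweep needs solves with X'X and Z'Z + Q separately, never with the full joint matrix, which is one reason block updates can be attractive when Q is large and sparse. In this toy example the intercept nearly duplicates the sum of the area effects, so convergence is slow, and the result can be checked against a direct solve of the full system.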



Copyright by Xihong Lin, 2011