Department of Biostatistics
Environmental Statistics Seminar
2012 - 2013
ABSTRACT: Eighty individuals from the Agricultural Health Study were selected for a pilot study of methylation based on high and low levels of exposure to organophosphates (OPs). The goal of the study is to detect associations between exposure to OPs and methylation levels in blood. Methylation was measured on the Illumina Infinium 450K BeadChip. The data were corrected for batch effects and normalized. To increase power, variables potentially influencing methylation aside from the exposure of interest were partialled out, and then adjacent methylation sites were clustered based on high correlation. All detected clusters were tested for association with exposure using generalized estimating equations (GEEs). After FDR correction, about 700 clusters were significantly associated with exposure. To study the possibility that only a few of the 17 OPs in the study affect methylation, we selected the top 10 clusters of methylation sites based on effect size and p-value. We implemented a penalized shared-effect model that assumes that every OP has the same effect on each of the 10 clusters of methylation sites. The analysis suggests that two OPs, Acephate and Malathion, drive the methylation change in these sites.
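The FDR correction step described above is commonly done with the Benjamini-Hochberg step-up procedure. A minimal sketch in Python (the p-values below are illustrative, not from the study):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected at FDR level alpha
    via the Benjamini-Hochberg step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m   # i/m * alpha for the i-th smallest p
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])           # largest i whose p-value meets its threshold
        reject[order[: k + 1]] = True              # reject all hypotheses up to that rank
    return reject

pvals = [0.001, 0.008, 0.039, 0.041, 0.20, 0.74]
print(benjamini_hochberg(pvals).tolist())          # first two survive at alpha = 0.05
```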
ABSTRACT: Estimating causal effects with observational data frequently relies on the notion of the propensity score (PS) to adjust comparisons between exposed and unexposed units for observed confounding factors. One central feature of such methods is specification of which confounding factors to include in the PS to satisfy the ignorability assumption necessary for estimating average causal effects. In practice, researchers are frequently confronted with decisions regarding which elements of a possibly high-dimensional covariate vector to include in the PS model for adjustment, and often employ simple or ad hoc methods for selecting variables without acknowledging the uncertainty associated with the selection. This talk proposes three Bayesian methods for variable selection and model averaging for PS methods that 1) select relevant variables from a set of candidate variables to include in the PS and 2) estimate causal treatment effects as weighted averages of estimates under different PS models. The associated weight for each PS model reflects the data-driven support for that model's ability to adjust for the necessary variables.
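The model-averaging step can be sketched numerically: per-model effect estimates are combined with weights proportional to each model's data-driven support. All numbers below are hypothetical stand-ins, not the method's actual weighting scheme:

```python
import numpy as np

# Hypothetical effect estimates under three candidate PS models, and
# hypothetical log marginal likelihoods expressing each model's support.
estimates = np.array([1.8, 2.1, 2.4])
log_support = np.array([-100.2, -99.5, -101.0])

# Normalize the weights on the log scale for numerical stability.
w = np.exp(log_support - log_support.max())
weights = w / w.sum()

# Model-averaged causal effect estimate: a support-weighted mean.
bma_estimate = float(np.dot(weights, estimates))
print(round(bma_estimate, 3))
```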
ABSTRACT: The Centers for Medicare and Medicaid Services currently uses 30-day readmission as a proxy outcome for quality of care for a number of health conditions. The study of 30-day readmission among individuals diagnosed with life threatening illness such as pancreatic cancer is limited, however, by extremely poor survival; 30-day mortality is approximately 30%. The study of predictors of readmission among individuals with pancreatic cancer, therefore, must acknowledge death as a truncation mechanism, a problem known as the 'semi-competing risks' problem. In the analysis of semi-competing risks data, the key analytical challenge is the non-identifiability of the marginal distribution of the non-terminal event (i.e. readmission). In this paper, we propose a Bayesian semiparametric regression model for semi-competing risks data. Specifically, an illness-death model is adopted to represent three transitions for pancreatic cancer patients, recently discharged from initial hospitalization: (1) discharge to readmission, (2) discharge to death, and (3) readmission to death. Dependence between the two event times (readmission and death) is induced via a subject-specific shared frailty. For each of the three hazard functions, the log-baseline hazard function is modeled as a mixture of piecewise constant functions, defined on separate time partitions. Model parameters including covariate effects, the baseline transition hazards, frailty terms, and their variance are jointly formulated and estimated via a Metropolis-Hastings-Green algorithm. The proposed framework is applied to data from Medicare Part A on n=965 individuals diagnosed with pancreatic cancer between 2005 and 2008.
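The piecewise-constant hazard building block used for each transition can be illustrated with a small numerical sketch: on a fixed time partition, the cumulative hazard is a weighted sum of the time spent in each interval. The cut points and hazard levels below are made up for illustration:

```python
import numpy as np

# Illustrative time partition (days since discharge) and per-interval hazards.
cuts = np.array([0.0, 10.0, 30.0, 60.0])    # interval endpoints
levels = np.array([0.020, 0.010, 0.005])    # constant hazard within each interval

def cumulative_hazard(t, cuts, levels):
    """H(t) for a piecewise-constant hazard: sum over intervals of
    (hazard level) * (time spent in that interval before t)."""
    spent = np.clip(t - cuts[:-1], 0.0, np.diff(cuts))
    return float(np.dot(levels, spent))

t = 25.0
H = cumulative_hazard(t, cuts, levels)       # 10*0.020 + 15*0.010 = 0.35
S = float(np.exp(-H))                        # survival probability at t
print(round(H, 3), round(S, 3))
```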
ABSTRACT: Climate change has been labeled the single greatest health threat of this century, and yet quantitative assessment of present and future climate-change-related health outcomes has been limited. This lecture will present several sticking points, including the influence of catastrophic events, long time horizons, and multiple causation, that require innovative approaches to quantifying the present and future health risks associated with greenhouse gas emissions.
ABSTRACT: Recent studies have identified associations between ambient air pollution and impaired cognitive function, but cognitive tests such as the Mini Mental State Exam (MMSE) are often difficult to model because of non-standard distributions, ceiling effects and censored data. We investigated the association between cognitive function as assessed by the MMSE and long-term exposure to air pollution as estimated by residential proximity to a major roadway, and by black carbon and fine particulate exposure at the home address, among participants in the Framingham Offspring Study. This presentation will summarize our work in progress and focus on methodological challenges and considerations in model development.
ABSTRACT: Studying the health effects of mixtures of environmental stressors is an important problem with applications ranging from estimating how simultaneous exposure to multiple air pollutants impacts mortality, to investigating how metal mixtures jointly affect cognitive function. Despite the widespread occurrence of mixtures, most epidemiological studies tend to either focus on single agents at a time or to consider simple interaction models while recognizing that such models likely oversimplify the causal pathway, because we currently lack statistical methodology to more realistically capture the complexity of true exposure. In this talk we introduce Bayesian kernel machine regression (BKMR) with variable selection as a new approach for studying the health effects of mixtures, in which the health outcome is regressed on a nonparametric function h of a high-dimensional vector of exposure variables (e.g., elemental components of the air pollution mixture) that is specified using a kernel function. This approach simultaneously estimates the health effects of exposure to the mixture in a way that accounts for potentially complex nonlinear and interactive effects and identifies which components (e.g., elements) are most harmful. We evaluate performance of BKMR through simulation studies, and we apply the approach to investigate the relationship between elemental pollution components and blood pressure in a panel study of 70 subjects followed over five visits.
ABSTRACT: Exposure measurement error is a limitation of epidemiologic studies of fine particles (PM2.5), which generally assess exposures using ambient PM2.5 concentrations. Collection of data on personal exposures for extended periods is very expensive and intrusive and thus not feasible. Researchers, therefore, have relied on other methods to assess exposures, including using measurements at centrally located monitors or predictions outside the homes of the study subjects as surrogate exposures. We collected data from 8 validation studies, conducted in 9 US cities, on personal PM2.5 exposures and then both linked each subject to the nearest ambient EPA monitor and also predicted PM2.5 concentrations outside their residences using spatio-temporal modeling. Our findings indicate that the resulting error can be substantial, varying across surrogate exposure metrics and populations with different characteristics. There is more error associated with concentrations measured at central monitors than predictions at the participants' residences, which account for spatial variability in concentrations. Also, bias tends to be larger for younger subjects, potentially due to more variable activity patterns. Moreover, the bias is not constant across locations, depending on factors associated with housing and transportation characteristics, when central ambient monitor concentrations are used as the surrogate. These findings should be taken into consideration in the interpretation of results from studies using surrogate exposure metrics and could also help researchers understand differences in results across publications.
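The monitor-linkage step can be sketched as a nearest-neighbor assignment. The coordinates below are hypothetical planar positions (e.g., km on a local grid); real linkage would use geodesic distance on latitude/longitude:

```python
import numpy as np

# Hypothetical coordinates: three study subjects and two ambient EPA monitors.
subjects = np.array([[0.0, 0.0], [5.0, 2.0], [9.0, 9.0]])
monitors = np.array([[1.0, 1.0], [8.0, 8.0]])

# Distance from every subject to every monitor, then pick the closest.
d = np.linalg.norm(subjects[:, None, :] - monitors[None, :, :], axis=-1)
nearest = d.argmin(axis=1)     # index of the linked monitor for each subject
print(nearest.tolist())        # -> [0, 0, 1]
```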
ABSTRACT: Many stochastic point patterns can be effectively modeled using inhomogeneous Poisson processes, which permit changing spatial intensity and do not favor inhibition or clustering of points. In cases ranging from physical to social sciences, the Voronoi cells corresponding to such points often have practical interpretations, and modeling the distribution of cell areas is important. Here, we use asymptotic analytical arguments to show inhomogeneous Poisson Voronoi cell areas approximately follow a continuous mixture of gamma distributions, where the mixture is determined solely by the process' intensity function. The investigation continues with a simulation study and a comparison to work based on modulated Poisson processes, commonly applied in the design of wireless networks.
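A standard way to simulate the inhomogeneous Poisson processes studied here is thinning: draw a homogeneous process at the maximum intensity, then keep each point with probability intensity/maximum. The intensity function below is illustrative; cell areas could then be obtained by computing the Voronoi tessellation of the retained points:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative intensity on the unit square, rising from 100 to 200 with x.
def intensity(x, y):
    return 100.0 * (1.0 + x)

lam_max = 200.0                          # upper bound on the intensity
n = rng.poisson(lam_max)                 # homogeneous count at rate lam_max on [0,1]^2
pts = rng.uniform(size=(n, 2))
keep = rng.uniform(size=n) < intensity(pts[:, 0], pts[:, 1]) / lam_max
pts = pts[keep]                          # thinned points follow the target process

# Expected retained count = integral of the intensity over the square = 150.
print(len(pts))
```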
ABSTRACT: Human exposures to potentially harmful substances are often estimated through questionnaires and measurements or predictions of environmental levels. These indirect estimates of exposure are vulnerable to recall errors and exposure misclassification. Where available, exposure biomarkers may potentially improve exposure assessment by providing estimates of individual internal dose rather than external exposure to the toxicant under study. However, the use of exposure biomarkers may not always be appropriate. Specifically, when metabolic characteristics impact both biomarker levels (e.g., through altered pharmacodynamics) and risk of the disease outcome, use of exposure biomarkers may lead to biased estimates of health effects. In this talk I will: 1) discuss potential problems with using exposure biomarkers in environmental epidemiologic studies, 2) illustrate the potential problems using data from a study of the association between perfluorinated compounds (PFCs) and renal function among highly exposed children, and 3) explore analytic approaches that may provide additional insight into the causal questions.
ABSTRACT: If atmospheric, agricultural, and other environmental systems share one underlying theme it is complex spatial structures, being influenced by such features as topography and weather. Ideally we might model these effects directly; however, information on the underlying causes is often not routinely available. Hence, when modeling environmental systems there exists a need for a class of spatial models which does not rely on the assumption of stationarity.
In this talk, we propose a novel approach to modeling nonstationary spatial fields. The proposed method works by expanding the geographic plane over which these processes evolve into higher dimensional spaces, transforming and clarifying complex patterns in the physical plane. By combining aspects of multi-dimensional scaling, group lasso, and latent variable models, a dimensionally sparse projection is found in which the originally nonstationary field exhibits stationarity. Following a comparison with existing methods in a simulated environment, dimension expansion is studied on a classic test-bed data set historically used to study nonstationary models. Following this, we explore the use of dimension expansion in modeling air pollution in the United Kingdom, a process known to be strongly influenced by rural/urban effects, amongst others, which gives rise to a nonstationary field.
ABSTRACT: Studying the health effects of mixtures of environmental stressors is an important problem with applications ranging from estimating how simultaneous exposure to multiple air pollutants impacts mortality, to investigating how metal mixtures jointly affect cognitive function. Despite the widespread occurrence of mixtures, most epidemiological studies tend to either focus on single agents at a time or to consider simple interaction models while recognizing that such models likely oversimplify the causal pathway, because we currently lack statistical methodology to more realistically capture the complexity of true exposure. In this talk we introduce Bayesian kernel machine regression (BKMR) with variable selection as a new approach for studying the health effects of mixtures, in which the health outcome is regressed on a nonparametric function h of a high-dimensional vector of exposure variables (e.g., elemental components of the air pollution mixture) that is specified using a kernel function. This approach simultaneously estimates the health effects of exposure to the mixture in a way that accounts for potentially complex nonlinear and interactive effects and identifies which components (e.g., elements) are most harmful. We evaluate performance of BKMR through simulation studies, and we present preliminary results from applying the methodology to investigate the relationship between exposure to metal mixtures and cognitive development in the Superfund Bangladesh cohort.
Last Update: April 25, 2013