Department of Biostatistics
Environmental Statistics Seminar
2015 - 2016
ABSTRACT: A major feature of air quality regulatory policy in the US is to incentivize the installation of flue-gas desulfurization scrubbers on power plant smokestacks. One goal of these policies is to reduce emissions that are precursors to the formation of PM2.5 in the atmosphere, which is known to be associated with adverse health outcomes. However, the presumed relationships between scrubbers, emissions, and ambient PM2.5 have never been estimated or empirically verified amid the realities of actual regulatory implementation. The goal of this paper is to develop new statistical methods to quantify these causal relationships. We frame this problem as one of mediation analysis to evaluate the extent to which the causal effect of a scrubber on ambient PM2.5 is mediated through causal effects on power plant emissions. Since power plants emit various pollutants, including sulfur dioxide (SO2), nitrogen oxides (NOx), and carbon dioxide (CO2), we develop new statistical methods for settings with multiple intermediate mediating factors that are measured contemporaneously, may interact with one another, and may exhibit joint mediating effects. Specifically, we propose new methods leveraging two related frameworks for causal inference in the presence of mediating variables: principal stratification and causal mediation analysis. We define principal effects based on multiple mediators, and also introduce a new decomposition of the total effect of a scrubber on ambient PM2.5 into the natural direct effect and natural indirect effects for all mediating emissions jointly, each pair of emissions, and each emission individually. Both approaches are anchored to the exact same models for the observed data, which we specify with flexible Bayesian nonparametric techniques.
We first provide assumptions for identifiability of principal causal effects, then augment these with two additional assumptions required to conduct a genuine mediation analysis relying on natural direct and indirect effects. The principal stratification and causal mediation analyses are interpreted in tandem to provide the first comprehensive empirical investigation of the presumed causal pathways that motivate a variety of air quality control strategies that aim to reduce harmful emissions from power plants.
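For intuition, the decomposition described above can be sketched in standard potential-outcomes notation with two mediators (an illustrative simplification; the abstract's setting involves three emissions, pairwise effects, and the paper's own notation, which may differ):

```latex
% Total effect of scrubber Z on ambient PM2.5 outcome Y
\mathrm{TE}  = E\big[\, Y\big(1, M_1(1), M_2(1)\big) - Y\big(0, M_1(0), M_2(0)\big) \,\big]
% Natural direct effect: change Z while holding both mediators at untreated values
\mathrm{NDE} = E\big[\, Y\big(1, M_1(0), M_2(0)\big) - Y\big(0, M_1(0), M_2(0)\big) \,\big]
% Joint natural indirect effect through both mediators together
\mathrm{NIE} = E\big[\, Y\big(1, M_1(1), M_2(1)\big) - Y\big(1, M_1(0), M_2(0)\big) \,\big]
% which yields the additive decomposition
\mathrm{TE}  = \mathrm{NDE} + \mathrm{NIE}
```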
ABSTRACT: We propose a simplified approach to matching for causal inference that simultaneously optimizes both balance (between the treated and control groups) and matched sample size. This procedure resolves two widespread tensions in the use of this popular methodology. First, current practice is to run a matching method that maximizes one balance metric (such as a propensity score or average Mahalanobis distance), but then to check whether it succeeds with respect to a different balance metric for which it was not designed (such as differences in means or L1). Second, current matching methods either fix the sample size and maximize balance (e.g., Mahalanobis or propensity score matching), fix balance and maximize the sample size (such as coarsened exact matching), or are arbitrary compromises between the two (such as calipers with ad hoc thresholds applied to other methods). These tensions lead researchers to either try to optimize manually, by iteratively tweaking their matching method and rechecking balance, or settle for suboptimal solutions. We address these tensions by first defining and showing how to calculate the matching frontier as the set of matching solutions with maximum balance for each possible sample size. Researchers can then choose one, several, or all matching solutions from the frontier for analysis in one step without iteration. The main difficulty in this strategy is that checking all possible solutions is exponentially difficult. We solve this problem with new algorithms that finish fast, optimally, and without iteration or manual tweaking. We also offer easy-to-use software that implements these ideas, along with analyses of the effect of sex on judging and job training programs that show how the methods we introduce enable us to extract new knowledge from existing data sets.
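The frontier idea can be sketched with a small greedy computation (an illustrative simplification: it fixes the treated units, uses mean Mahalanobis distance to the nearest treated unit as the sole balance metric, and prunes the worst-matched controls one at a time; the paper's algorithms and balance metrics are more general):

```python
import numpy as np

def mahalanobis_to_nearest_treated(X_c, X_t, VI):
    """Distance from each control unit to its nearest treated unit."""
    d = np.empty(len(X_c))
    for i, x in enumerate(X_c):
        diff = X_t - x
        # squared Mahalanobis distance to every treated unit; keep the minimum
        d[i] = np.sqrt(np.einsum('ij,jk,ik->i', diff, VI, diff).min())
    return d

def matching_frontier(X_t, X_c):
    """Greedy frontier sketch: repeatedly drop the control farthest from any
    treated unit, recording (matched sample size, imbalance) at each step."""
    VI = np.linalg.inv(np.cov(np.vstack([X_t, X_c]).T))
    d = mahalanobis_to_nearest_treated(X_c, X_t, VI)
    d_sorted = np.sort(d)              # best-matched controls first
    frontier = []
    for n in range(len(d_sorted), 0, -1):
        # imbalance when only the n best-matched controls are retained
        frontier.append((n, d_sorted[:n].mean()))
    return frontier
```

Because the retained distances are a prefix of the sorted distances, each point on this frontier attains the minimum of this imbalance metric for its sample size, so no iterative rechecking of balance is needed.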
ABSTRACT: The Fifth Assessment Report of the Intergovernmental Panel on Climate Change (IPCC) states that there is medium confidence that heat waves, defined as consecutive days with extremely high temperatures, became more frequent and longer-lasting globally in the second half of the 20th century, and that this trend is very likely to continue in the 21st century. In the United States, mortality and morbidity related to heat waves have contributed to a high level of health costs (Knowlton et al. 2011). Response plans for extreme heat, however, are still far from adequate. The aims of our studies were to understand (1) the spatial and temporal variability of the health impact of heat waves in the US, (2) the factors that explain this heterogeneity, and (3) the health burden in the near future. Two examples will be given to address these research questions. First, we investigated the effect of heat waves on heat stroke hospitalizations in 1,916 US counties in 1999-2010. We found that the relative risk (RR) declined significantly over time and was highest in the northeast and lowest in the west north central region. The incidence rate on non-heat-wave days (the baseline rate) increased slightly over time, and regions with higher RR typically had lower baseline rates. We tested several effect modifiers that could potentially explain the spatial contrasts. We found a lower RR among counties with higher central AC prevalence and a lower baseline rate among counties with cooler climates or higher vegetation coverage. Second, we examined the effect of heat waves on all-cause mortality in 209 US cities in 1962-2006. We first improved the epidemiologic modeling of the association between heat waves and mortality and then projected mortality attributable to heat waves through 2050 using a rich set of climate models. We found that the southern US is the region facing increasing heat-wave-related deaths.
The results suggested that policies to reduce the health burden of heat waves should be region-specific, and that regional adaptation strategies are needed.
ABSTRACT: In the biomedical research community, the role of environmental factors in disease is still not well understood. Part of the reason is that environmental factors are so varied, ranging from socioeconomic factors to environmental pollutants. This concept has been formalized with the term "exposome," which represents a human's exposure to all of these environmental factors. This presents two challenges: 1) Which elements of the exposome are relevant for disease? 2) How can a researcher gain access to these data? In our work, we will discuss tools that we are building to enable researchers to answer these questions. In the era of "Big Data," many organizations are recognizing the value of their data and are in fact making it generally available. In theory it is possible for any researcher to access these data and do research, but data acquisition and processing can be limiting factors. Our goal is to make it easier for researchers to use exposome data by doing the aggregation ourselves and providing a common API for all of these data. We will discuss some of the varied data sources we plan to process (e.g., American Community Survey data, NOAA, and EPA) and the design decisions we will make in order to make this useful for researchers. In particular, we have found GIS technologies to be very valuable for aggregating so many disparate data sources. We will discuss the central role that tools such as PostGIS play in making this work possible.
ABSTRACT: None Given
ABSTRACT: None Given
ABSTRACT: None Given
ABSTRACT: None Given
ABSTRACT: Many methods have been developed to estimate a potentially non-linear exposure-response (ER) curve while accounting for known observed confounders. However, none of these approaches account for the possibility that estimation of the causal effects at low exposure levels might be affected by a different set of confounding variables than estimation of the causal effects at higher exposure levels. Nor do existing approaches account for the uncertainty regarding which confounders should be included in the model, especially when the number of confounders is large compared to the sample size. Furthermore, it is often the case that the sample size at extreme exposure levels is significantly smaller than at average exposure levels. Extrapolation and estimation of the ER curve at extreme exposure levels using information from normal levels can lead to significant bias in the estimation of causal effects. Such a situation arises in the study of the health effects of low levels of ambient air pollution. While a lot of information exists for areas with average air pollution, we would like to estimate the causal effect of ambient air pollution at low levels while using information from all exposure levels to gain power. Our approach borrows information across exposure levels to identify the important confounding variables at each level separately. Using this information, we estimate the whole ER curve, which will have a causal interpretation, while accounting for the uncertainty in confounder selection at each level of exposure.
Joint work by Georgia Papadogeorgou and Francesca Dominici.
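A toy version of level-specific confounder handling can be sketched as follows (a simplification, not the authors' method: the exposure stratification, correlation-based screening, and linear adjustment are all illustrative stand-ins for the paper's treatment of confounder-selection uncertainty):

```python
import numpy as np

def stratified_er_curve(exposure, outcome, confounders, n_strata=4, top_k=2):
    """Illustrative sketch: within each exposure stratum, screen for the
    confounders most correlated with the outcome, adjust for them with a
    stratum-specific linear model, and report the adjusted mean outcome
    evaluated at population-average confounder values."""
    edges = np.quantile(exposure, np.linspace(0, 1, n_strata + 1))
    curve = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (exposure >= lo) & (exposure <= hi)
        y, C = outcome[mask], confounders[mask]
        # stratum-specific confounder screening (stand-in for full
        # model-uncertainty machinery): keep the top_k most correlated
        scores = np.abs([np.corrcoef(C[:, j], y)[0, 1]
                         for j in range(C.shape[1])])
        keep = np.argsort(scores)[-top_k:]
        # adjust via linear regression on the selected confounders
        Z = np.column_stack([np.ones(mask.sum()), C[:, keep]])
        beta, *_ = np.linalg.lstsq(Z, y, rcond=None)
        # evaluate at the overall mean of the selected confounders
        z_bar = np.concatenate([[1.0], confounders[:, keep].mean(axis=0)])
        curve.append(((lo + hi) / 2, float(z_bar @ beta)))
    return curve
```

The point of the sketch is only that the selected confounder set (`keep`) is allowed to differ from stratum to stratum, mirroring the idea that low-exposure effects may require different adjustment than high-exposure effects.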
ABSTRACT: None Given
Last Update: January 29, 2016