Department of Biostatistics
Environmental Statistics Seminar

2007 - 2008

Coordinator: Eric Tchetgen

Schedule: Fridays, 12:30-1:30 p.m.; alternating with Public Health Surveillance WG
HSPH2, Room 426 (unless otherwise notified)

Seminar Description
This seminar focuses on statistical issues related to assessing environmental effects on human health and analyzing environmental data in general. Specific areas of interest include air pollution epidemiology, exposure assessment, teratology, fertility and reproduction, respiratory studies, and community-based research as well as general topics such as errors-in-variables models, missing data methods, hierarchical modeling, smoothing, and methods for correlated data such as longitudinal and spatial data analysis. The seminars are generally pitched at a level that encourages student participation. Students interested in receiving credit for attending the seminars may sign up with individual faculty members for some guided readings on a special topic. Please see Chris Paciorek for details.


September 28

James Robins, M.D.
Mitchell L. and Robin LaFoley Dong Professor of Epidemiology, Departments of Epidemiology and Biostatistics, Harvard School of Public Health

"Current Thoughts on Semiparametric Regression in Air Pollution Research"
ABSTRACT: What do we know about the statistical properties of semiparametric regression? Can current methods be improved? These and other issues will be discussed.
October 5

S. V. Subramanian, Ph.D.
Associate Professor, Department of Society, Human Development, and Health, Harvard School of Public Health

"Causal Ecologic Effects on Health: A Methodologic Assessment"
ABSTRACT: The talk will consider the prospects and pitfalls in identifying ecologic or contextual effects with observational data, and offer a methodological assessment of the research concerned with estimating ecological effects on health outcomes. It will begin by defining the causal effect for ecologic variables, distinguishing between common ecological effects and specific ecological effects. This will be followed by a discussion of the role of multilevel modeling in identifying such effects and a clarification of the assumptions for these analyses. Following from this, the key threats to causal inference for ecologic effects will be presented, along with a discussion of approaches aimed at strengthening the identification of neighborhood effects on health.
October 19

Env Stat Large Group Meeting
12:30 - 2:00 pm

Eric Tchetgen, Ph.D. and Brent Coull, Ph.D.

Research Fellow and Associate Professor, Department of Biostatistics, Harvard School of Public Health

"Modeling Longitudinal Zero-Altered Count Data"
ABSTRACT: Zero-inflation is a common issue that arises when modeling count data with the use of a Poisson or negative binomial process in longitudinal environmental statistics studies, particularly if the outcome is rare in the population of interest. A common approach to address this problem has been to use so-called zero-inflated Poisson or negative binomial mixed models, which generally consist of specifying a two-stage mixed model (one for the probability of a zero and another for the positive counts). In this talk, I will discuss some undesirable properties of previously proposed zero-inflated mixed models and will describe an alternative new approach known as a marginally specified zero-altered conditional model. If time permits, we will discuss generalizations of this model to account for time-varying confounders which are also intermediate variables on the causal path between past time-varying environmental exposure and future outcome.
November 16

Nan Laird, Ph.D.
Professor, Department of Biostatistics, Harvard School of Public Health
and
Thomas Hoffman, AM
Doctoral Student, Department of Biostatistics, Graduate School of Arts and Sciences

"Gene-Environment Interaction Tests for Dichotomous Traits in Sibships"
ABSTRACT: As we progress further into understanding the role of genetics in disease, we see the importance of looking at the relationship of genetics and environmental factors. Failing to account for a gene-environment interaction can mask the true effects of a genetic marker. However, failure of the interaction model can also lead to spurious results. Instead, a test is needed that is robust to misspecification, but still sensitive to an interaction effect. We extend the FBAT-I test [Lake and Laird (2004)] to sibpairs and look at its applicability to other ascertainment methods. We compare this method to a conditional logistic regression approach of [Witte et al. (1999)] and relate it to relative risk based methods of [Cordell et al. (2004)] and [Weinberg (2000)]. Lastly, we compare these interaction tests to joint tests of gene and gene-environment interaction, and main effects tests of the gene.
November 30

Env Stat Large Group Meeting
12:30 - 2:00 pm

Kim Pearson, Ph.D. and Louise Ryan, Ph.D.

Research Fellow and Professor, Department of Biostatistics, Harvard School of Public Health

"Statistical Challenges in Modeling Outcomes of in vitro Fertilization"
ABSTRACT: The main objective of the Environmental Challenges to Pregnancy study (Dr. Russ Hauser, P.I.) is to explore the developmental toxicity of PCBs and chlorinated pesticides in women undergoing in vitro fertilization (IVF), which can be used as a model for the assessment of the early stages of embryonic development. Analysis of IVF data is challenging because the process consists of a sequence of steps, each with its own outcome of possible interest, and because the majority of women undergoing IVF have the procedure more than once. In particular, the women in the risk set for a particular outcome at the second cycle may have experienced a wide variety of outcomes in the first cycle.
December 4 (Special Date)

Subharup Guha, Ph.D.
Assistant Professor, Department of Statistics, University of Missouri-Columbia

"Gauss-Seidel Estimation of Generalized Linear Mixed Models with Application to Poisson Modeling of Spatially Varying Disease Rates"
ABSTRACT: Generalized linear mixed models (GLMMs) provide an elegant framework for the analysis of correlated data. Due to the non-closed form of the likelihood, GLMMs are often fit by computational procedures like penalized quasi-likelihood (PQL). Special cases of these models are generalized linear models (GLMs), which are often fit using algorithms like iterative weighted least squares (IWLS). High computational costs and memory space constraints often make it difficult to apply these iterative procedures to data sets with a very large number of cases.

We propose a computationally efficient strategy based on the Gauss-Seidel algorithm that iteratively fits sub-models of the GLMM to subsetted versions of the data. Additional gains in efficiency are achieved for Poisson models, commonly used in disease mapping problems, because of their special collapsibility property which allows data reduction through summaries. The strategy is applied to investigate the relationship between ischemic heart disease, socioeconomic status and age/gender category in New South Wales, Australia, based on outcome data consisting of approximately 33 million records. This work is joint with Professor Louise Ryan and Dr. Michele Morara.

December 7

Tom Webster, Ph.D.
Associate Professor, Department of Environmental Health, Boston University School of Public Health

"Individual-Level Studies with Ecologic Exposure Measures"
ABSTRACT: Epidemiologists who study environmental and occupational exposures often use ecologic measures of exposure. Are such partially ecologic studies subject to ecologic bias and, if so, to what degree? Studies employing ecologic exposure variables can often be viewed as individual-level studies with exposure measurement error, but this does not prevent at least some types of bias seen in purely ecologic studies. One explanation for this apparent paradox is that the exposure measurement error typically reduces exposure variance; in crude studies, this causes "bias magnification" similar to that occurring in fully ecologic studies. More generally, problems arise from loss of information regarding the joint distributions of outcome, exposure and covariates. Nevertheless, partially ecologic studies will often perform better than purely ecologic studies. Recognition of these properties can help in both the design of studies and sensitivity analysis of results.
December 14

Env Stat Large Group Meeting
12:30 - 2:00 pm

Elaine Hoffman, Ph.D. and Louise Ryan, Ph.D.

Research Fellow and Professor, Department of Biostatistics, Harvard School of Public Health

"Regression Models for Data with Detection Limits"
ABSTRACT: Data with detection limits are becoming increasingly common in epidemiologic and environmental studies. We present several methods for handling data with detection limits and propose a strategy that works well regardless of the underlying data structure. A simulation study demonstrates the strengths and weaknesses of each of the proposed methods.
February 22

Lingsong Zhang, Ph.D.
Research Fellow, Department of Biostatistics, Harvard School of Public Health

"Sparse Distance Weighted Discrimination"
ABSTRACT: In the High Dimension Low Sample Size situation, Marron et al. (2002) proposed a new classification method, Distance Weighted Discrimination (DWD), which is similar to the Support Vector Machine (SVM) when the number of objects is larger than the number of features, but performs better than SVM in the high dimension low sample size situation. However, in the high dimensional case, the noise in the data may still dominate in finding the separating hyperplane. In this paper, we propose a Sparse DWD (SDWD) method, which incorporates variable selection along with the classification. Theoretical properties are explored under some special conditions. Applications to proteomics and genetic pathways are used to illustrate the SDWD method.
February 29

Brent Coull, Ph.D.
Associate Professor, Department of Biostatistics, Harvard School of Public Health

"Functional Intercept Models for Flexible Assessment of Susceptibility"
ABSTRACT: In many biomedical investigations, a primary goal is the identification of subjects that are susceptible to a given exposure or treatment of interest. A common question that often arises is whether this susceptibility relates to the overall health of an individual. We focus on methods for addressing this question in longitudinal studies when overall health is reflected by a subject's baseline or mean outcome level. In this context, the scientific goal can be posed as one that relates to the association between subject-specific intercepts and slopes in a subject-specific regression model for the outcome. Statistical methods that flexibly assess this relationship are underdeveloped. For instance, standard mixed models containing random intercepts and slopes address this objective by assuming that baseline status and susceptibility are randomly distributed in the population, and that a subject's susceptibility is linearly related to that subject's baseline status. We propose functional random intercept models that relax this assumption and provide flexible yet interpretable assessments of this relationship. This approach relaxes the assumption of linearity between random intercepts and random slopes, and estimates the functional form of this relationship from the data. We propose a penalized spline formulation for this nonparametric function, and implement a fully Bayesian approach for model fitting. We investigate the frequentist performance of our methods via simulation, and apply the model to data from two studies of the relationship between pre-existing health status and susceptibility to ambient particulate matter exposure.
March 14

E. Andres Houseman, Sc.D.
Assistant Professor, Department of Work Environment, University of Massachusetts, Lowell
Adjunct Assistant Professor, Department of Biostatistics, Harvard School of Public Health

"Clustering Methylation Array Data: A Model-Based Recursive-Partitioning Algorithm"
ABSTRACT: Epigenetics is the study of heritable changes in gene function that cannot be explained by changes in DNA sequence. One of the most commonly studied epigenetic alterations is cytosine methylation, which is a well-recognized mechanism of epigenetic gene silencing and often occurs at tumor suppressor gene loci in human cancer. We have completed sample collection of many different tumor types, in addition to normal tissues for background comparison, and used the GoldenGate platform from Illumina to assess methylation at 1505 loci associated with over 800 cancer-related genes. While cluster analysis is often used to identify methylation subgroups in data, it is unclear how to cluster methylation data from arrays in a scalable and reliable manner. In this talk, we present a novel model-based recursive-partitioning algorithm to navigate the methylation clusters, and present simulations that show the method has good properties relative to other clustering methods. We demonstrate the method on a methylation data set consisting of 11 types of normal tissue, as well as another data set consisting of mesothelioma tumors. In the latter case, we are able to identify a methylation subgroup that may show increased sensitivity to asbestos exposure.
March 21

Env Stat Large Group Meeting
12:30 - 2:00 pm

Chris Paciorek, Ph.D.

Assistant Professor, Department of Biostatistics, Harvard School of Public Health

"Spatial Assessment of Satellite Proxy Data for Particulate Matter"
ABSTRACT: Challenges in integrating satellite and monitoring data to retrospectively estimate monthly PM2.5 concentrations in the eastern United States

Advances in spatial modeling and GIS technology, combined with the availability and demonstrated utility of satellite proxy data for particulate matter (PM) estimation, present the opportunity for integrated estimation of PM2.5 for use in health analyses of the chronic effects of PM. Bayesian statistical techniques provide a natural framework for the integration. I present a hierarchical Bayesian model that attempts to capture the key features of the available data through multiple likelihood terms, one for each aerosol optical depth (AOD) proxy and one for ground-level monitoring data, while accounting for the complicated spatial and temporal misalignment of the data sources. I focus on two key questions. First, how can we model the possibility of spatially-varying bias in AOD as a proxy for PM? Evidence suggests that the bias does vary spatially, which causes identifiability problems inherent in the structure of the data. I show that predictions of PM are very sensitive to the flexibility of the model term that represents spatially-varying bias. The second key question I address is whether including the AOD proxy materially improves predictions of ground-level PM beyond what can be achieved based on the PM data and various covariates.

April 11

Adam A. Szpiro, Ph.D.
Senior Research Fellow, Department of Biostatistics, University of Washington

"Modeling Intra-urban Variation in Air Pollution Exposure to Assess Effects on Cardiovascular Health"
ABSTRACT: In order to estimate the long-term effect of air pollution on cardiovascular health in a cohort study, it is necessary to predict intra-urban variation in individual exposure levels based on relatively sparse measurements. The U.S. EPA Air Quality System (AQS) has only a few sites in any given city as the network is primarily designed to assess regional air pollution levels. To address small-scale variation in air pollution near roads, the EPA-funded MESA Air project is carrying out additional monitoring according to a complex sampling design.

We consider the problem of estimating exposure to gaseous nitrogen oxides (NOx) using a spatio-temporal Bayesian hierarchical regression model. Two features of our dataset present unique challenges. First, to accurately represent small-scale variation in traffic-related pollution near roadway sources, we must pay close attention to the choice of covariates and how these relate to local meteorology and residual correlation. Incorporating physics-based plume modeling significantly improves the statistical properties of predictions. Second, in order to take advantage of irregularly sampled data and to maximize the prediction accuracy, we need a sufficiently rich statistical model with space-time interactions in the correlation structure. Estimation of the model parameters depends on specialized computational techniques.

In this talk, we describe our approach to addressing the modeling and computational challenges outlined above, and we present some illustrative results.

April 18 (FXB G13)

Elizabeth J. Malloy, Ph.D.
Assistant Professor, Department of Mathematics and Statistics, American University
and
Usha Govindarajulu, Ph.D.
Associate Biostatistician, Center for Clinical Investigation, Brigham and Women's Hospital, and Instructor in Medicine, Harvard Medical School

"How to Select the Best Smoothed Curve for Estimating Exposure-Response Relationships: Two Simulation Studies"
ABSTRACT: Splines and other smoothing methods are becoming more commonly applied in the analysis of epidemiologic data. They are attractive for estimating exposure-response relationships because they avoid parametric constraints as well as arbitrary exposure cut-points. Despite their widespread use, substantial questions remain about selecting the best smooth model; both how to select the optimal smoothing parameter for a given type of method, as well as how to select from among the alternative smoothing methods. We conducted a pair of simulation studies to examine both questions in the context of the Cox model.

In the first simulation study we focused on penalized splines, and considered several different criteria for selecting the smoothing parameter. The criteria considered here, AIC, AICc, generalized cross-validation, and the Bayesian information criterion, are all based on penalizing the partial likelihood for high degrees of freedom and are computationally efficient to implement. In the second simulation study we examined different types of smoothing methods: restricted cubic splines, natural splines, penalized splines, and fractional polynomials. In both studies, the performance of the methods was compared by looking at bias and the power and the size of hypothesis tests of any effect and nonlinearity. In both studies we examined the questions across a series of biologically plausible exposure-response relationships.

May 9

Michael Wu
Doctoral Student, Department of Biostatistics, Harvard School of Public Health / Graduate School of Arts and Sciences

"Sparse Linear Discriminant Analysis for Testing Differential Gene Pathway Activity Induced by Metal Particulate Exposure"
ABSTRACT: Usual approaches to microarray analysis may not be suitable for gene expression profiling studies that examine the effects of environmental exposures. In this talk we develop the sparse linear discriminant pathway test (sLDPT) method, a novel approach to pathway-based analysis of microarray experiments, which is particularly suited for studies involving environmental exposures. sLDPT decomposes each gene pathway to a single score by using sparse linear discriminant analysis (sLDA), a regularized form of dimension reduction that simultaneously performs variable selection. A consequence of variable selection is the ability to identify "important" genes that drive observed differences. We find a path algorithm to solve the entire piecewise linear regularization path for sLDA, thereby providing an efficient way to implement the sLDPT method, which we apply to examine the effects of PM2.5 exposure on immune responses at the molecular level. Results show that although the subjects exhibited no discernible change in appearance, evidence for systemic inflammation is present.
May 16

John S. Witte, Ph.D.
Professor, Departments of Epidemiology & Biostatistics and Urology; Associate Director, Center for Human Genetics; and Co-Leader, Program in Cancer Genetics, Comprehensive Cancer Center, University of California, San Francisco

"Hierarchical Modeling of Genetic and Environmental Exposures in Genome-Wide Association Studies"
ABSTRACT: Large-scale genetic epidemiologic studies initially investigate hundreds of thousands of single-nucleotide polymorphisms (SNPs) and many environmental exposures. The most 'promising' of these are then further evaluated. Deciding which SNPs, exposures, and interactions merit follow-up is one of the most crucial aspects of such association studies. I present here a hierarchical modeling approach for selecting the most-promising SNPs and exposures that incorporates both conventional results and other existing information. The potential value of this approach is shown by application and simulation.


Last Update: May 12, 2008