Department of Biostatistics Statistical Methods in Epidemiology Working Group 2011 - 2012
Lu Tian, Ph.D.
Assistant Professor, Health Research & Policy - Biostatistics, Stanford University
"AUC based Biomarker Ensemble with an Application on Gene Scores Predicting Low Bone Mineral Density"
ABSTRACT: Motivation: The area under the receiver operating characteristic (ROC) curve (AUC), long regarded as a "golden" measure for the predictiveness of a continuous score, has propelled the need to develop AUC-based predictors. However, AUC-based ensemble methods are scarce, largely because the associated objective function is neither continuous nor concave. Indeed, there is no reliable numerical algorithm for identifying the optimal combination of a set of biomarkers that maximizes the AUC, especially when the number of biomarkers is large.
Results: We propose novel AUC-based statistical ensemble methods for combining multiple biomarkers to differentiate a binary response of interest. Specifically, we replace the non-continuous and non-convex AUC objective function with a convex surrogate loss function, whose minimizer can be identified efficiently. Within this framework, the lasso and other regularization techniques enable feature selection. Extensive simulations have demonstrated the superiority of the new methods over existing methods. The proposal has been applied to a gene expression data set to construct gene expression scores that differentiate elderly women with low bone mineral density (BMD) from those with normal BMD. The AUCs of the resulting scores in the independent test data set have been satisfactory.
Conclusion: By aiming to maximize the AUC directly, the proposed AUC-based ensemble method provides an efficient means of generating a stable combination of multiple biomarkers, which is especially useful in high-dimensional settings.
Research by XG. Zhao, W. Dai, Y. Li and L. Tian.
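The contrast between the AUC objective and a convex surrogate can be sketched in a few lines. This is only an illustration: the abstract does not say which surrogate the authors use, so the pairwise logistic loss below is an assumption, and the function names are hypothetical.

```python
import math

def empirical_auc(scores_pos, scores_neg):
    """Share of (case, control) pairs ranked correctly; ties count as 1/2.

    This is a step function of the scores, hence not continuous in the
    combination weights that produce them.
    """
    total = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                total += 1.0
            elif sp == sn:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))

def pairwise_logistic_loss(beta, x_pos, x_neg):
    """Assumed surrogate: mean of log(1 + exp(-z)) over all pairs,
    where z = beta'x_case - beta'x_control. Convex in beta because z is
    linear in beta and log(1 + exp(-z)) is convex in z.
    """
    def score(x):
        return sum(b * v for b, v in zip(beta, x))

    loss = 0.0
    for xp in x_pos:
        for xn in x_neg:
            z = score(xp) - score(xn)
            # Numerically stable form of log(1 + exp(-z)).
            loss += max(-z, 0.0) + math.log1p(math.exp(-abs(z)))
    return loss / (len(x_pos) * len(x_neg))
```

A lasso-type penalty on beta could be added to the surrogate loss before minimizing; how the authors do this is not stated in the abstract.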
Susan Gruber, Ph.D.
Research Fellow, Department of Epidemiology, Harvard School of Public Health
"Collaborative Targeted Maximum Likelihood Estimation of Causal Effects"
ABSTRACT: Obtaining a maximally efficient unbiased causal effect estimate is not trivial. In randomized controlled trials, informative dropout or failed randomization can introduce bias into an unadjusted effect estimate. Non-randomized studies are subject to additional sources of bias. Even in a setting where all confounders have been measured, lack of knowledge of the functional form of the relationships implies that model misspecification is the norm rather than the exception. In addition, adjusting for all known confounders may at times overly inflate the variance of an estimator, which can manifest as bias. An ideal bias/variance trade-off would be targeted for the parameter of interest. In this talk I will briefly review targeted maximum likelihood estimation (TMLE), an efficient, double-robust substitution estimator introduced by van der Laan and Rubin in 2006, and describe a variant known as collaborative targeted maximum likelihood estimation (C-TMLE) that increases robustness under sparsity and misspecification through data-adaptive nuisance parameter (propensity score) estimation based on the goodness of fit of the resulting estimator of the outcome regression.
Molin Wang, Ph.D.
Assistant Professor of Biostatistics, Department of Biostatistics, Harvard School of Public Health, and Harvard Medical School
"Latency Analysis Under the Cox Model When the Effect May Change Over Time"
ABSTRACT: We consider estimation and inference for latency in the Cox proportional hazards model framework, where time to event is the outcome. In many public health settings, it is of interest to assess whether exposure effects are subject to a latency period, where the risk of developing disease depending on the exposure level varies over time, perhaps affecting risk only during times near the occurrence of the outcome, or perhaps affecting risk only during times preceding a lag of some duration. Identification of the latency period, if any, is an important aspect of assessing risks of environmental and occupational exposures. For example, in air pollution epidemiology, interest often lies not only in the effect of the m-year moving cumulative average air pollution level on risk of all-cause mortality, but also in point and interval estimation of m itself. In this talk, we will focus on methods for point and interval estimation of the latency period under several models for the timing of exposure which have previously appeared in the epidemiologic literature. Computational methods will be discussed. The methods will be illustrated in the study of the timing of the effects of constituents of air pollution on mortality in the Nurses’ Health Study.
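The m-year moving cumulative average mentioned in the abstract can be made concrete with a small sketch. The function name and the yearly input format are assumptions for illustration; the abstract does not describe how the exposure metric is computed or how m is estimated.

```python
def moving_average_exposure(yearly, m):
    """m-year moving average of a yearly exposure series.

    Entry t is the mean of yearly[t-m+1 .. t]; it is None for years with
    fewer than m years of exposure history available.
    """
    if m < 1:
        raise ValueError("m must be a positive number of years")
    averaged = []
    for t in range(len(yearly)):
        if t + 1 < m:
            averaged.append(None)  # not enough history yet
        else:
            window = yearly[t - m + 1 : t + 1]
            averaged.append(sum(window) / m)
    return averaged
```

Each candidate value of m yields a different time-varying covariate, which is why m can be treated as a parameter to be estimated alongside the exposure effect.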
Linda Valeri
Doctoral Student, Department of Biostatistics, Harvard School of Public Health
"Mediation Analysis When Non-linearities Are Present and The Mediator Is Measured with Error"
ABSTRACT: Mediation analysis is a popular approach to examine the extent to which the effect of an exposure on an outcome operates through an intermediate and the extent to which it is direct. First, I will present the developments in mediation analysis for nonlinear models within the counterfactual framework and compare the sorts of inferences about mediation that are possible in the presence of exposure-mediator interaction when using a counterfactual versus a purely statistical approach.
When the mediator is mis-measured, the validity of mediation analysis can be severely undermined. I will discuss the results of a study of the effects of classical, nondifferential measurement error in the mediator on the estimation of direct and indirect causal effects when the outcome is continuous or binary and exposure-mediator interaction can be present.
A method of correction for measurement error is proposed along the lines of regression calibration with sensitivity analysis for which no validation samples or gold standard for the mis-measured mediator are required.
The effect of measurement error on the assessment of mediation and interaction and the correction strategy are further illustrated via simulation and analysis of data from a recent study that investigates whether the effect of genetic variants on 15q25.1 on lung cancer is direct or operates through pathways related to smoking behavior.
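As a rough illustration of direct and indirect effects with exposure-mediator interaction, the sketch below evaluates the counterfactual contrasts for the simplest case: a continuous outcome, linear outcome and mediator models, no covariates, and no measurement error. These modeling choices, the parameter names, and the function names are assumptions for illustration, not the speaker's method or correction strategy.

```python
def expected_outcome(a, a_med, theta, beta):
    """E[Y(a, M(a_med))] under assumed linear models.

    theta = (t0, t1, t2, t3): E[Y | a, m] = t0 + t1*a + t2*m + t3*a*m
    beta  = (b0, b1):         E[M | a]    = b0 + b1*a
    Because the outcome model is linear in m, only the mediator mean matters.
    """
    t0, t1, t2, t3 = theta
    b0, b1 = beta
    mean_mediator = b0 + b1 * a_med
    return t0 + t1 * a + (t2 + t3 * a) * mean_mediator

def natural_effects(a, a_star, theta, beta):
    """Natural direct and indirect effects comparing exposure a with a_star."""
    nde = (expected_outcome(a, a_star, theta, beta)
           - expected_outcome(a_star, a_star, theta, beta))
    nie = (expected_outcome(a, a, theta, beta)
           - expected_outcome(a, a_star, theta, beta))
    return nde, nie
```

By construction the two effects sum to the total effect, and the interaction coefficient t3 enters both, which is one reason error in the mediator can distort the direct effect as well as the indirect one.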
Melinda Power
Doctoral Student, Department of Epidemiology, Harvard School of Public Health
"Hypertension and Cognitive Impairment in the Elderly: Issues of Selection and Life-Course Exposure Data"
ABSTRACT: Epidemiologic research on the association between hypertension and cognition or dementia collectively suggests that the association is dependent on the age at which hypertension status is assessed. Several potential explanations for this pattern are possible and include the influence of selection bias, an independent effect of duration of hypertension, an independent effect of age at onset of hypertension, and reverse causation. I will present two analyses that attempt to determine whether these factors, or a combination of these factors, can explain the age-dependent association that has been observed in the literature. The first analysis reproduces the age-dependent association observed across studies within a single dataset including longitudinal data on hypertension over the course of 30 years, using a penalized cubic spline to model betas across multiple point estimate studies nested within this larger dataset. We then explore the influence of selection bias using inverse probability weights for censoring and the influence of duration through stratification based on duration of hypertension prior to cognitive testing. The second analysis focuses more narrowly on how age at onset and duration of hypertension modify the association between hypertension and cognition. I will present a linear marginal structural model, using inverse probability weights for censoring and confounding, that explores this issue.
Ryan Seals
Doctoral Student, Department of Epidemiology, Harvard School of Public Health
"Age-Period-Cohort Models: ALS in Denmark, 1970-2008"
ABSTRACT: In the presence of known age dependence, the effects of period and birth cohort on mortality or incidence trends are difficult to disentangle because of the linear dependence among age, period, and cohort. The scientific interpretation of period and cohort effects is, however, often important in testing hypotheses about disease causation. Many methods have been suggested to estimate the effects of period and birth cohort on trends, but none overcomes the problem of linear dependence, and most rely on arbitrary constraints. Here we review recent suggestions for estimating these effects and their relative advantages and disadvantages. We apply a method that uses minimal assumptions to estimate mortality and incidence trends in ALS in Denmark over the past three decades. The use of statistical "solutions" to the problem of linear dependence is wholly dependent on subject-matter knowledge and the particular problem under study.
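The linear dependence arises because cohort = period - age, so linear trends in the three time scales cannot be separated. A minimal sketch with made-up slope values shows two different sets of slopes that produce the same linear predictor for every age and period:

```python
def linear_predictor(age, period, slopes):
    """Linear age-period-cohort predictor with cohort = period - age."""
    age_slope, period_slope, cohort_slope = slopes
    cohort = period - age
    return age_slope * age + period_slope * period + cohort_slope * cohort

def shifted(slopes, d):
    """Move a linear trend of size d among the three effects.

    d*age - d*period + d*(period - age) = 0, so the fit is unchanged.
    """
    age_slope, period_slope, cohort_slope = slopes
    return (age_slope + d, period_slope - d, cohort_slope + d)

# Illustrative values only, not estimates from any study.
original = (0.05, 0.02, 0.01)
alternative = shifted(original, 0.03)
```

Since the data cannot distinguish `original` from `alternative`, any estimate of the individual linear trends reflects a constraint chosen by the analyst.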
Lingling Li, Ph.D.
Assistant Professor and Biostatistician in the Department of Population Medicine, Harvard Medical School
"Propensity-Score based Sensitivity Analysis Method for Uncontrolled Confounding"
ABSTRACT: In this talk, we will introduce a sensitivity analysis method to address the issue of uncontrolled confounding in observational studies. The method is a direct extension of an existing sensitivity analysis method by Brumback et al. (2004). The difference is that in this new method, we quantify the hidden bias due to uncontrolled confounding using a one-dimensional sensitivity function (SF), which depends on the propensity score only. The propensity score is defined as the conditional probability of being assigned to a selected treatment given the measured confounders. The new method reduces the dimension of the sensitivity function and makes it much easier to impose reasonable assumptions on the functional form of the sensitivity function and the values of its coefficients. In addition, it offers opportunities for robust inference, since one-dimensional continuous functions can be well approximated by low-order polynomials (e.g., linear, quadratic). We construct SF-corrected inverse probability weighted estimators to draw inference on the causal treatment effect. We demonstrate the use of the new method by applying it to an asthma study that evaluates the effect of clinicians' prescription patterns for inhaled corticosteroids (ICs) in children with persistent asthma on selected clinical outcomes.
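For orientation, a plain inverse probability weighted estimator, without any sensitivity-function correction, can be sketched as follows. It assumes the propensity scores are already known or estimated and uses the normalized (Hajek) form; the SF correction described in the abstract is not reproduced here, and the function name is hypothetical.

```python
def ipw_mean_difference(treated, outcome, propensity):
    """Normalized IPW estimate of E[Y(1)] - E[Y(0)].

    treated    : 0/1 treatment indicators
    outcome    : observed outcomes
    propensity : P(treated = 1 | measured confounders), strictly in (0, 1)
    """
    # Treated subjects are weighted by 1/p, untreated by 1/(1 - p).
    w1 = [t / p for t, p in zip(treated, propensity)]
    w0 = [(1 - t) / (1 - p) for t, p in zip(treated, propensity)]
    mean_treated = sum(w * y for w, y in zip(w1, outcome)) / sum(w1)
    mean_control = sum(w * y for w, y in zip(w0, outcome)) / sum(w0)
    return mean_treated - mean_control
```

This estimator is unbiased only if there is no uncontrolled confounding, which is the assumption a sensitivity analysis relaxes.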
James O'Malley, Ph.D.
Associate Professor of Statistics, Department of Health Care Policy, Harvard Medical School
"Optimal Small-Area Estimation and Design When Nonrespondents are Subsampled for Followup in a Comparative Study of Healthcare Quality"
ABSTRACT: Many surveys first mail questionnaires to sampled subjects and then follow up mail nonrespondents by phone. The high unit costs of telephone interviews make it cost-effective to subsample the followup. We derive optimal subsampling rates for the phone subsample for comparisons of health plans or other units. Computations under design-based inference depart from the traditional formulae for Neyman allocation because the phone sample size at each plan is constrained by the number of mail non-respondents. Because plan means for mail respondents are highly correlated with those for phone respondents, more precise estimates (at fixed overall cost) for potential phone respondents are obtained by combining the direct estimates from phone followup with predictions from the mail survey using small-area estimation (SAE) models. We investigate hybrid SAE approaches -- weighted combinations of the direct and model-based estimators -- that might provide more interpretable improved comparisons among plans by reducing the amount of shrinkage to the overall mean. The linear combination of the direct and model-based estimators defines a family of hybrid approaches with the purely design-based (DsgnBased) and the full Bayesian SAE (FullBayes) approaches as extremes. The performance of a hybrid procedure based on the minimum variance unbiased estimator of the unit mean was found to be slightly inferior to FullBayes with both uniformly superior to DsgnBased.
Acknowledgement: This is joint work with Alan M. Zaslavsky.
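The way an upper bound on each unit's sample size changes the allocation can be illustrated with the textbook Neyman rule plus capping. This is a generic sketch under simplifying assumptions (variance minimization for an overall mean, equal unit costs); it is not the optimal subsampling rate derived in the talk, and the function name is hypothetical.

```python
def capped_neyman_allocation(sizes, sds, caps, total):
    """Allocate `total` units proportional to N_h * S_h with n_h <= caps[h].

    Units whose proportional share exceeds their cap are fixed at the cap,
    and the remaining sample is reallocated among the other units.
    """
    allocation = [0.0] * len(sizes)
    free = set(range(len(sizes)))
    remaining = float(total)
    while free:
        weight_sum = sum(sizes[h] * sds[h] for h in free)
        if weight_sum == 0:
            break
        proposal = {h: remaining * sizes[h] * sds[h] / weight_sum for h in free}
        over = [h for h in free if proposal[h] > caps[h]]
        if not over:
            for h in free:
                allocation[h] = proposal[h]
            break
        for h in over:
            allocation[h] = float(caps[h])  # e.g. number of mail nonrespondents
            remaining -= caps[h]
            free.remove(h)
    return allocation
```

With two equally sized plans, standard deviations 1 and 3, and 80 interviews, the uncapped rule gives 20 and 60; if the second plan has only 30 nonrespondents, the allocation becomes 50 and 30.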
Last Update: May 1, 2012