Department of Biostatistics | Statistical Methods in Epidemiology Working Group 2013 - 2014

Organizer: Dr. Bernard Rosner

Schedule:

This year, the seminar will be devoted to statistical methods used in epidemiologic research. In addition to statistical methods of general epidemiologic use, a number of sessions will be devoted to topics in genetic epidemiology, family studies, and clustered-data issues. Alongside speakers from Harvard, a limited number of distinguished speakers from outside Harvard will be invited to participate. Presentations of work by interested faculty and students will be solicited.

Professor in the Department of Biostatistics, Harvard School of Public Health and Professor of Medicine (Biostatistics), Harvard Medical School

ABSTRACT: A marker of the validity of a dietary instrument is how well it correlates with a relevant biomarker. For example, there are different methods of assessing dietary beta-carotene (e.g., 24-hour recall, food frequency questionnaire, diet record), and a natural question is which measure correlates most strongly with plasma beta-carotene. Methods already exist for comparing dependent correlation coefficients (e.g., Wolfe, 1976; Steiger, 1980; Meng, 1992). However, each of these instruments has associated random error, and a related question is, after correcting for random error, which instrument correlates most highly with plasma beta-carotene. Since these correlations are assessed on the same subjects, the more general question is how to compare dependent deattenuated correlation coefficients. This generalizes previous work on obtaining confidence limits for a single deattenuated correlation coefficient (Rosner and Willett, 1988). In addition, we extend this work to the comparison of dependent Spearman correlation coefficients, which to our knowledge has never been done before. These methods are illustrated with two examples: (a) the comparison of the validity of different methods for measuring dietary beta-carotene, and (b) the comparison of the validity of different storage protocols in processing plasma samples for the determination of HbA1c.
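Deattenuation here means the classical Spearman correction: the observed correlation is divided by the square root of the product of the two instruments' reliabilities. A minimal numeric sketch with hypothetical reliability and correlation values (illustrative only, not data from the talk):

```python
import math

def deattenuate(r_obs, rel_x, rel_y):
    # Spearman's correction for attenuation: divide the observed
    # correlation by the geometric mean of the two reliabilities.
    return r_obs / math.sqrt(rel_x * rel_y)

# Hypothetical values: observed questionnaire vs. plasma correlation 0.35,
# questionnaire reliability 0.70, plasma assay reliability 0.90.
r_true = deattenuate(0.35, 0.70, 0.90)  # ≈ 0.44, larger than the observed 0.35
```

The corrected value is always at least as large as the observed one when reliabilities are below 1, which is why comparing *deattenuated* correlations across instruments can reverse naive rankings.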

Research Associate, Department of Epidemiology, Harvard School of Public Health

ABSTRACT: In recent years, a number of large-scale genome-wide association studies have been published for human traits adjusted for other correlated traits with a genetic basis. The motivation for such an adjustment is to discover genetic variants associated with the primary outcome independently of the correlated trait. In this work, we contend that this objective is almost fulfilled for variants that have either similar effects (magnitude and direction) on both the 'primary' and the 'correlated' trait, or effects on the primary outcome only. For all other variants, an unintended bias is introduced with respect to the primary outcome as a result of the adjustment. We identify published genome-wide association studies, including a large meta-analysis of waist-to-hip ratio, where genetic effects are incorrectly interpreted because of such adjustment. Using both theory and simulations, we explore this phenomenon in detail and discuss the ramifications for future genome-wide association studies of correlated traits and diseases.
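The mechanism is easy to reproduce. Below is a toy simulation of my own (not the authors' code): a variant affects only the correlated trait, and the two traits share an unmeasured common factor; adjusting the primary outcome for the correlated trait then manufactures a spurious association in the opposite direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Variant G affects ONLY the correlated trait C, never the primary outcome Y.
g = rng.binomial(2, 0.3, n).astype(float)
u = rng.normal(size=n)                  # shared factor that correlates Y and C
c = 0.5 * g + u + rng.normal(size=n)    # correlated trait
y = u + rng.normal(size=n)              # primary outcome: no G effect at all

def ols(covariates, outcome):
    # Least-squares coefficients with an intercept prepended.
    X = np.column_stack([np.ones(len(outcome))] + list(covariates))
    return np.linalg.lstsq(X, outcome, rcond=None)[0]

beta_unadj = ols([g], y)[1]    # ≈ 0: the unadjusted analysis is correct
beta_adj = ols([g, c], y)[1]   # ≈ -0.25: spurious signal induced by adjustment
```

Conditioning on C, which contains both G and the shared factor u, induces a negative dependence between G and u, and hence between G and Y; this is exactly the unintended bias the abstract describes.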

Instructor in Medicine, Department of Medicine - Pharmacoepidemiology, Brigham and Women's Hospital / Harvard Medical School

ABSTRACT: In nonrandomized studies of comparative effectiveness of medications, utilizing the variation in treatment patterns across prescribers may improve the validity of treatment effect estimates, yet the majority of analyses ignore the prescriber. Via simulations and example studies, we evaluated approaches to utilizing the prescriber in analysis, as well as diagnostic analyses for choosing among these approaches. Including a prescriber random intercept in the propensity score had unpredictable results and sometimes increased bias over the default analysis that ignores the prescriber. Instrumental variable approaches were unbiased when instrument assumptions were met, which required no clustering of patient risk factors within prescriber, a scenario that may be unlikely in practice. Stratification on prescriber often reduced bias from unmeasured patient and prescriber characteristics. Better diagnostics are needed to guide the analytic strategy for comparative effectiveness data that contain prescriber information.
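As one concrete instance of the instrumental-variable idea, prescriber preference can serve as the instrument. The sketch below is my own toy simulation (not the authors' code), built under the assumption the abstract flags as possibly unrealistic: the unmeasured patient risk factor is i.i.d. and does not cluster within prescriber.

```python
import numpy as np

rng = np.random.default_rng(2)
n_prescribers, patients_each = 2_000, 50

# Prescriber preference z (constant within prescriber) is the instrument.
z = rng.normal(size=n_prescribers).repeat(patients_each)
u = rng.normal(size=z.size)        # unmeasured patient confounder, i.i.d.:
                                   # no clustering within prescriber
treated = (z + u + rng.normal(size=z.size) > 0).astype(float)
y = 1.0 * treated + u + rng.normal(size=z.size)   # true treatment effect = 1.0

# Default analysis ignoring the prescriber: confounded by u.
naive = np.cov(treated, y)[0, 1] / np.var(treated, ddof=1)

# Wald-type IV estimate using prescriber preference as the instrument.
iv = np.cov(z, y)[0, 1] / np.cov(z, treated)[0, 1]
```

Here `naive` is biased well away from 1.0 while `iv` recovers the true effect; if patient risk factors were instead correlated with prescriber preference, the instrument assumption would fail and `iv` would be biased too.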

Research Associate in Medicine, Department of Preventive Medicine, Brigham and Women's Hospital / Harvard Medical School

ABSTRACT: In this talk we study aspects of discrimination in risk prediction models with a binary outcome and continuous predictors. The area under the receiver operating characteristic curve (AUC) is a widely used measure of discrimination in risk prediction models. Numerous studies report the failure of an added significant predictor to produce a significant improvement in the AUC when tested by the DeLong test. We use the theory of U-statistics to explain this contradiction. First, we note that the DeLong test was developed for non-nested models. We then show that for two nested models the difference of their AUCs belongs to a degenerate class of U-statistics and therefore has a different asymptotic distribution than the one assumed by the DeLong test, resulting in a substantial (up to 60%) loss of power. Possible solutions are discussed.

It is often assumed that a good new predictor should be uncorrelated with variables already in the model. Assuming multivariate normality of predictors and using some of the theoretical results above, we show that correlation among predictors can be beneficial for discrimination. First, we show that if the effect size is positive, then negative correlation between a new variable and the old risk score is always beneficial for discrimination, whereas zero correlation is often detrimental. Second, we show that a new predictor that regresses well on old predictors (has high multiple R-squared) improves the quality of discrimination. This result holds rigorously for linear discriminant analysis and asymptotically for logistic regression.
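The correlation claim can be checked numerically. For two equal-covariance normal classes the optimal linear (LDA) score has AUC equal to Phi of half the Mahalanobis distance between the class means; plugging in a hypothetical pair of positive standardized effects (my own illustrative values) shows negative correlation helping and positive correlation hurting:

```python
import math
import numpy as np

def lda_auc(mean_diff, cov):
    # AUC of the optimal linear rule for two normal classes with common
    # covariance: Phi(Mahalanobis distance between class means / 2).
    d2 = mean_diff @ np.linalg.solve(cov, mean_diff)
    return 0.5 * (1 + math.erf(math.sqrt(d2) / 2 / math.sqrt(2)))

mu = np.array([0.5, 0.5])   # hypothetical positive effects for both predictors
aucs = {rho: lda_auc(mu, np.array([[1.0, rho], [rho, 1.0]]))
        for rho in (-0.5, 0.0, 0.5)}
# Discrimination improves monotonically as the correlation turns negative:
# rho = -0.5 gives the largest AUC, rho = +0.5 the smallest.
```

With both effects positive, the Mahalanobis distance is 2(0.5)^2/(1+rho), so the AUC rises as rho falls, matching the first result in the paragraph above.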

Research Scientist, Departments of Epidemiology and Biostatistics, Harvard School of Public Health

ABSTRACT: Regression calibration is a popular method for correcting for bias in effect estimates when a disease risk factor is measured with error. However, the development of such methods has thus far focused on unbiased estimation and inference for the primary exposure or exposures of interest, and not on finer aspects of model building and variable selection. Adjusting for measurement error using the regression calibration method raises several questions concerning valid model construction in the presence of covariates. For instance, are the standard regression calibration adjustments valid when a covariate that is associated only with the measurement error process, and not the outcome itself, is included in the primary regression model? Does the inclusion of such a variable in the primary regression model induce extraneous variation in the resulting estimator? Clear answers to these questions would provide valuable insight and improve estimation of exposure-disease associations when the exposure is measured with error. In this work, we address these questions analytically and develop extended regression calibration estimators as needed, based on assumptions about the underlying association between disease and covariates, for both linear and logistic regression models. The methods are applied to data from the Health Professionals Follow-up Study.
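The basic mechanics underlying these questions are simple in the linear case: under classical measurement error the naive slope is attenuated by the calibration slope lambda = Cov(X, X*)/Var(X*), and dividing by lambda restores it. A sketch of my own (simulated data, not the study's), with the calibration slope estimated as it would be from a validation study:

```python
import numpy as np

rng = np.random.default_rng(1)
n, beta = 100_000, 2.0

x = rng.normal(size=n)                      # true exposure
x_err = x + rng.normal(scale=0.8, size=n)   # error-prone surrogate (classical error)
y = beta * x + rng.normal(size=n)

def slope(a, b):
    # Simple-regression slope of b on a.
    return np.cov(a, b)[0, 1] / np.var(a, ddof=1)

naive = slope(x_err, y)      # attenuated toward 0 by lambda = 1 / (1 + 0.8**2)
lam = slope(x_err, x)        # calibration slope, here from simulated validation data
corrected = naive / lam      # regression calibration estimate of beta
```

The subtleties the abstract raises arrive once covariates enter: a covariate tied only to the error process changes the calibration model but not the outcome model, and whether the standard correction above remains valid is exactly the question addressed.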

Joint work of Xiaomei Liao, Kate Fitzgerald, and Donna Spiegelman.

Last Update: April 29, 2014