Department of Biostatistics
Statistical Methods in Epidemiology Working Group
2013 - 2014
ABSTRACT: One marker of the validity of a dietary instrument is how well it correlates with a relevant biomarker. For example, dietary beta-carotene can be assessed by several methods (e.g., 24-hour recall, food frequency questionnaire, diet record), and a natural question is which measure correlates most strongly with plasma beta-carotene. Methods already exist for comparing dependent correlation coefficients (e.g., Wolfe, 1976; Steiger, 1980; Meng, 1992). However, each of these instruments is subject to random error, and a related question is which instrument correlates most highly with plasma beta-carotene after correcting for that error. Because these correlations are estimated from the same subjects, the more general question is how to compare dependent deattenuated correlation coefficients. This generalizes previous work on obtaining confidence limits for a single deattenuated correlation coefficient (Rosner and Willett, 1988). In addition, we extend this work to the comparison of dependent Spearman correlation coefficients, which to our knowledge has not been done before. These methods are illustrated with two examples: (a) the comparison of the validity of different methods for measuring dietary beta-carotene and (b) the comparison of the validity of different storage protocols in processing plasma samples for the determination of HgbA1c.
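As a rough illustration of what "deattenuated" means here, the sketch below applies the classical correction for attenuation, dividing an observed correlation by the square root of the product of the two reliability coefficients. It covers only the point correction, not the comparison procedure or confidence limits described in the abstract. The function name and all numeric inputs are hypothetical.

```python
import math


def deattenuate(r_obs, icc_x, icc_y=1.0):
    """Classical correction for attenuation.

    r_obs : observed correlation between two error-prone measures
    icc_x : reliability (e.g., intraclass correlation) of the first measure
    icc_y : reliability of the second measure (1.0 means error-free)
    """
    if not (0.0 < icc_x <= 1.0 and 0.0 < icc_y <= 1.0):
        raise ValueError("reliability coefficients must lie in (0, 1]")
    return r_obs / math.sqrt(icc_x * icc_y)


# Hypothetical values, chosen only to show that the ranking of two
# instruments can change once random error is taken into account.
r_questionnaire = deattenuate(0.30, icc_x=0.50, icc_y=0.80)
r_recall = deattenuate(0.25, icc_x=0.30, icc_y=0.80)
print(round(r_questionnaire, 3), round(r_recall, 3))
```

With these made-up inputs the instrument with the lower observed correlation has the higher corrected correlation, which is why a formal comparison of deattenuated coefficients is of interest.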
ABSTRACT: In recent years, a number of large-scale genome-wide association studies have been published for human traits adjusted for other correlated traits with a genetic basis. The motivation for such an adjustment is to discover genetic variants associated with the primary outcome independently of the correlated trait. In this work, we contend that this objective is almost fulfilled for variants that have either similar effects (magnitude and direction) on both the 'primary' and the 'correlated' trait, or effects on the primary outcome only. For all other variants, an unintended bias is introduced with respect to the primary outcome as a result of the adjustment. We identify published genome-wide association studies, including a large meta-analysis of waist-to-hip ratio, where genetic effects are incorrectly interpreted because of such adjustment. Using both theory and simulations, we explore this phenomenon in detail and discuss the ramifications for future genome-wide association studies of correlated traits and diseases.
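The bias described above can be reproduced in a small simulation. The sketch below is not the authors' analysis. It assumes a simple linear model in which the primary and correlated traits share an unmeasured factor, and it compares the marginal and covariate-adjusted genotype effects. All names, effect sizes, and variances are hypothetical.

```python
import random


def cov(a, b):
    """Sample covariance (divisor n) of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / len(a)


def adjusted_effect(g, c, y):
    """OLS coefficient of g in the regression of y on g and c."""
    s_gg, s_cc, s_gc = cov(g, g), cov(c, c), cov(g, c)
    s_gy, s_cy = cov(g, y), cov(c, y)
    return (s_gy * s_cc - s_cy * s_gc) / (s_gg * s_cc - s_gc ** 2)


def simulate(beta_y, beta_c, n=40000, seed=1):
    """Return (marginal, adjusted) genotype effects on the primary trait.

    beta_y : true effect of the variant on the primary trait
    beta_c : true effect of the variant on the correlated trait
    """
    rng = random.Random(seed)
    # Genotype coded 0/1/2 with an assumed allele frequency of 0.3.
    g = [sum(rng.random() < 0.3 for _ in range(2)) for _ in range(n)]
    # Shared unmeasured factor that makes the two traits correlated.
    u = [rng.gauss(0, 1) for _ in range(n)]
    c = [beta_c * gi + ui + rng.gauss(0, 1) for gi, ui in zip(g, u)]
    y = [beta_y * gi + ui + rng.gauss(0, 1) for gi, ui in zip(g, u)]
    marginal = cov(g, y) / cov(g, g)
    return marginal, adjusted_effect(g, c, y)


# Variant acting on the correlated trait only: adjustment creates a signal.
print(simulate(beta_y=0.0, beta_c=0.3))
# Variant acting on the primary trait only: adjustment leaves it intact.
print(simulate(beta_y=0.2, beta_c=0.0))
```

Under these assumed variances the slope of the primary trait on the correlated trait, given genotype, is 0.5, so a variant with no effect on the primary trait and an effect of 0.3 on the correlated trait should show an adjusted effect near -0.15.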
ABSTRACT: In nonrandomized studies of comparative effectiveness of medications, utilizing the variation in treatment patterns across prescribers may improve the validity of treatment effect estimates, yet the majority of analyses ignore the prescriber. Via simulations and example studies, we evaluated approaches to utilizing the prescriber in analysis, as well as diagnostic analyses for choosing among these approaches. Including a prescriber random intercept in the propensity score had unpredictable results and sometimes increased bias over the default analysis that ignores the prescriber. Instrumental variable approaches were unbiased when instrument assumptions were met, which required no clustering of patient risk factors within prescriber, a scenario that may be unlikely in practice. Stratification on prescriber often reduced bias from unmeasured patient and prescriber characteristics. Better diagnostics are needed to guide the analytic strategy for comparative effectiveness data that contain prescriber information.
ABSTRACT: In this talk we study aspects of discrimination in risk prediction models with a binary outcome and continuous predictors. The Area Under the Receiver Operating Characteristic Curve (AUC) is a widely used measure of discrimination in risk prediction models. Numerous studies report the failure of an added significant predictor to produce a significant improvement in the AUC when tested by the DeLong test. We use the theory of U-statistics to explain this contradiction. First we note that the DeLong test was developed for non-nested models. Then we show that for two nested models the difference of their AUCs belongs to a degenerate class of U-statistics and therefore has a different asymptotic distribution than the one used by the DeLong test, resulting in a substantial (up to 60%) loss of power. Possible solutions are discussed.
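For readers less familiar with the U-statistic view, the empirical AUC is the average of a pairwise kernel over all (event, non-event) pairs. The minimal sketch below shows only that representation, not the DeLong test or the degenerate-case asymptotics discussed in the talk. The function name and example scores are hypothetical.

```python
def empirical_auc(scores_events, scores_nonevents):
    """Empirical AUC as a two-sample U-statistic.

    The kernel is 1 if the event's score exceeds the non-event's score,
    0.5 if the two scores are tied, and 0 otherwise.
    """
    if not scores_events or not scores_nonevents:
        raise ValueError("both groups must be non-empty")
    total = 0.0
    for x in scores_events:
        for y in scores_nonevents:
            if x > y:
                total += 1.0
            elif x == y:
                total += 0.5
    return total / (len(scores_events) * len(scores_nonevents))


# Hypothetical predicted risks for three events and three non-events.
print(empirical_auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.4]))
```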
It is often assumed that a good new predictor should be uncorrelated with variables already in the model. Assuming multivariate normality of predictors and using some of the theoretical results above, we show that correlation among predictors can be beneficial for discrimination. First, we show that if the effect size is positive, then negative correlation between a new variable and the old risk score is always beneficial for discrimination, whereas zero correlation is often detrimental. Second, we show that a new predictor that regresses well on old predictors (has high multiple R-squared) improves the quality of discrimination. This result holds rigorously for linear discriminant analysis and asymptotically for logistic regression.
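The role of correlation can be seen numerically in the simplest normal setting. The sketch below assumes two unit-variance normal predictors with a common correlation in events and non-events, for which the AUC of linear discriminant analysis is the standard normal CDF evaluated at the Mahalanobis distance divided by the square root of 2. Treating the old risk score as a single predictor, and all numeric values, are assumptions made for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def single_auc(d_old):
    """AUC of one normal predictor with standardized mean difference d_old."""
    return normal_cdf(abs(d_old) / math.sqrt(2.0))


def lda_auc(d_old, d_new, rho):
    """AUC of LDA on two unit-variance normal predictors.

    d_old, d_new : standardized mean differences between events and non-events
    rho          : correlation between the two predictors (same in both groups)
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    mahalanobis_sq = (d_old ** 2 - 2.0 * rho * d_old * d_new + d_new ** 2) / (1.0 - rho ** 2)
    return normal_cdf(math.sqrt(mahalanobis_sq / 2.0))


# Hypothetical effect sizes: old score 1.0, new predictor 0.3.
for rho in (-0.5, 0.0, 0.3):
    print(rho, round(lda_auc(1.0, 0.3, rho), 4))
print("old score alone", round(single_auc(1.0), 4))
```

With these inputs the negatively correlated predictor gives the largest AUC, and at rho = 0.3 (the ratio of the two effect sizes) the new predictor adds nothing beyond the old score.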
ABSTRACT: Regression calibration is a popular method for correcting for bias in effect estimates when a disease risk factor is measured with error. However, the development of such methods has thus far focused on unbiased estimation and inference for the primary exposure or exposures of interest, and not on finer aspects of model building and variable selection. Adjusting for measurement error using the regression calibration method raises several questions concerning valid model construction in the presence of covariates. For instance, are the standard regression calibration adjustments valid when a covariate that is associated only with the measurement error process, and not with the outcome itself, is included in the primary regression model? Does the inclusion of such a variable in the primary regression model induce extraneous variation in the resulting estimator? Clear answers to these questions would provide valuable insight and improve estimation of exposure-disease associations measured with error. In the paper, we address these questions analytically and develop extended regression calibration estimators as needed based on assumptions about the underlying association between disease and covariates, for both linear and logistic regression models. The methods are applied to data from the Health Professionals Follow-up Study.
Joint work of Xiaomei Liao, Kate Fitzgerald, and Donna Spiegelman.
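As background for the abstract above, the following sketch shows the basic univariate form of regression calibration in a linear model with classical additive error and no covariates. It is not one of the extended estimators developed in the paper. The simulated data, sample sizes, and function names are all hypothetical, and a validation sample with the true exposure is assumed to be available.

```python
import random


def slope(x, y):
    """Least-squares slope from the simple regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def regression_calibration(w_main, y_main, w_val, x_val):
    """Return (naive slope, calibration slope, corrected slope).

    w_main, y_main : error-prone exposure and outcome in the main study
    w_val, x_val   : error-prone and true exposure in the validation sample
    """
    naive = slope(w_main, y_main)   # attenuated by measurement error
    lam = slope(w_val, x_val)       # slope of true exposure on measured exposure
    return naive, lam, naive / lam


def simulate(beta=1.0, n_main=20000, n_val=5000, seed=7):
    """Simulate classical error W = X + U with equal variances for X and U."""
    rng = random.Random(seed)

    def draw(n):
        x = [rng.gauss(0, 1) for _ in range(n)]
        w = [xi + rng.gauss(0, 1) for xi in x]
        return x, w

    x_main, w_main = draw(n_main)
    y_main = [beta * xi + rng.gauss(0, 1) for xi in x_main]
    x_val, w_val = draw(n_val)
    return regression_calibration(w_main, y_main, w_val, x_val)


print(simulate())
```

In this setup the calibration slope is about 0.5, so the naive estimate is roughly half the true effect and dividing by the calibration slope brings it back near the assumed value of 1.0.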
Last Update: April 29, 2014