Department of Biostatistics
Statistical Methods in Epidemiology Working Group
2013 - 2014
ABSTRACT: A marker of validity of a dietary instrument is how well it correlates with a relevant biomarker. For example, there are different methods of assessing dietary beta-carotene (e.g., 24-hour recall, food frequency questionnaire, diet record), and a natural question is which measure correlates most strongly with plasma beta-carotene. Methods already exist for comparing dependent correlation coefficients (e.g., Wolfe, 1976; Steiger, 1980; Meng, 1992). However, each of these instruments has associated random error, and a related question is which instrument correlates most highly with plasma beta-carotene after correcting for that error. Since these correlations are assessed in the same subjects, the more general question is how to compare dependent deattenuated correlation coefficients. This generalizes previous work on obtaining confidence limits for a single deattenuated correlation coefficient (Rosner and Willett, 1988). In addition, we extend this work to the comparison of dependent Spearman correlation coefficients, which to our knowledge has not been done before. These methods are illustrated with two examples: (a) the comparison of the validity of different methods for measuring dietary beta-carotene and (b) the comparison of the validity of different storage protocols in processing plasma samples for the determination of HgbA1c.
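For readers unfamiliar with the term, the sketch below shows the classical correction for attenuation, in which an observed correlation is divided by the square root of the product of the two measures' reliabilities. It is an illustration only: the function name and the reliability inputs are assumptions, and it does not reproduce the confidence-limit or comparison methods described in the abstract.

```python
import math


def deattenuated_correlation(r_observed, reliability_x, reliability_y=1.0):
    """Classical correction for attenuation (illustrative sketch).

    r_observed    : correlation between the two error-prone measures
    reliability_x : reliability of the first measure, e.g. an intraclass
                    correlation from replicate measurements (assumed input)
    reliability_y : reliability of the second measure (1.0 = no random error)
    """
    for rel in (reliability_x, reliability_y):
        if not 0.0 < rel <= 1.0:
            raise ValueError("reliabilities must lie in (0, 1]")
    return r_observed / math.sqrt(reliability_x * reliability_y)


# Hypothetical numbers: observed r = 0.30, instrument reliability = 0.36.
corrected = deattenuated_correlation(0.30, 0.36)
```

With these hypothetical inputs the corrected correlation is about 0.5, which illustrates how much random error in an instrument can mask its underlying association with the biomarker.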
ABSTRACT: In recent years, a number of large-scale genome-wide association studies have been published for human traits adjusted for other correlated traits with a genetic basis. The motivation for such an adjustment is to discover genetic variants associated with the primary outcome independently of the correlated trait. In this work, we contend that this objective is almost fulfilled for variants that have either similar effects (magnitude and direction) on both the 'primary' and the 'correlated' trait, or effects on the primary outcome only. For all other variants, an unintended bias is introduced with respect to the primary outcome as a result of the adjustment. We identify published genome-wide association studies, including a large meta-analysis of waist-to-hip ratio, where genetic effects are incorrectly interpreted because of such adjustment. Using both theory and simulations, we explore this phenomenon in detail and discuss the ramifications for future genome-wide association studies of correlated traits and diseases.
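One way to see how adjustment can introduce bias is a simple linear model, which is an assumption made here for illustration and not necessarily the model used by the authors. Suppose Y = b_Y*G + e_Y and C = b_C*G + e_C, where the errors are independent of the genotype G and correlated with each other (correlation rho) through shared non-genetic factors. Regressing Y on both G and C then gives a population coefficient on G of b_Y - rho*(sd_Y/sd_C)*b_C. The function and parameter names below are hypothetical.

```python
def adjusted_genetic_effect(beta_primary, beta_covariate, rho,
                            sd_primary=1.0, sd_covariate=1.0):
    """Population coefficient on G when Y is regressed on G and C.

    Assumed model (illustrative only):
        Y = beta_primary   * G + e_Y
        C = beta_covariate * G + e_C
    with Corr(e_Y, e_C) = rho and G independent of both error terms.
    """
    # Slope of e_Y on e_C, i.e. the coefficient the covariate C receives.
    gamma = rho * sd_primary / sd_covariate
    return beta_primary - gamma * beta_covariate


# Variant acting on the primary trait only: the adjusted effect is unchanged.
only_primary = adjusted_genetic_effect(0.10, 0.00, rho=0.5)

# Variant acting on the covariate only: a nonzero "effect" on the
# primary trait appears purely as a result of the adjustment.
only_covariate = adjusted_genetic_effect(0.00, 0.10, rho=0.5)
```

Under this assumed model, the second variant has no true effect on the primary trait, yet the adjusted analysis reports an effect of about -0.05.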
ABSTRACT: In nonrandomized studies of comparative effectiveness of medications, utilizing the variation in treatment patterns across prescribers may improve the validity of treatment effect estimates, yet the majority of analyses ignore the prescriber. Via simulations and example studies, we evaluated approaches to utilizing the prescriber in analysis, as well as diagnostic analyses for choosing among these approaches. Including a prescriber random intercept in the propensity score had unpredictable results and sometimes increased bias over the default analysis that ignores the prescriber. Instrumental variable approaches were unbiased when instrument assumptions were met, which required no clustering of patient risk factors within prescriber, a scenario that may be unlikely in practice. Stratification on prescriber often reduced bias from unmeasured patient and prescriber characteristics. Better diagnostics are needed to guide the analytic strategy for comparative effectiveness data that contain prescriber information.
ABSTRACT: In this talk we study aspects of discrimination in risk prediction models with a binary outcome and continuous predictors. The Area Under the Receiver Operating Characteristic Curve (AUC) is a widely used measure of discrimination in risk prediction models. Numerous studies report the failure of an added significant predictor to produce a significant improvement in the AUC when tested by the DeLong test. We use the theory of U-statistics to explain this contradiction. First, we note that the DeLong test was developed for non-nested models. Then we show that for two nested models the difference of their AUCs belongs to a degenerate class of U-statistics and therefore has a different asymptotic distribution than the one used by the DeLong test, resulting in a substantial (up to 60%) loss of power. Possible solutions are discussed.
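As background for the U-statistic argument, the empirical AUC can be written as a two-sample U-statistic: the proportion of (event, non-event) pairs in which the event has the higher predicted risk, with ties counted as one half. The sketch below shows only that representation, using hypothetical scores and a hypothetical function name; it does not implement the DeLong test or the degenerate-case asymptotics discussed in the talk.

```python
def auc_u_statistic(scores_events, scores_nonevents):
    """Empirical AUC as a two-sample U-statistic (Mann-Whitney form)."""
    if not scores_events or not scores_nonevents:
        raise ValueError("both groups must be non-empty")
    total = 0.0
    for x in scores_events:
        for y in scores_nonevents:
            if x > y:
                total += 1.0   # concordant pair
            elif x == y:
                total += 0.5   # tied pair counts as one half
    return total / (len(scores_events) * len(scores_nonevents))


# Hypothetical predicted risks for three events and four non-events.
auc = auc_u_statistic([0.9, 0.8, 0.4], [0.7, 0.3, 0.2, 0.4])
```

Here 10.5 of the 12 pairs are concordant (counting the single tie as one half), so the AUC is 0.875.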
It is often assumed that a good new predictor should be uncorrelated with variables already in the model. Assuming multivariate normality of predictors and using some of the theoretical results above, we show that correlation among predictors can be beneficial for discrimination. First, we show that if the effect size is positive, then negative correlation between a new variable and the old risk score is always beneficial for discrimination, whereas zero correlation is often detrimental. Second, we show that a new predictor that regresses well on old predictors (has high multiple R-squared) improves the quality of discrimination. This result holds rigorously for linear discriminant analysis and asymptotically for logistic regression.
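The effect of correlation can be illustrated in the simplest setting consistent with the stated assumptions: two predictors that are bivariate normal in events and non-events with a common covariance matrix, combined by linear discriminant analysis. In that case the AUC equals Phi(Delta / sqrt(2)), where Delta is the Mahalanobis distance between the group means. The sketch below treats the old risk score as a single standardized predictor, which is a simplification made here and not taken from the talk; the names and numbers are hypothetical.

```python
import math


def lda_auc_two_predictors(d1, d2, r):
    """AUC of the LDA score for two standardized normal predictors.

    d1, d2 : standardized mean differences (effect sizes) between groups
    r      : within-group correlation between the predictors, |r| < 1
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    # Squared Mahalanobis distance between the two group means.
    delta_sq = (d1 * d1 - 2.0 * r * d1 * d2 + d2 * d2) / (1.0 - r * r)
    delta = math.sqrt(delta_sq)
    # AUC = Phi(delta / sqrt(2)); Phi(z) = 0.5 * (1 + erf(z / sqrt(2))).
    return 0.5 * (1.0 + math.erf(delta / 2.0))


# Same positive effect sizes, different within-group correlations.
auc_negative = lda_auc_two_predictors(0.5, 0.5, -0.5)
auc_zero = lda_auc_two_predictors(0.5, 0.5, 0.0)
auc_positive = lda_auc_two_predictors(0.5, 0.5, 0.5)
```

With these hypothetical effect sizes the AUC is largest for the negative correlation and smallest for the positive one, in line with the first claim above. The ordering at positive correlations depends on the effect sizes, so this example should not be read as a general rule.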
Last Update: March 24, 2014