DFCI Seminar with Jinbo Chen – 2/26

Monday, February 26, 2:30-3:30pm, CLSB 11081B

Novel Two-Phase Sampling Designs for Studying Binary Outcomes

Jinbo Chen
Department of Biostatistics, Epidemiology and Informatics
University of Pennsylvania

In a biomedical cohort study assessing the association between an outcome variable and a set of covariates, it is common that some covariates can be measured only on a subgroup of study subjects. An important design question is which subjects to select into this subgroup so as to increase statistical efficiency for estimating the association parameters. When the outcome is binary, one may adopt a case-control sampling design or a balanced case-control design, in which cases and controls are further matched on a small number of completely observed discrete covariates. While the latter works well for estimating odds ratio (OR) parameters of the matching covariates, to the best of our knowledge, similar two-phase design options have not yet been explored to increase statistical efficiency for assessing the remaining covariates, especially the incompletely collected ones. This question is of great importance in studies where the covariates of interest cannot be collected on all subjects. To this end, assuming that an external model relating the outcome to the complete covariates is available, we propose a novel sampling scheme that oversamples cases and controls with worse goodness-of-fit under the external model and then further matches them on complete covariates, similarly to the balanced design. We develop a pseudo-likelihood method for estimating OR parameters, which can be implemented using existing software packages. Through extensive simulation studies and explorations in a real cohort study setting, we found that our design generally led to reduced asymptotic variances of the OR estimates, to a similar extent for both the incomplete and the complete covariates, and that the reduction for the matching covariates was comparable to that achieved by the balanced design.
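The abstract does not spell out the exact selection rule, so the following is only a hypothetical sketch of one way to operationalize "oversample cases and controls with worse goodness-of-fit under the external model, then match on complete covariates." It assumes: a binary outcome `y`, a discrete complete covariate `z` used for balancing, and external-model fitted probabilities `p_ext`. The lack-of-fit score |y - p_ext|, the median split into "poor fit" and "good fit" tiers, the 75% poor-fit share, and the function name `residual_balanced_sample` are all illustrative assumptions, not the speaker's design. It returns each subject's selection probability because analyses of stratified two-phase samples typically need known sampling fractions.

```python
import numpy as np


def residual_balanced_sample(y, z, p_ext, n_per_cell, poor_fit_share=0.75, seed=0):
    """Hypothetical phase-two selection (illustration only).

    Within each (z, y) cell, subjects are split at the median of |y - p_ext|
    into a 'poor fit' and a 'good fit' tier; more subjects are drawn from the
    poor-fit tier. Drawing n_per_cell from every cell gives equal numbers of
    cases and controls per level of z (balanced, as in a balanced design).
    Returns the selected indices and each subject's selection probability.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    z = np.asarray(z)
    p_ext = np.asarray(p_ext, dtype=float)
    # Large for cases with low predicted risk or controls with high predicted risk.
    lack_of_fit = np.abs(y - p_ext)
    selected = []
    prob = np.zeros(len(y))
    for zv in np.unique(z):
        for yv in (0, 1):
            cell = np.flatnonzero((z == zv) & (y == yv))
            if cell.size == 0:
                continue
            cut = np.median(lack_of_fit[cell])
            poor = cell[lack_of_fit[cell] > cut]
            good = cell[lack_of_fit[cell] <= cut]
            n_poor = min(poor.size, int(round(poor_fit_share * n_per_cell)))
            n_good = min(good.size, n_per_cell - n_poor)
            for tier, k in ((poor, n_poor), (good, n_good)):
                if k > 0:
                    chosen = rng.choice(tier, size=k, replace=False)
                    selected.extend(chosen.tolist())
                    prob[tier] = k / tier.size  # known stratum sampling fraction
    return np.sort(np.array(selected, dtype=int)), prob


# Simulated phase-one cohort (all values are made up for illustration).
rng = np.random.default_rng(1)
N = 2000
z = rng.integers(0, 2, N)           # complete discrete covariate (matching variable)
w = rng.normal(size=N)              # complete continuous covariate
x = rng.normal(size=N)              # covariate measurable only in phase two
eta = -1.5 + 0.5 * z + 0.7 * w + 1.0 * x
y = rng.binomial(1, 1 / (1 + np.exp(-eta)))
# Assumed external model: uses only the complete covariates z and w.
p_ext = 1 / (1 + np.exp(-(-1.5 + 0.5 * z + 0.7 * w)))

idx, pi = residual_balanced_sample(y, z, p_ext, n_per_cell=50)
print(len(idx), "subjects selected for phase two")
```

Because `p_ext` ignores the phase-two covariate `x`, subjects whose outcome the external model predicts poorly are plausibly those carrying more information about `x`, which is the intuition the oversampling step tries to exploit in this sketch.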