Tianxi Cai

Tianxi Cai, ScD
John Rock Professor of Population and Translational Data Sciences
Harvard T.H. Chan School of Public Health
Professor of Biomedical Informatics, Harvard Medical School

 

Enabling Imprecise EHR Data for Precision Medicine

While traditional cohort studies and clinical trials remain critical sources for studying disease risk, progression, and treatment response, they have limitations, including the limited generalizability of study findings to the real world and a limited ability to test broader hypotheses. In recent years, large electronic health records (EHR) data integrated with biological data have emerged as a new source for precision medicine research. These datasets open new opportunities for deriving real-world, data-driven prediction models of disease risk and progression. Yet they also bring methodological challenges. For example, obtaining precise clinical event onset times is a major bottleneck in EHR research, as it requires laborious medical record review and such information may not be accurately documented. In this talk, I’ll discuss statistical methods for developing risk prediction models that can efficiently leverage both a small, partially labeled dataset and a large unlabeled dataset. These methods will be illustrated using EHR data from Partners HealthCare.