Seunggeun Lee

Seunggeun Lee, PhD
Associate Professor of Biostatistics
University of Michigan School of Public Health

Scalable and accurate association analysis of Biobank data

Large-scale biobanks have emerged as a powerful resource for complex disease studies and precision medicine. Detailed genomic information, coupled with clinical, behavioral, and environmental measurements, enables the discovery of novel genetic associations and disease mechanisms across the entire phenome. However, the scale and complex structure of biobank data remain substantial challenges. In this talk, I will first introduce a new statistical method that can analyze 500,000 samples for binary phenotypes while adjusting for family relatedness and case-control imbalance. This method, called SAIGE, uses the saddlepoint approximation to adjust for case-control imbalance, building on the recently developed generalized linear mixed model method. In addition, it uses state-of-the-art optimization techniques, such as the preconditioned conjugate gradient method for solving large-scale linear systems, to make the analysis of large samples feasible. I will also introduce our more recent efforts, including rare variant tests for biobank-scale data.
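For readers unfamiliar with the preconditioned conjugate gradient method mentioned in the abstract, the following is a minimal sketch of the general technique in plain Python. It is not SAIGE's implementation: the function names (`matvec`, `dot`, `pcg`), the dense list-of-lists matrix, and the choice of a Jacobi (diagonal) preconditioner are all assumptions made for illustration. The point is that the solver only needs matrix-vector products, so a large symmetric positive-definite system can be solved iteratively without ever forming a matrix inverse.

```python
# Illustrative sketch only; names and the Jacobi preconditioner are assumptions,
# not taken from SAIGE.

def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]


def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive-definite A using conjugate
    gradient with a Jacobi (diagonal) preconditioner."""
    n = len(b)
    m_inv = [1.0 / A[i][i] for i in range(n)]  # inverse of diag(A)
    x = [0.0] * n
    r = list(b)                                # residual b - A x, with x = 0
    z = [m_inv[i] * r[i] for i in range(n)]    # preconditioned residual
    p = list(z)                                # initial search direction
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0           # avoid dividing by zero if b = 0
    for _ in range(max_iter):
        # Stop once the relative residual norm is small enough.
        if dot(r, r) ** 0.5 <= tol * b_norm:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)                # step length along p
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz                     # keeps directions A-conjugate
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x


# Small usage example on a 2 x 2 symmetric positive-definite system.
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = pcg(A, b)
```

In a biobank-scale setting the dense `matvec` above would be replaced by a routine that computes the product without storing the full matrix, which is where the memory savings of an iterative solver come from.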