
PQG Working Group

December 12, 2020 @ 12:00 am - 1:00 am

Xihao Li

PhD Candidate
Harvard T. H. Chan School of Public Health

Powerful and resource-efficient rare variant meta-analysis for large-scale whole genome sequencing studies using summary statistics and functional annotations, with application to TOPMed lipid data

Large-scale whole genome sequencing (WGS) studies have enabled the analysis of rare variants (RVs) associated with complex human traits. Existing RV meta-analysis approaches are not scalable when applied to WGS data. We propose MetaSTAAR (Meta-analysis of variant-Set Test for Association using Annotation infoRmation), a powerful and resource-efficient RV meta-analysis framework for large-scale WGS association studies. MetaSTAAR accounts for population structure and relatedness for both continuous and dichotomous traits by fitting generalized linear mixed models using sparse genetic relatedness matrices. By storing linkage disequilibrium (LD) information of RVs in sparse matrix format, the proposed workflow is highly storage efficient and computationally scalable for analyzing large-scale WGS data.

Furthermore, the framework builds upon the STAAR method, which dynamically incorporates multiple functional annotations to empower RV association analysis. It allows for RV-set analysis, including gene-centric analysis, which groups variants into functional categories for each gene, and genetic region analysis using sliding windows. MetaSTAAR also enables conditional analyses to identify RV-set signals independent of nearby common variants.

We applied MetaSTAAR to identify RV-sets associated with four quantitative lipid traits (LDL-C, HDL-C, TG and TC) in 30,138 related samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program Freeze 5 data, consisting of 14 ancestrally diverse study cohorts and 255 million variants in total. MetaSTAAR requires 520 GB to store the summary statistics and LD matrices across the whole genome, at least 100 times less than the existing method RAREMETAL requires. In addition, computation is benchmarked to be 100 times faster than with RAREMETAL. Compared to the joint analysis of pooled individual-level data using STAAR, MetaSTAAR yields highly consistent P-values, with correlation > 0.99 among significant regions in both unconditional and conditional analyses.
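
For readers unfamiliar with summary-statistic meta-analysis, the general idea can be illustrated with a minimal sketch. This is not the MetaSTAAR implementation: it is a textbook fixed-effects burden test that sums per-study score statistics and covariance (LD) matrices, with the covariances stored sparsely as non-zero entries only. The function names, variant identifiers, weights, and numbers below are all hypothetical and chosen purely for illustration.

```python
import math
from collections import defaultdict


def combine_studies(studies):
    """Sum per-study score statistics and sparse covariance matrices.

    Each study is a pair (scores, cov):
      scores maps variant id -> score statistic
      cov maps (variant_i, variant_j) -> covariance, keeping only the
      non-zero entries of the upper triangle (each pair stored once).
    """
    total_scores = defaultdict(float)
    total_cov = defaultdict(float)
    for scores, cov in studies:
        for variant, u in scores.items():
            total_scores[variant] += u
        for pair, v in cov.items():
            total_cov[pair] += v
    return dict(total_scores), dict(total_cov)


def burden_test(scores, cov, weights):
    """Weighted burden test from combined summary statistics.

    Returns the 1-degree-of-freedom chi-square statistic and its P-value.
    """
    numerator = sum(weights[v] * u for v, u in scores.items())
    variance = 0.0
    for (i, j), v in cov.items():
        term = weights[i] * weights[j] * v
        # Off-diagonal entries are stored once, so count them twice.
        variance += term if i == j else 2.0 * term
    stat = numerator ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(Z^2 > t) = erfc(sqrt(t / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical summary statistics from two studies (illustrative values only).
study_a = (
    {"rv1": 1.2, "rv2": 0.5},
    {("rv1", "rv1"): 2.0, ("rv2", "rv2"): 1.0, ("rv1", "rv2"): 0.1},
)
study_b = (
    {"rv1": 0.8, "rv3": 0.6},
    {("rv1", "rv1"): 1.5, ("rv3", "rv3"): 0.5},
)

scores, cov = combine_studies([study_a, study_b])
weights = {"rv1": 1.0, "rv2": 1.0, "rv3": 1.0}  # equal weights, an assumption
stat, p_value = burden_test(scores, cov, weights)
```

Because most pairs of rare variants have zero covariance, keeping only the non-zero entries is what makes per-study storage small in a sketch like this; the annotation weighting, mixed-model fitting, and conditional analyses described in the abstract are beyond what it shows.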

Details

Date: December 12, 2020
Time: 12:00 am - 1:00 am
Calendars: General Event

Venue