PQG Working Group Series

Each year, the PQG organizes a less formal PQG Working Group Series for all local students, postdocs, and faculty. The goal is to provide the opportunity to present and participate in the discussion of works-in-progress, and to focus on the methods and analysis of high-dimensional data in genetics and genomics.

2020/2021 Working Group Organizers: Corbin Quick & Wei Zhou

Please direct any logistical questions to Amanda King

Upcoming Working Group

All PQG working group meetings for the semester will be held by Zoom.  The link to each meeting will be posted along with the talk information.

Tuesday, February 16, 2021
1:00-2:00 PM
Join Zoom meeting:

Xuefang Zhao 

Postdoctoral Researcher
Harvard Medical School, Center for Genomic Medicine at Massachusetts General Hospital

2020-2021 Dates

September 29, 2020 - Zilin Li, HSPH

Zilin Li

Postdoctoral Fellow
Harvard T. H. Chan School of Public Health

A Framework for Detecting Noncoding Associations in Large Whole Genome Sequencing Studies at Scale

Compared with GWAS and whole exome sequencing studies, large-scale whole genome sequencing studies have enabled the analysis of non-coding rare variants (RVs) associated with complex human traits. Common analytic strategies for RV association in non-coding regions have considered only a limited choice of gene-centric masks and sliding windows of a fixed length, and have limited scope to leverage the functions of variants.

We propose a non-coding rare variant association detection framework that includes gene-centric analysis and genetic region analysis. For gene-centric analysis, we consider various strategies for grouping non-coding variants based on functional annotations, including UTR, upstream, downstream, promoter, enhancer and long non-coding RNA genes. For genetic region analysis, we group non-coding RVs residing in a contiguous window, defined either by a pre-specified (fixed) window size or a flexible data-adaptive window size using SCANG (SCAN the Genome). The framework also applies the STAAR (variant-Set Test for Association using Annotation infoRmation) method, which increases the power of RV association tests by effectively incorporating multiple functional annotations.

We applied the proposed framework to analyze non-coding RV association with four quantitative lipid traits (LDL-C, HDL-C, TG and TC) in 21,015 discovery samples and 9,123 replication samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. Several novel non-coding RV-sets associated with lipids were discovered and replicated using the TOPMed WGS data.
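
As a rough illustration of the fixed-size sliding window grouping mentioned in the abstract above, the sketch below assigns variant positions to overlapping windows. It is not the speaker's implementation: the function name `sliding_windows`, the half-open window convention, and the example window and step sizes are all assumptions made for illustration, and the data-adaptive SCANG procedure is not shown.

```python
from typing import Dict, List, Tuple


def sliding_windows(
    positions: List[int], window_size: int, step: int
) -> Dict[Tuple[int, int], List[int]]:
    """Group variant positions into fixed-size, possibly overlapping windows.

    Hypothetical helper: windows are half-open intervals
    [start, start + window_size), laid out every `step` base pairs
    starting at the smallest position. Empty windows are skipped.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    if not positions:
        return {}

    ordered = sorted(positions)
    last = ordered[-1]
    windows: Dict[Tuple[int, int], List[int]] = {}

    start = ordered[0]
    while start <= last:
        end = start + window_size
        # Collect the variants that fall inside this window.
        members = [p for p in ordered if start <= p < end]
        if members:
            windows[(start, end)] = members
        start += step
    return windows


# Made-up positions, with a 1 kb window moved in 500 bp steps.
example = sliding_windows([100, 150, 1200, 2100], window_size=1000, step=500)
```

Each resulting group would then be passed to a variant-set association test; a step smaller than the window size makes neighboring windows overlap, so a variant can belong to more than one set.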

October 13, 2020 - Masahiro Kanai, Harvard Medical School, Analytical and Translational Genetics Unit, Mass General Hospital and Broad Institute

Masahiro Kanai

PhD Candidate
Harvard Medical School, Analytical and Translational Genetics Unit, Mass General Hospital, Broad Institute

Insights into fine-mapping causal variants of complex traits from diverse populations
Identifying causal variants for complex traits is one of the major challenges in human genetics. The causal variants in most GWAS loci remain unknown due to lack of power and to high linkage disequilibrium (LD) in a locus. Moreover, little is known about how causal variants are shared across populations due to lack of large-scale GWAS from diverse populations.

Here, we present a cross-population analysis of fine-mapping results based on three large-scale biobanks. In parallel to our effort fine-mapping complex traits in UK Biobank (n = 361,194; UKB) and expression QTL (eQTL) in GTEx (n = 838) (Ulirsch & Kanai et al), we fine-mapped hundreds of complex traits and diseases from Biobank Japan (n = 180,987; BBJ) and FinnGen (n = 183,694) using FINEMAP (Benner et al, 2016) and SuSiE (Wang et al, 2018). In total, 4,151 high-confidence putative causal variants for 124 traits were identified (posterior inclusion probability [PIP] > 0.9 in any population), including 46 and 66 population-enriched variants from BBJ and FinnGen, respectively. Distinct coding variants from each population often fine-mapped together in the same exons, and we found that coding putative causal variants are more deleterious (OR = 10.4 and 5.9 for pLoF and missense vs synonymous) and more pathogenic (OR = 28.3 for ClinVar variants) than other coding variants. Furthermore, we observed that non-coding putative causal variants are strongly enriched for promoters and cis-regulatory regions (accessible chromatin and H3K27ac) (OR = 10.8 and 11.3 vs non-genic) and colocalized with fine-mapped eQTL variants in GTEx, suggesting that the majority of putative causal variants could be explained via coding or regulatory mechanisms. Altogether, we demonstrate how studying diverse populations yields additional insights into disease biology with an expanded atlas of candidate causal variants.

Despite high trans-ethnic genetic correlation, we found that most single-population fine-mapped variants are undiscoverable across populations; only 8% of the variants with PIP > 0.9 were identified in 95% credible sets in other populations. This inconsistency is mainly due to lack of power in other populations or to LD complexity, with a minority unexplained. For example, among 2,483 fine-mapped variants with PIP > 0.9 in UKB, 53% are missing in BBJ because they are rare or monomorphic, 35% have lower power for association due to MAF and sample size, and 2% have higher LD complexity based on empirical predicted PIP analysis. Overall, our analysis gives insights into how to interpret fine-mapping results from multiple populations and emphasizes the desperate need for more diversity in human genetics.
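
For readers unfamiliar with the terms, the two quantities used in the abstract above (a PIP threshold and a 95% credible set) can be sketched as follows. This is a simplified illustration, not how FINEMAP or SuSiE compute their output: it assumes a single causal variant per locus, so that the posterior probabilities sum to 1, and the function names and variant labels are hypothetical.

```python
from typing import Dict, List


def credible_set(posteriors: Dict[str, float], coverage: float = 0.95) -> List[str]:
    """Smallest set of variants whose posterior probabilities reach `coverage`.

    Assumes one causal variant per locus, so the probabilities sum to 1.
    Variants are added in order of decreasing probability.
    """
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    ranked = sorted(posteriors.items(), key=lambda kv: kv[1], reverse=True)
    selected: List[str] = []
    cumulative = 0.0
    for variant, prob in ranked:
        selected.append(variant)
        cumulative += prob
        if cumulative >= coverage:
            break
    return selected


def high_confidence(posteriors: Dict[str, float], threshold: float = 0.9) -> List[str]:
    """Variants whose posterior probability exceeds the threshold."""
    return [v for v, p in posteriors.items() if p > threshold]


# Made-up posterior probabilities for one locus (hypothetical variant labels).
locus = {"variant_a": 0.92, "variant_b": 0.05, "variant_c": 0.02, "variant_d": 0.01}
cs95 = credible_set(locus)       # variant_a and variant_b together reach 0.97
confident = high_confidence(locus)  # only variant_a exceeds 0.9
```

Under this simplification, a variant that is high-confidence in one population counts as identified in another population when it appears in that population's credible set for the same locus.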

November 10, 2020 - Carles Boix, MIT

Carles Boix

PhD Candidate

Regulatory genomic circuitry of human disease loci by integrative epigenomics
Annotating the molecular basis of human disease remains an unsolved challenge, as 93% of disease loci are non-coding, and gene-regulatory annotations are highly incomplete. Here, we present EpiMap, a compendium of 10,000 epigenomic maps across 800 samples, which we use to define chromatin states, high-resolution enhancers, enhancer modules, upstream regulators, and downstream target genes. We use this resource to annotate 30,000 genetic loci associated with 540 traits, predicting trait-relevant tissues, putative causal nucleotide variants in enriched-tissue enhancers, and candidate tissue-specific target genes for each. We partition multifactorial traits into tissue-specific contributing factors with distinct functional enrichments and disease-comorbidity patterns, and reveal both single-factor monotropic and multi-factor pleiotropic loci. Top-scoring loci frequently have multiple predicted driver variants, converging through multiple common-target enhancers, multiple common-tissue genes, or multiple genes/tissues, indicating extensive pleiotropy. Our results demonstrate the importance of dense, rich, high-resolution epigenomic annotations for complex trait dissection.

December 15, 2020 - Xihao Li, HSPH

Xihao Li

PhD Candidate
Harvard T. H. Chan School of Public Health

Powerful and resource-efficient rare variant meta-analysis for large-scale whole genome sequencing studies using summary statistics and functional annotations, with application to TOPMed lipid data
Large-scale whole genome sequencing (WGS) studies have enabled the analysis of rare variants (RVs) associated with complex human traits. Existing RV meta-analysis approaches are not scalable when applied to WGS data. We propose MetaSTAAR (Meta-analysis of variant-Set Test for Association using Annotation infoRmation), a powerful and resource-efficient rare variant meta-analysis framework, for large-scale whole genome sequencing association studies. MetaSTAAR accounts for population structure and relatedness for both continuous and dichotomous traits by fitting generalized linear mixed models using sparse genetic relatedness matrices. By storing LD information of RVs in sparse matrix format, the proposed workflow is highly storage efficient and computationally scalable for analyzing large-scale WGS data. Furthermore, the proposed meta-analysis framework builds upon the STAAR method, which dynamically incorporates multiple functional annotations to empower rare variant association analysis and allows for RV-set analysis, including gene-centric analysis by grouping variants into functional categories for each gene and genetic region analysis using sliding windows. MetaSTAAR also enables conditional analyses to identify RV-set signals independent of nearby common variants. We applied MetaSTAAR to identify RV-sets associated with four quantitative lipid traits (LDL-C, HDL-C, TG and TC) in 30,138 related samples from the NHLBI Trans-Omics for Precision Medicine program Freeze 5 data, consisting of 14 ancestrally diverse study cohorts and 255 million variants in total. MetaSTAAR requires 520 GB to store the summary statistics and LD matrices across the whole genome, which is at least 100 times smaller than the storage required by the existing method RAREMETAL. In addition, the computation is benchmarked to be 100 times faster than RAREMETAL.
Compared to the joint analysis of pooled individual-level data using STAAR, the P-values from MetaSTAAR and STAAR are highly consistent, with correlation > 0.99 among significant regions in both unconditional and conditional analyses.

February 16, 2021 - Xuefang Zhao, HMS / MGH

Xuefang Zhao 

Postdoctoral Researcher
Harvard Medical School, Center for Genomic Medicine at Massachusetts General Hospital

March 9, 2021 - Kumar Veerapen, HMS, ATGU, MGH and Broad Institute

Postdoctoral Researcher
Harvard Medical School, Analytical and Translational Genetics Unit, Massachusetts General Hospital and Broad Institute

April 6, 2021-


May 11, 2021-


PQG Working Group Archive