PQG Working Group Series


Each year, the PQG organizes a less formal PQG Working Group Series for all local students, postdocs, and faculty. The goal is to provide the opportunity to present and participate in the discussion of works-in-progress, and to focus on the methods and analysis of high-dimensional data in genetics and genomics.

2020/2021 Working Group Organizers: Corbin Quick & Wei Zhou

Please direct any logistical questions to Amanda King.

Upcoming Working Group


All PQG working group meetings for the semester will be held via Zoom. The link to each meeting will be posted along with the talk information.

Tuesday, October 13, 2020
1:00-2:00 PM
Join Zoom meeting: https://harvard.zoom.us/meeting/register/tJcoc-GuqjIuGNfACKpMuLPDyWN4-3aeGaCg

Masahiro Kanai

PhD Candidate
Harvard Medical School, Analytical and Translational Genetics Unit, Mass General Hospital, Broad Institute

Insights into fine-mapping causal variants of complex traits from diverse populations
Identifying causal variants for complex traits is one of the major challenges in human genetics. The causal variants in most GWAS loci remain unknown due to lack of power and to high linkage disequilibrium (LD) in a locus. Moreover, little is known about how causal variants are shared across populations due to lack of large-scale GWAS from diverse populations.

Here, we present a cross-population analysis of fine-mapping results based on three large-scale biobanks. In parallel to our effort fine-mapping complex traits in UK Biobank (n = 361,194; UKB) and expression QTL (eQTL) in GTEx (n = 838) (Ulirsch & Kanai et al), we fine-mapped hundreds of complex traits and diseases from Biobank Japan (n = 180,987; BBJ) and FinnGen (n = 183,694) using FINEMAP (Benner et al, 2016) and SuSiE (Wang et al, 2018). In total, 4,151 high-confidence putative causal variants for 124 traits were identified (posterior inclusion probability [PIP] > 0.9 in any population), including 46 and 66 population-enriched variants from BBJ and FinnGen, respectively. Distinct coding variants from each population often fine-mapped together in the same exons, and we found that coding putative causal variants are more deleterious (OR = 10.4 and 5.9 for pLoF and missense vs synonymous) and more pathogenic (OR = 28.3 for ClinVar variants) than other coding variants. Furthermore, we observed that non-coding putative causal variants are strongly enriched for promoters and cis-regulatory regions (accessible chromatin and H3K27ac) (OR = 10.8 and 11.3 vs non-genic) and colocalized with fine-mapped eQTL variants in GTEx, suggesting that the majority of putative causal variants could be explained via coding or regulatory mechanisms. Altogether, we demonstrate how studying diverse populations yields additional insights into disease biology through an expanded atlas of candidate causal variants.

Despite high trans-ethnic genetic correlation, we found that most single-population fine-mapped variants are undiscoverable across populations; only 8% of the variants with PIP > 0.9 were identified in 95% credible sets in other populations. This inconsistency is mainly due to lack of power in other populations or to LD complexity, with a minority unexplained. For example, among 2,483 fine-mapped variants with PIP > 0.9 in UKB, 53% are missing in BBJ because they are rare or monomorphic, 35% have lower power for association due to MAF and sample size, and 2% have higher LD complexity based on empirical predicted PIP analysis. Overall, our analysis gives insights into how to interpret fine-mapping results from multiple populations and emphasizes the desperate need for more diversity in human genetics.

2020-2021 Dates


September 29, 2020 - Zilin Li, HSPH

Zilin Li

Postdoctoral Fellow
Harvard T. H. Chan School of Public Health

A Framework for Detecting Noncoding Associations in Large Whole Genome Sequencing Studies at Scale

Compared with GWAS and whole exome sequencing studies, large-scale whole genome sequencing studies have enabled the analysis of non-coding rare variants (RVs) associated with complex human traits. Common analytic strategies for RV association in non-coding regions consider a limited choice of gene-centric masks and sliding windows of a fixed length, and have limited scope to leverage the functions of variants.

We propose a non-coding rare variant association detection framework that includes gene-centric analysis and genetic region analysis. For gene-centric analysis, we consider various strategies for grouping non-coding variants based on functional annotations, including UTR, upstream, downstream, promoter, enhancer and long non-coding RNA genes. For genetic region analysis, we group non-coding RVs residing in a contiguous window, defined either by a pre-specified (fixed) window size or by a flexible data-adaptive window size using SCANG (SCAN the Genome). The framework also applies the STAAR (variant-Set Test for Association using Annotation infoRmation) method, which increases the power of RV association tests by effectively incorporating multiple functional annotations.

We applied the proposed framework to analyze non-coding RV association with four quantitative lipid traits (LDL-C, HDL-C, TG and TC) in 21,015 discovery samples and 9,123 replication samples from the NHLBI Trans-Omics for Precision Medicine (TOPMed) program. Several novel non-coding RV-sets associated with lipids were discovered and replicated using the TOPMed WGS data.
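As a rough illustration of the fixed-size window grouping mentioned in the abstract above, the following sketch assigns variant positions to overlapping windows. It is not the speaker's implementation: the function name `fixed_windows`, the window size, the step, and the coordinates are all hypothetical choices made for this example, and windows are simply started at the first step-aligned position at or below the first variant.

```python
from typing import Dict, List, Tuple


def fixed_windows(
    positions: List[int], window_size: int = 2000, step: int = 1000
) -> Dict[Tuple[int, int], List[int]]:
    """Group variant positions into overlapping fixed-size windows.

    Returns a mapping from half-open (start, end) intervals to the sorted
    positions inside each window. Empty windows are skipped.
    """
    if not positions:
        return {}
    ordered = sorted(positions)
    windows: Dict[Tuple[int, int], List[int]] = {}
    # Assumption: align the first window to a multiple of `step`.
    start = (ordered[0] // step) * step
    while start <= ordered[-1]:
        end = start + window_size
        members = [p for p in ordered if start <= p < end]
        if members:
            windows[(start, end)] = members
        start += step
    return windows


# Made-up base-pair coordinates, for illustration only.
rare_variant_positions = [1050, 1900, 2500, 7200]
variant_sets = fixed_windows(rare_variant_positions)
```

Each resulting variant set would then be passed to a set-based association test. A data-adaptive approach such as SCANG would instead vary the window size rather than fixing it in advance.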

October 13, 2020 - Masahiro Kanai, Harvard Medical School, Analytical and Translational Genetics Unit, Mass General Hospital and Broad Institute


November 10, 2020 - Carles Boix, MIT

Carles Boix

PhD Candidate
MIT

December 15, 2020 - Xihao Li, HSPH

Xihao Li

PhD Candidate
Harvard T. H. Chan School of Public Health

February 16, 2021 -

March 10, 2021 -

April 6, 2021 -

May 11, 2021 -

PQG Working Group Archive