PQG Working Group Series

Each year, the PQG organizes a less formal PQG Working Group Series for all local students, postdocs, and faculty. The goal is to provide the opportunity to present and participate in the discussion of works-in-progress, and to focus on the methods and analysis of high-dimensional data in genetics and genomics.

2019 Working Group Organizers: Robert Maier & Rounak Dey

Please direct any logistical questions to Amanda King

Upcoming Working Group


All remaining PQG working group meetings for the semester will be held by Zoom. The link to each meeting will be posted along with the talk information.

Tuesday, May 12, 2020
1:00-2:00 PM
Join Zoom meeting:
https://harvard.zoom.us/meeting/register/tJUvd–uqTkoGdO75QX5VmEczCpTMLeoLzA0

Tiffany Amariuta

PhD Candidate
Harvard Medical School

Leveraging functional annotations to improve the trans-ethnic portability of polygenic risk scores

Polygenic risk scores (PRS) quantify the cumulative trait-increasing allelic effect of an individual’s genome and have the potential to transform clinical practice by identifying patients with higher genetic burdens. Poor trans-ethnic portability of PRS is a critical issue that may be partially due to limited knowledge of causal variants shared among populations. Hence, leveraging noncoding regulatory annotations that capture genetic variation across populations has the potential to enhance the trans-ethnic portability of PRS. Using our cell-type-specific regulatory element annotation strategy called IMPACT, we partitioned the common SNP heritability of diverse polygenic traits and diseases from 111 GWAS summary statistics of European (EUR, average N=180K) and East Asian (EAS, average N=157K) origin. Strikingly, we observed highly concordant polygenic trait regulation between populations: the same regulatory annotations captured statistically indistinguishable SNP heritability. Therefore, prioritizing variants in IMPACT regulatory elements may improve the trans-ethnic portability of PRS by selecting variants with population nonspecific effects. Indeed, we observed that EUR PRS models more accurately predicted 21 tested phenotypes of EAS individuals when variants were prioritized by key IMPACT tracks (49.9% mean relative increase in R²). Notably, the improvement afforded by IMPACT was greater in the trans-ethnic EUR-to-EAS PRS application than in the EAS-to-EAS application (47.3% vs 20.9%, P < 1.7e-4). Overall, our study identifies a crucial role for functional annotations such as IMPACT to improve the trans-ethnic portability of genetic data, and this has important implications for future risk prediction models that work across populations.

2019-2020 Dates


September 24, 2019 - Raymond Walters, MGH, Broad

Raymond Walters

Post-Doctoral Research Fellow
Analytical and Translational Genetics Unit
Mass General Hospital and Broad Institute

Exploring sex differences in the genetics of UK Biobank phenotypes

Sex differences are commonly observed for numerous human traits (e.g. sexual dimorphism of height and weight, sex bias in psychiatric diagnoses), likely reflecting both biological (e.g. sex hormones) and environmental (e.g. cultural norms) effects. Twin studies and genome-wide association studies (GWAS) have suggested that genetic effects are predominantly shared across sexes. Nevertheless, there is evidence that for certain phenotypes, such as alcohol use disorders and waist-hip ratio, genetic effects may differ by sex. Detecting these differences remains challenging, however, since the sex differences in genetic effects are likely smaller than the genetic main effects. We use the large-scale genotyping data (194,174 female and 167,020 male unrelated individuals of European ancestry) of UK Biobank (UKB) to evaluate whether sex differences in genetic effects can be detected across the breadth of UKB phenotypes. In preliminary results, we find that on average most autosomal genetic effects are shared between sexes (mean rg = 0.92), but significant differences are observed for reported substance use, body mass index, and insomnia, among other phenotypes. We compare these estimates to analyses of siblings in UKB, and identify individual loci showing significant differences in effects between the GWAS in each sex.

October 1, 2019 - Julian Hecker, Brigham and Women's Hospital

Julian Hecker

Research Associate
Brigham and Women’s Hospital

PolyGEE: A generalized estimating equation approach to the efficient estimation of polygenic effects in large-scale association studies

To quantify polygenic effects in large-scale association studies, we propose a generalized estimating equation (GEE) based estimation framework. We develop a marginal model for single-variant association test statistics of complex diseases that is applicable to population-based designs, family-based designs, or arbitrary combinations of both. We extend the standard GEE approach so that the parameters of the proposed marginal model can be estimated based on working-correlation/linkage-disequilibrium (LD) matrices from external reference panels. Our method achieves substantial efficiency gains over standard approaches while remaining robust to misspecification of the LD structure; that is, the LD structure of the reference panel can differ substantially from the true LD structure in the study population. In simulation studies and applications to real data, we illustrate the features of the proposed GEE framework.

November 19, 2019 - Tarjinder Singh, MGH, HMS

Tarjinder Singh

Postdoctoral fellow
Massachusetts General Hospital, Harvard Medical School

Exome sequencing identifies genes for schizophrenia

Ultra-rare variants from 25,000 schizophrenia cases implicate ten genes and provide mechanistic hypotheses related to the biology of disease

The Schizophrenia Exome Sequencing Meta-Analysis (SCHEMA) Consortium is one of the largest efforts to analyze sequencing data to advance gene discovery. We have completed the analysis of 24,248 sequenced cases and 97,322 controls, comprising individuals from five continental populations. The scale of SCHEMA enables us, for the first time, to implicate ultra-rare variants (URVs) in ten genes as conferring substantial risk for schizophrenia (SCZ) at genome-wide significance (odds ratios 4 to 50, P < 2e-6), and 34 genes at an FDR < 5%. Two of these, the NMDA receptor subunit GRIN2A and transcription factor SP4, reside in two loci implicated by SCZ GWAS. A second glutamate receptor subunit, GRIA3, is also implicated, providing support for the hypofunction of the glutamatergic system in the pathogenesis of schizophrenia.

Exploring the published results from severe neurodevelopmental delay (NDD) and autism (ASD) consortia, we find that the top ten SCZ genes have no protein-truncating variant signal in either ascertainment, though a significant overlap between ASD (n = 102, FDR < 10%) and SCZ risk genes (n = 34, FDR < 5%) was observed. Partitioning the 102 ASD genes into those disrupted more frequently 1) in ASD and 2) in intellectual disability (ID), we show that this signal was driven by the ASD-preferential, and not the ID-preferential, genes. Thus, SCZ genes from exome sequencing have relevance for later-onset psychiatric disorders rather than more severe NDDs.

Between genes identified by GWAS and exome sequencing, we find convergence in biological processes and tissue types, specifically in synaptic transmission and components of the postsynaptic density. After excluding associated genes, SCZ cases still carry a substantial excess of URVs, suggesting that many more remain to be discovered.

December 10, 2019 - Canceled

Duncan Palmer

Post-Doctoral Fellow
Analytical and Translational Genetics Unit
Mass General Hospital and Broad Institute

Dominance genetic effects across multiple traits

February 25, 2020 - Caitlin Carey, Analytical and Translational Genetics Unit, Mass General Hospital and Broad Institute

Caitlin Carey

Post-Doctoral Fellow
Analytical and Translational Genetics Unit, Mass General Hospital and Broad Institute

Genetic architecture of phenome-wide latent factors in the UK Biobank

March 10, 2020 - Canceled

Senior Computational Biologist
Harvard Medical School

Chromosomal phasing improves aneuploidy determination in non-invasive prenatal testing at low fetal fractions

Non-invasive prenatal testing (NIPT) to detect fetal aneuploidy by sequencing cell-free DNA (cfDNA) in maternal plasma has been broadly adopted. To detect fetal aneuploidies from maternal plasma – where fetal DNA is mixed with far larger amounts of maternal DNA – NIPT requires a minimum fraction of the circulating cfDNA to be of placental origin, a level which is usually attained beginning at 10 weeks gestational age. We present a framework to leverage chromosomal phase – the arrangement of alleles along homologous chromosomes – to make NIPT analyses more conclusive. We re-analyze data from a singleton pregnant mother who received an inconclusive aneuploidy determination through NIPT due to a fetal DNA fraction of 3.4%. We show that the same laboratory data can be used to conclusively infer the presence of a trisomy 18 fetus when chromosomal phase is taken into account. Key to the effectiveness of the approach we describe is the robustness of allelic fraction estimates to biological and laboratory-process-driven noise and the ability of chromosomal phase to integrate quantitative signals across very many polymorphic markers. These results show that chromosomal phase increases the sensitivity of a common laboratory test, an idea that could also have broad application in cfDNA analyses for cancer detection.

April 7, 2020 - Hufeng Zhou, HMS

Hufeng Zhou

Instructor in Medicine
Harvard Medical School

April 21, 2020 - Wei Zhou, Broad Institute

Wei Zhou

Postdoctoral Research Fellow
Broad Institute

An efficient and accurate frailty model approach for genome-wide survival association analysis controlling for population structure and relatedness in large-scale biobanks

With decades of electronic health records linked to genomic data, large biobanks provide unprecedented opportunities for systematically understanding the genetics of the natural history of complex diseases. Genome-wide survival association analysis can identify genetic variants associated with ages of onset, disease progression and lifespan.

We developed an efficient and accurate frailty (random effects) model approach for genome-wide survival association analysis of censored time-to-event phenotypes in large datasets (>400,000 individuals), which accounts for population structure and relatedness. Our method utilizes state-of-the-art optimization strategies to reduce the computation cost. The saddlepoint approximation is used to allow for analysis of heavily censored endpoints and low-frequency variants. We have demonstrated analysis of ~400,000 samples and 77 million genetic markers in 15 hours using 32 threads and < 15 GB of memory, with computing time scaling nearly linearly with the numbers of samples and markers. Using simulation studies, we have shown that type I error rates are well controlled, even for low-frequency variants (frequency 0.5%) at a censoring rate of 99%. We analyzed ~180,000 samples in FinnGen and ~400,000 UK Biobank participants with white British ancestry and identified the well-known locus APOE-e4 associated with lifespan. We also identified loci for secondary conditions following primary diseases, such as death following lung cancer and nephropathy following diabetes. Currently, we are applying our method to more time-to-event endpoints in the UK Biobank.

Genome-wide survival analysis of large biobanks will help us understand the natural history and progression of diseases and pave the way for constructing polygenic risk scores to predict age at onset, disease progression, and lifespan.

May 12, 2020 - Tiffany Amariuta, HMS

PQG Working Group Archive