PQG Working Group Series

Each year, the PQG organizes a less formal PQG Working Group Series for all local students, postdocs, and faculty. The goal is to provide an opportunity to present works in progress and take part in discussing them, with a focus on methods for analyzing high-dimensional data in genetics and genomics.

2021/2022 Working Group Organizers: Karthik Jagadeesh & Eric Van Buren

Please direct any logistical questions to Amanda King.

Upcoming Working Group


All PQG working group meetings this semester will be held via Zoom. The link to each meeting will be posted along with the talk information.

Tuesday, November 9, 2021
1:00-2:00 PM
Join Zoom meeting:
https://harvard.zoom.us/meeting/register/tJwrdumupzMvEt2LyQV2AzCvdkaPQ0NsOdXj

Tushar Kamath

PhD Candidate, Biophysics, Broad Institute
MD Student, Harvard Medical School

Vulnerabilities of midbrain dopaminergic neurons to Parkinson’s disease revealed by single-cell genomics

The loss of some dopamine (DA) neurons within the substantia nigra pars compacta (SNpc) is a defining pathological hallmark of Parkinson’s disease (PD). Yet the molecular features associated with DA neuron vulnerability have not been fully identified. To comprehensively characterize DA neuron types in the SNpc and their relative vulnerabilities to PD, we developed a protocol to enrich and transcriptionally profile thousands of midbrain DA neurons from PD patients and matched controls. We identified 10 DA neuron populations and spatially localized each within the SNpc using Slide-seq, a high-resolution spatial transcriptomics technology. A single subtype, marked by expression of the gene AGTR1 and spatially confined to the ventral tier of the SNpc, was highly susceptible to loss and showed the strongest upregulation, relative to other DA neuron types, of targets of TP53 and NR2F2, nominating molecular processes associated with degeneration in vivo. This same vulnerable population was specifically enriched for heritable risk of sporadic PD. These analyses highlight the importance of cell-intrinsic pathways in determining the differential vulnerability of DA neurons to degeneration in PD.

2021-2022 Dates


September 28, 2021 - Rounak Dey, HSPH

Rounak Dey

Postdoctoral Research Fellow, Biostatistics
Harvard T.H. Chan School of Public Health

Scalable and accurate mixed-effects model to account for relatedness and population structure in multi-ethnic PheWAS using a sparse ancestry-adjusted genetic relatedness matrix

In genetic association studies, generalized linear mixed-effects models (GLMMs) are commonly used to control for relatedness among samples by modeling familial and cryptic relationships with a genetic relationship matrix (GRM). Existing GLMM methods use an empirical GRM to account for sample relatedness, which works well for association analysis in studies of mostly homogeneous populations. However, these methods are not suitable for analyzing recent multi-ethnic whole-genome sequencing (WGS) studies with heterogeneous populations. A standard approach to adjusting for population stratification in multi-ethnic studies is to include principal components (PCs) as fixed effects. Because the empirical GRM also captures population structure, using both the PCs and the empirical GRM in the same model can lead to “double-fitting” of the population structure. Moreover, population structure confounds the empirical GRM, which can misspecify familial relationships and potentially result in loss of power and miscalibrated type I error rates. Existing methods that rely on a sparse empirical GRM also fail, because population structure prevents the empirical GRM from being sparse.
Here, we propose a scalable GLMM method for multi-ethnic studies that models sample relatedness with a sparse ancestry-adjusted GRM and accounts for population structure by including ancestry-informative principal components as fixed effects. By separating distant ancestry from familial relationships, our method provides a scalable and accurate way to analyze large multi-ethnic studies, including recent WGS studies, with accurate type I error control and improved power to detect associations. To support the full WGS analysis pipeline, we further propose a scalable method that estimates the sparse ancestry-adjusted GRM using efficient distributed computation; it computes this GRM for the entire UK Biobank dataset of more than 450,000 subjects in under nine hours using only 45 CPUs and 40 GB of total memory. Using numerical simulations and an application to the entire UK Biobank dataset, we demonstrate that our method scales to association analyses of more than 450,000 subjects, controls type I error, and improves power compared with existing GLMM methods.

October 12, 2021 - Tiffany Amariuta, HSPH

Tiffany Amariuta

Postdoctoral Research Fellow, Genetic Epidemiology and Statistical Genetics
Harvard T.H. Chan School of Public Health

Modeling tissue co-regulation to infer tissue-specific contributions to disease

Despite abundant evidence of disease etiologies that span multiple tissues, quantifying tissue-specific contributions to disease heritability remains challenging. Previous work emphasized the potential of accounting for tissue co-regulation (Ongen et al. 2017 Nat Genet), but tissue-specific disease effects have not been formally modeled.

We introduce a new method, tissue co-regulation score regression (TCSC), that quantifies tissue-specific contributions to disease heritability by regressing transcriptome-wide association study (TWAS) gene-disease chi-square statistics on tissue co-regulation scores across genes and tissues. TWAS statistics include both direct effects of predicted cis-genetic components of gene expression on disease and tagging effects of co-regulated tissues (Wainberg et al. 2019 Nat Genet). By modeling pairwise correlations of predicted gene expression between tissues (tissue co-regulation scores), TCSC distinguishes best-proxy causal gene-disease effects from tagging effects across tissues. In simulations, TCSC detects causal tissues with a well-calibrated false positive rate across a broad range of parameter settings, and at default settings it attains substantially higher power to detect causal tissues than the Ongen et al. method. TCSC also estimates the proportion of SNP-heritability explained by each tissue; these estimates are conservative, as they exclude effects of genes whose gene expression heritability is not significant at finite sample sizes.

We applied TCSC to 82 heritable complex traits and diseases from UK Biobank (average N = 299K), using gene expression prediction models constructed from GTEx data across 49 tissues to compute TWAS statistics and co-regulation scores. Here, we discuss tissues with significantly non-zero heritability contributions at 10% FDR for three representative traits: waist-hip ratio (WHR), anorexia, and height. For WHR, TCSC correctly implicates adipose tissue (29.9% of SNP-heritability explained), as WHR reflects the distribution of body fat. For anorexia, TCSC specifically implicates the brain cortex (32.9% of SNP-heritability explained) out of five central nervous system regions, consistent with extensive previous work implicating this brain region (Haye et al. 2009 Nat Rev Neuro). For height, TCSC implicates five best-proxy causal tissues, of which the lead tissue, fibroblasts, explains 41.8% of SNP-heritability, consistent with the known role of connective tissue in growth regulation. TCSC also reduced the number of trait-associated tissues within a tissue category by 67% compared with LDSC-SEG (Finucane et al. 2018 Nat Genet). In conclusion, TCSC is a powerful method for quantifying tissue-specific contributions to disease heritability.

November 9, 2021 - Tushar Kamath, Broad Institute


December 14, 2021 -

February 15, 2022 -

 

March 8, 2022 -

April 5, 2022 -

 

May 10, 2022 -

PQG Working Group Archive