Current Projects

Eric Van Buren, Hufeng Zhou, Yi Zhang

Integration of Single-Cell-Sequencing Data into Rare Variant Association Tests

Existing methods for rare variant association testing in candidate cis-regulatory element regions do not integrate single-cell-sequencing data and therefore cannot capture the variability in regulatory element activity between cell types. cellSTAAR, a new method in development, uses functional annotations and variant sets constructed from single-cell-sequencing data to conduct functionally informed rare variant association testing by cell type.
Presentations at CHARGE 2023 & ASHG 2023

Eric Van Buren, Hufeng Zhou

Computational Prioritization of IGVF Experimental Variants using Functional Annotation & Variant Characterization Data

The IGVF consortium aims to greatly enhance our understanding of how genetic variation influences disease through the use of experimental techniques and computational modelling. To help experimental centers prioritize variants for future study, and to benchmark the value of various annotation data and computational methods, we are working collaboratively with multiple IGVF centers to build a predictive model that identifies variants likely to be functional, so that they can be prioritized for experimental study.
Presentations at IGVF Annual Meeting in 2023

Rebecca Danning

LACE-UP: An Ensembling Method for Clustering Binary Symptom Data

Neuropsychiatric and behavioral conditions often lack biomarkers and are diagnosed based on the presence or absence of a variety of traits, leading to highly heterogeneous phenotypes that may comprise latent subtypes. Detecting these subtypes can be challenging because existing methods for clustering binary data are not robust to realistic data settings, including an unknown number of subtypes, the inclusion of symptoms that are unrelated to the true underlying disease, and correlation of symptoms within individuals. LACE-UP is a novel and robust method for clustering binary data that does not require prespecifying the number of clusters and outperforms gold-standard techniques in the presence of extraneous variables and local dependence.
Poster presentation at ASHG 2023

Hui Li

HEELS: Accurate and Efficient Estimation of Local Heritability using Summary Statistics and LD Matrix
Existing SNP-heritability estimators that leverage summary statistics from genome-wide association studies (GWAS) are much less efficient (i.e., have larger standard errors) than restricted maximum likelihood (REML) estimators, which require access to individual-level data. We introduce a new method for local heritability estimation, Heritability Estimation with high Efficiency using LD and association Summary Statistics (HEELS), that significantly improves the statistical efficiency of summary-statistics-based heritability estimators and attains statistical efficiency comparable to that of REML (with a relative statistical efficiency greater than 92%). Moreover, we propose representing the empirical LD matrix as the sum of a low-rank matrix and a banded matrix. We show that this way of modeling the LD not only reduces storage and memory costs but also improves the computational efficiency of heritability estimation. We demonstrate the statistical efficiency of HEELS and the advantages of our proposed LD approximation strategies both in simulations and through empirical analyses of UK Biobank data.
GitHub repository: https://github.com/huilisabrina/HEELS
Presentations at ENAR 2023, ASHG 2023 & NESS 2022
Published in Nature Communications
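
The low-rank plus banded representation of the LD matrix mentioned above can be illustrated with a minimal sketch. This is not the HEELS implementation (see the repository for that). The function name `low_rank_plus_banded`, the toy correlation matrix, and the choices of rank and bandwidth are all assumptions made for illustration. The sketch takes the leading eigenpairs as the low-rank part and keeps only the near-diagonal entries of the residual as the banded part.

```python
import numpy as np


def low_rank_plus_banded(R, rank, bandwidth):
    """Approximate a symmetric matrix R as (U * d) @ U.T + banded.

    Hypothetical helper for illustration only, not the HEELS algorithm.
    """
    # Low-rank part: eigenpairs with the largest absolute eigenvalues.
    eigvals, eigvecs = np.linalg.eigh(R)
    top = np.argsort(np.abs(eigvals))[::-1][:rank]
    U, d = eigvecs[:, top], eigvals[top]
    # Banded part: residual entries within `bandwidth` of the diagonal.
    residual = R - (U * d) @ U.T
    i, j = np.indices(R.shape)
    banded = np.where(np.abs(i - j) <= bandwidth, residual, 0.0)
    return U, d, banded


# Toy "LD" matrix (an assumption, not real data): short-range correlation
# that decays with distance, plus one long-range shared factor.
p, rank, bandwidth = 200, 1, 10
idx = np.arange(p)
dist = np.abs(idx[:, None] - idx[None, :])
in_band = dist <= bandwidth
v = np.random.default_rng(0).choice([-1.0, 1.0], size=p)
R = 0.8 * 0.6 ** dist + 0.2 * np.outer(v, v)

U, d, banded = low_rank_plus_banded(R, rank, bandwidth)
approx = (U * d) @ U.T + banded

# Relative Frobenius error, compared with keeping only the band of R.
rel_err = np.linalg.norm(R - approx) / np.linalg.norm(R)
band_only = np.where(in_band, R, 0.0)
rel_err_band_only = np.linalg.norm(R - band_only) / np.linalg.norm(R)

# Numbers to store: dense matrix versus factors plus band (upper bound).
dense_entries = p * p
stored_entries = U.size + d.size + p * (2 * bandwidth + 1)

print(f"relative error, low-rank + banded: {rel_err:.4f}")
print(f"relative error, banded only:       {rel_err_band_only:.4f}")
print(f"stored entries: {stored_entries} of {dense_entries}")
```

In this toy setting the banded part captures local correlation and the low-rank part captures the long-range structure that a band alone would discard, while far fewer numbers are stored than in the dense matrix.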

graphREML: Heritability partitioning and enrichment analyses with higher precision
Heritability enrichment analysis has been one of the most valuable approaches for understanding genetic architecture and for linking functional genomic datasets with disease genetics. Stratified LD score regression (S-LDSC) is the most widely used method for heritability partitioning and enrichment analyses, but S-LDSC has low statistical power; moreover, S-LDSC assumes an unrealistic linear relationship between the heritability of a SNP and its annotations. Recently, Salehi et al. proposed “LD graphical models (LDGMs)”, which represent LD patterns using extremely sparse matrices derived from genome-wide genealogies (Kelleher et al. 2019). LDGMs enable the use of efficient sparse matrix operations, potentially addressing the computational challenge of likelihood-based heritability partitioning. We introduce graphREML, a new heritability partitioning method that operates on GWAS summary statistics and sparse representations of the LD via the LDGM precision matrices, allowing for overlapping and continuous annotations. In both simulation studies and analyses of real traits, we found that graphREML improves upon S-LDSC by modeling the full likelihood of the summary statistics, and is robust to the use of out-of-sample LD.
GitHub repository: https://github.com/huilisabrina/graphREML
Presentations at ASHG 2023, JSM 2023 & Probabilistic Modeling in Genomics 2023
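
To illustrate why a precision-matrix representation of LD can be sparse even when the correlation matrix is dense, here is a toy sketch. It is not an LDGM and not graphREML code. The correlation structure (correlation decaying geometrically with distance, whose inverse is known to be tridiagonal) and all variable names are assumptions chosen for illustration.

```python
import numpy as np

# Toy correlation matrix: correlation decays as rho^|i - j|.
# Every entry is nonzero, so the matrix itself is dense.
p, rho = 30, 0.6
idx = np.arange(p)
R = rho ** np.abs(idx[:, None] - idx[None, :])

# Its inverse (the precision matrix) is tridiagonal: each variable is
# conditionally dependent only on its immediate neighbours.
P = np.linalg.inv(R)

tol = 1e-8
nonzero_R = int(np.sum(np.abs(R) > tol))
nonzero_P = int(np.sum(np.abs(P) > tol))

# With the precision matrix in hand, R^{-1} z is a matrix-vector product
# rather than a linear solve against the dense R.
z = np.random.default_rng(0).standard_normal(p)
via_precision = P @ z
via_solve = np.linalg.solve(R, z)

print(f"nonzero entries in R: {nonzero_R}")
print(f"nonzero entries in P: {nonzero_P}")
```

A dense array is used here only for brevity. In practice a sparse matrix format would be used so that storage and matrix-vector products scale with the number of nonzero entries.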