The goal of the PQG Seminar Series is to encourage the exchange of ideas and promote interaction, collaboration, and research in quantitative genomics. It seeks to further the development and application of quantitative methods, especially for high-dimensional data, and to focus on the training of quantitative genomic scientists.
2022/2023 Seminar Organizers: Hailiang Huang and Heng Li
Please direct any logistical questions to Amanda King
Upcoming Seminar
PQG seminar meetings for the semester will take a hybrid format – some will be held in person and all will be accessible by Zoom. The link to each meeting will be posted along with the talk information.
Tuesday, May 16, 2023
1:00-2:00 PM
Biostats Rm 426
Join Zoom meeting:
https://harvard.zoom.us/j/93986804758?pwd=RjFlMmtyNWZ6aEN2RWk5WVJaNDRndz09
Ron Do
Associate Professor of Genetics and Genomic Sciences
Icahn School of Medicine at Mount Sinai
Advances in Mendelian Randomization: Robust Causal Inference and Identification of Risk Factors for Coronary Artery Disease
Mendelian randomization (MR) is a commonly used approach in human genetics to infer causal risk factors for complex diseases. In this talk, I present research focused on the development and application of MR methods for robust causal inference testing, interpretation, and the identification of causal risk factors for coronary artery disease. First, I demonstrate the detection of widespread horizontal pleiotropy in MR testing between complex traits and diseases. Next, I describe a method that empirically quantifies horizontal pleiotropy in human genetic variation, showing that it is pervasive and primarily driven by extreme polygenicity of complex traits and diseases. Third, I introduce the concept of causal variance to quantify the contribution of a risk factor to disease. Additionally, I present work related to the application of MR in dissecting causal influences for coronary artery disease. This includes the identification of plasma triglycerides as a causal risk factor for coronary artery disease and a phenome-wide MR analysis of plasma triglycerides on 2,600 disease traits. Finally, I highlight a study on using MR as a tool to identify modifiable lifestyle factors that are causal towards coronary artery disease risk.
2022-2023 Dates
September 20, 2022 - Audrey Hendricks, University of Colorado Denver
Audrey Hendricks
Associate Professor of Statistics
University of Colorado Denver
Methods and frameworks to increase the utility and equity of genetic summary data
Publicly available genetic summary data have high utility, providing foundational insights and improving translational medicine. Genetic summary data are not only widely available; compared to individual-level data, they often have fewer barriers to access, promoting open science and the broad use of valuable resources. However, robust use of these data can be difficult because summarizing masks within- and between-sample heterogeneity. Differences in data generation or population structure between samples can lead to biased and incorrect results, such as false negative and false positive associations and incorrect prioritization of causal variants. Without appropriate methods to estimate and correct for these differences, researchers may be left with an inequitable choice: either use a larger but poorly matched resource or not use the resource at all. Unfortunately, these problems can be magnified for researchers with fewer resources, for understudied conditions, and in populations of mixed ancestry, the very places where additional data are often most needed. Here, I present methods and frameworks to address the issues of study design, ancestry estimation, and association analysis. In addition to discussing existing methods such as Summix, an efficient mixture model for estimating and adjusting for ancestry from summary statistics, and ProxECAT, a rare variant association method for leveraging external common controls, I describe ongoing efforts for flexible extensions and current best practices for the use of publicly available genetic data. Ultimately, this work improves the robust use of genetic summary data for all researchers across a more representative set of conditions and genetic ancestries.
October 25, 2022 - cancelled
November 15, 2022 - Glennis Logsdon, University of Washington
Glennis Logsdon
Postdoctoral Research Fellow
University of Washington School of Medicine
The sequence, structure, and evolution of human centromeres
For the past twenty years, the sequence of the human genome has remained unfinished due to the presence of large swaths of repeats clustered with centromeres, segmental duplications, acrocentric p-arms, and telomeres. However, recent advances in long-read sequencing technologies and associated algorithms have now made it possible to systematically assemble these regions for the first time. In this talk, I will present the complete sequence of each centromere in the human genome, and I will provide an in-depth look at their structure and evolution over time. I will also reveal how centromeres vary among the human population and how this variation shapes the evolutionary trajectory of each human centromere.
December 6, 2022 - Longzhi Tan, Stanford University
Longzhi Tan
Assistant Professor of Neurobiology
Stanford University
Probing the single-cell 3D genome architectural basis of neurodevelopment and aging in vivo
How do cells in our nervous system develop highly specialized functions despite having (approximately) the same genome? An emerging mechanism is 3D genome architecture: the folding of our 2-meter-long genome into each 10-micron cell nucleus. This architecture brings together genes and distant regulatory elements to orchestrate gene transcription, and has been implicated in neurodevelopmental and degenerative diseases. However, genome architecture is extremely difficult to measure. I developed a DNA sequencing–based method, termed Dip-C, which solved the first 3D structure of the human genome in a single cell. Applying Dip-C to the developing mouse eye, I revealed genome-wide radial inversion of euchromatin and heterochromatin, forming a microlens to concentrate light at night. In the mouse nose, I discovered multiple inter-chromosomal hubs that contain hundreds of olfactory receptor genes and their enhancers, providing a structural basis for their “1 neuron–1 receptor” expression. In the brain, I determined the dynamics of 3 facets of our genome—linear sequence, gene transcription, and 3D structure—during postnatal cortical development. I obtained the true spectrum of somatic mutations in the normal human brain, and discovered a major transformation of both transcriptome and 3D genome in the first month of life in mice. More recently, my lab focused on the cerebellum, which exhibits a unique mode of development, maldevelopment, aging, and evolution. We discovered life-long changes in cerebellar 3D genome architecture in both human and mouse. Our work provides the first look into the “black box” of 3D genome regulation in the cerebellum, and offers tools that are widely applicable to biomedicine.
March 7, 2023 - Molly Schumer, Stanford University
Assistant Professor in Biology
Stanford University
The evolution of reproductive isolation: insights from swordtail fish
Hybridization, or the exchange of genes between different species, is much more common than previously recognized. In the past decade, the genome sequencing revolution has allowed us to peer into the evolutionary histories of myriad species. This has led to the realization that many if not most plant and animal species have hybridized with their close relatives. Even the genome of our own species has been shaped by past hybridization. My research program seeks to illuminate the genetic and evolutionary consequences of hybridization. We study the mechanisms through which negative genetic interactions are eliminated after hybridization and the situations under which hybridization is beneficial, using swordtail fish as a model system.
March 21, 2023 - Jinghui Zhang, St Jude Children’s Research Hospital - postponed
Chair, Department of Computational Biology
St Jude Children’s Research Hospital
Therapy-related clonal evolution in pediatric cancer patients and long-term survivors
Understanding the short-term and long-term therapy-related effects on the genomes of pediatric cancer patients and survivors is essential for reducing the mortality associated with cancer relapse and the accelerated physiological aging of long-term survivors. We present the discovery of therapy-related mutagenesis processes, including those involved in structural variations (SVs) in relapsed pediatric acute lymphoblastic leukemias and metastatic osteosarcoma, which give rise to resistant clones under the selective pressure of exposure to cytotoxic agents. We also present the dynamics of age- versus therapy-related clonal hematopoiesis (CH) in long-term survivors of pediatric cancer with a median follow-up time of 23.5 years. CH in survivors is associated with exposures to alkylating agents, radiation, and bleomycin. Therapy-related CH shows significant enrichment in STAT3, characterized as a CH-gene specific to Hodgkin lymphoma survivors, and TP53. Single-cell profiling of peripheral blood samples revealed STAT3 mutations predominantly present in T-cells and contributed by SBS25, a mutational signature associated with procarbazine exposure. Serial-sample tracking reveals that larger clone size is a predictor for future expansion of age-related CH clones, while therapy-related CH remains stable decades post-treatment. These data depict the distinct dynamics of these CH subtypes and support the need for longitudinal monitoring to determine the potential contribution to late effects.
April 11, 2023 - Marinka Zitnik, Harvard Medical School
Assistant Professor of Biomedical Informatics
Harvard Medical School
Geometric deep learning for genomic medicine and therapeutic design
Artificial intelligence is poised to enable breakthroughs in science and reshape medicine. We investigate machine learning with a focus on learning systems informed by geometry, structure, and symmetry and grounded in knowledge to achieve deeper reasoning. In this talk, I will discuss recent progress in geometric deep learning to enable precision medicine and advance therapeutic science.
First, I will introduce Shepherd, a geometric deep learning model for diagnosing rare genetic diseases. The challenge with rare diseases is fundamental: datasets with patient diagnoses are three orders of magnitude smaller than in other uses of AI for medical diagnosis. Shepherd is a graph neural network that connects a patient’s clinical-genetic information to the region in the biomedical knowledge graph relevant to diagnosis. It can identify causal disease genes, find rare disease patients with similar genetic and phenotypic features, and assist with the interpretable characterization of novel diseases. The approach creates new opportunities to shorten the diagnostic odyssey for rare diseases.
Second, I will discuss using geometric deep learning for therapeutic science, including zero-shot therapeutic use prediction, modeling of molecular phenotypes, and analysis of high-throughput chemical perturbations. I will highlight Therapeutics Data Commons (https://tdcommons.ai), an initiative to access and evaluate AI capability across therapeutic modalities and stages of drug discovery. The Commons supports the development of machine learning methods, with a strong bent towards developing the foundations for which methods are most suitable for drug discovery and why.
May 2, 2023 - Joel Bader, Johns Hopkins University
Joel Bader
Professor, Department of Biomedical Engineering
Johns Hopkins University
Connecting GWAS SNPs to causal genes by integrating information across loci
Genome-wide association studies (GWAS) have been powerful in identifying genetic variants, usually single-nucleotide polymorphisms (SNPs), associated with biomedical traits. These SNPs (or other correlated SNPs in the region) usually affect the regulation or function of a nearby causal gene. Better identification of causal genes provides better understanding of disease mechanism and can improve diagnostics and therapeutics. We will describe methods that we have developed to improve the identification of causal genes by integrating information across loci. Some loci have strong evidence pointing to a particular gene as causal, for example existence of a Mendelian gene at the same locus, or co-localization of GWAS signals with eQTL signals. Other loci are information-poor. We use biological networks defined by protein-protein and gene-regulatory interactions to share information across loci, using information-rich loci as leverage to identify the most likely causal genes at information-poor loci. Our method is efficient and robust, using Bayesian models to avoid adjustable parameters. We will discuss results for electrocardiography (ECG) phenotypes, which are risk factors for cardiovascular disease and sudden cardiac death.
May 16, 2023 - Ron Do, Icahn School of Medicine at Mount Sinai