SESSION I: Emerging Technologies
- Brian Beliveau
Wyss Institute / Harvard Medical School
OligoMiner: A rapid, flexible environment for the design of genome-scale Oligopaint oligonucleotide FISH probes for conventional and super-resolution imaging
Brian J. Beliveau, Jocelyn Y. Kishi, Guy Nir, Hiroshi M. Sasaki, Sinem K. Saka, Son C. Nguyen, Chao-ting Wu, Peng Yin
Oligonucleotide (oligo)-based fluorescence in situ hybridization (FISH) has emerged as an important tool for the study of chromosome organization and gene expression and has been empowered by the availability of highly complex oligo pools. However, a dedicated bioinformatic design utility has yet to be created specifically for the purpose of identifying optimal oligo FISH probe sequences on the genome-wide scale. Here, we introduce OligoMiner, a rapid and robust computational pipeline for the genome-scale design of oligo FISH probes that affords the scientist exact control over the parameters of each probe. OligoMiner uses standard bioinformatic file formats, allowing users to seamlessly integrate other utilities into the pipeline as desired, and introduces a novel method for evaluating the specificity of each probe that connects simulated hybridization energetics to rapidly generated sequence alignments using supervised learning. We demonstrate the scalability of our approach by performing genome-scale probe discovery in several model organism genomes and showcase the performance of the resulting probes with both conventional and single-molecule super-resolution imaging of chromosomal and RNA targets. We anticipate this pipeline will make the FISH probe design process much more accessible and will simplify the design of pools of hybridization probes for a variety of applications.
- Robin Kirkpatrick – Stellar Abstract Award*
Wyss Institute / Harvard Medical School
Rewiring the 3D structure of the genome
Robin Kirkpatrick, Jingwen Sun, Jesse Zalatan
The physical organization of the genome plays a central role in biological processes ranging from cell division to gene regulation. Understanding the functional significance of genome spatial organization is currently hampered, however, by the lack of tools to systematically perturb genome structure in space and time. To address this challenge, we have developed a new method to physically reposition genes within the nucleus of eukaryotic cells. Using CRISPR-Cas DNA binding complexes, we can tether genomic sites to targeting protein domains that localize to specific subnuclear sites. We demonstrate this method by recruiting genes to the nuclear periphery and the nuclear pore complex in yeast and human cells. A key advantage of this approach is that it directly targets endogenous sites, unlike prior methods that require heterologous binding sites in the genomic site of interest. Further, the system can be coupled to chemical dimerization domains for inducible control. By perturbing genome structure and assessing the functional consequences at many different sites, this new tool will enable us to systematically probe how the physical structure of the genome contributes to gene regulation.
- Jocelyn Kishi
Wyss Institute / Harvard Medical School
Primer Exchange Reactions (PER) for Programmable Isothermal Synthesis of Single-Stranded DNA
Jocelyn Kishi, Thomas Schaus, Nikhil Gopalkrishnan, Feng Xuan, Peng Yin
DNA patterns life by encoding the information for diverse molecular functions in the genome. It also serves as the template substrate for a multitude of synthetic reaction networks. Here, we introduce the concept of primer exchange reactions (PER), which use catalytic DNA hairpin species to pattern the isothermal synthesis of single-stranded DNA (ssDNA) in a stepwise fashion. The technique represents a novel method to synthesize arbitrary ssDNA in an isothermal in situ environment and could provide the basis for a new generation of molecular devices.
- Ninning Liu
Wyss Institute / Harvard Medical School
Observe and Perturb: Super-resolved Manipulation at the Nanoscale
Ninning Liu, Mingjie Dai, Peng Yin
While many recent advances in super-resolution microscopy (e.g. STORM, PALM, PAINT) have allowed observation of molecular architecture down to the scale of individual components (~5 nm), there has not been a parallel development of microscopy manipulation methods at a comparable scale. Microneedles and laser-based crosslinking are examples of tools used to manipulate visualized samples, but they ultimately cannot go beyond the diffraction limit of light.
Here we present a nanoscale, targeted manipulation method based on the PAINT (point accumulation for imaging in nanoscale topography) principle. Briefly, PAINT utilizes a fluorescent probe that transiently and repetitively binds to imaging targets. In contrast with localization-based methods such as STORM or PALM, the fluorescent probe in PAINT is freely diffusing, with a reasonable expectation of only one probe bound to an imaging target within a diffraction-limited area at any given time. This makes a light-induced targeting technique possible, whereby a laser pulse will be introduced at the precise moment when a probe binds to a pre-determined region of interest.
We combined our previously developed DNA-PAINT super-resolution imaging technique with a fast, photo-activated crosslinking chemistry. Currently, we have demonstrated multi-target, single-molecule labeling of points on a DNA nanostructure testboard with 30 nm spatial discrimination.
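The targeting logic described above can be pictured as a simple gate: trigger the laser only when a newly localized binding event falls inside the pre-determined region of interest. The Python sketch below is illustrative only. The function name `should_fire`, the circular region of interest, the 15 nm radius, and the example coordinates are all assumptions and are not taken from the authors' implementation.

```python
import math

def should_fire(localization, roi_center, roi_radius_nm):
    """Return True when a localized binding event lies inside a circular ROI.

    Hypothetical helper: coordinates are (x, y) pairs in nanometres.
    """
    dx = localization[0] - roi_center[0]
    dy = localization[1] - roi_center[1]
    return math.hypot(dx, dy) <= roi_radius_nm

# Hypothetical binding events (nm); these are not measured data.
events = [(102.0, 98.0), (160.0, 100.0), (95.0, 110.0)]

# Keep only the events that would trigger a pulse for an assumed
# ROI centred at (100, 100) nm with an assumed radius of 15 nm.
fired = [e for e in events if should_fire(e, (100.0, 100.0), 15.0)]
```

In a real instrument the decision would also have to account for localization uncertainty and the latency between detection and the pulse, which this sketch ignores.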
- Yaping Liu – Platform Speaker*
Massachusetts Institute of Technology
MethylHiC reveals long-range genetic-epigenetic and epigenetic-epigenetic interactions within the same single molecule
Yaping Liu, Guoqiang Li, Bing Ren, Manolis Kellis
DNA methylation is the most extensively studied epigenetic marker and plays a direct role in mammalian gene regulation, such as gene silencing and imprinting. Epigenetic gene silencing and activation have also long been envisaged as local events. Very recent investigations indicate that large regions of chromosomes can be coordinately suppressed or activated. Here, we developed MethylHiC, which applies in situ Hi-C followed by bisulfite treatment to understand the long-range genetic-epigenetic and epigenetic-epigenetic interactions within the same single DNA molecule.
We validated that MethylHiC is globally consistent with both in situ Hi-C and WGBS data generated from the same cell line. We found that DNA methylation is highly concordant at distally interacting regulatory regions within the same DNA molecule. The concordance level shows different patterns across regions in different chromatin states and in imprinted regions. We also detected long-range allele-specific methylation. Further, we showed that long-range genetic-epigenetic and epigenetic-epigenetic interactions can be disrupted by Ten-eleven Translocation (TET) knockout. Moreover, by incorporating DNA methylation concordance information into the analysis, we can improve the Hi-C resolution. Our method paves the way to evaluating the direct long-range effects of genetic and epigenetic alterations under different pathological conditions within the same DNA molecule.
- Nuno Martins
Harvard Medical School
Visualizing the genomic nano-architecture of vertebrate centromeres using super-resolution microscopy
Nuno Martins, Tatsuo Fukagawa, Chao-Ting Wu
Centromeres play a crucial role in the accurate regulation of chromosome segregation and, thus, in preventing aneuploidy, which can drive cancer and infertility. They are epigenetically maintained loci whose function depends on the regulation of local chromatin and is closely tied to cell cycle cues. Here, we explore whether centromere function is correlated with specific aspects of chromosome folding and nuclear positioning, two features of genome organization that have already been shown to play key roles in genome function. This issue has remained poorly understood primarily due to technological limitations, which are exacerbated by the nature of the repetitive sequences present at many centromeres.
Our approach uses OligoSTORM, which is a novel application of super-resolution single-molecule-localization microscopy that is based on Oligopaints FISH technology and enables the visualization of repetitive or non-repetitive genomic loci at ≤20 nm resolution. We are currently characterizing the nano-architecture of two non-repetitive centromeres of chicken as well as their adjacent loci. We are developing quantitative approaches for analyzing super-resolution data from genomic loci (physical parameters, cell-to-cell variability, cell-cycle-dependent changes), and developing quality-control pipelines. We will present an update on our progress.
- Guy Nir
Harvard Medical School
Towards tracing chromosomes with super-resolution microscopy
Guy Nir, Brian J. Beliveau, Hiroshi Sasaki, Bogdan Bintu, Alistair Boettiger, Sonny C. Nguyen, Geoffrey Fudenberg, Ruth McCole, Huy Nguyen, Jelena Erceg, Nuno C. Martins, Mohammed Hannan, Carl Ebling, Jeff Stuckey, John Schreiner, Steven Callahan, Erez Lieberman Aiden, Marc Marti-Renom, Leonid Mirny, Xiaowei Zhuang, Peng Yin, Ting Wu
Are compartments spatially distinct? Are contact domains the building blocks of the genome? Over the last decade, biochemical and computational strategies have uncovered much of the mystery of chromosome folding, first at the ensemble level, and now at the level of the single cell. Microscopy, which excels at the single-cell level and provides a direct means for visualizing single chromosomes, has not yet been widely applied to provide accurate and complete maps of chromosome structure. Here, using OligoSTORM, we have addressed two main challenges for microscopy, labeling and resolution, that have limited our ability to visualize genomic structure. OligoSTORM is a melding of 1) Oligopaints, which are computationally designed short ssDNA oligos that hybridize in situ at specific genomic positions and carry a fluorophore for detection, and 2) stochastic optical reconstruction microscopy (STORM), which provides a means for resolving images at 20 nm resolution. By combining OligoSTORM with a fluidics apparatus, we have enabled sixteen sequential rounds of imaging, visualizing Hi-C defined compartments, contact domains, and loops, all in the same nucleus and in an automated way. Our findings will inform our understanding of the relationship between Hi-C representations of the genome and chromosome organization.
- Sinem Saka
Wyss Institute / Harvard Medical School
Multiplexed Super-resolution Imaging of Nuclear Organization using DNA-PAINT
Sinem K. Saka, Brian J. Beliveau, Hiroshi Sasaki, Mingjie Dai, Peng Yin
Localization-microscopy-based super-resolution imaging makes use of stochastically occurring single-molecule fluorescence events (“blinkings”) to find the position of each molecule with high precision. These collections of localizations are then used to reconstruct a super-resolution image, molecule by molecule. One such technique is DNA-based Point Accumulation for Imaging in Nanoscale Topography (DNA-PAINT), where the “blinking” is achieved by the transient hybridization of short DNA oligonucleotides called imager strands. To perform DNA-PAINT, probes such as antibodies or FISH probes that target the structures of interest are modified to include a short DNA sequence (docking strand). By adding imager strands complementary to the docking strands onto cells, cellular targets can be imaged at ~10 nm resolution. In addition to high spatial resolution, the use of DNA as an imaging probe offers unique advantages owing to the full programmability of DNA-DNA interactions. One important advantage is the multiplexing capability, which makes it possible to image >10 targets sequentially in the same sample (Exchange-PAINT). We are currently applying our multiplexed super-resolution imaging capability to image DNA, RNA and protein targets in the same cellular samples to understand the nanoscale organization of chromatin and the surrounding nuclear elements in individual cells.
- Hiroshi Sasaki – Stellar Abstract Award*
Painting chromosomes – highly multiplexed 3D super-resolution imaging of chromosomes in situ with DNA-PAINT
Hiroshi M. Sasaki, Brian J. Beliveau, Mingjie Dai, Florian Schüder, and Peng Yin
Decades of research have revealed that the organization of DNA within the nucleus is non-random and plays an important role in many nuclear processes. In particular, the chromosome conformation capture family of methods has revolutionized our view of nuclear organization by revealing basic organizational features such as compartments and domains. However, as these assays are performed on populations of cells, they provide only limited information about individual cells and hence many gaps remain in our understanding of chromosome organization. Microscopy-based techniques such as fluorescence in situ hybridization (FISH) have the potential to address this limitation, as they permit the visualization of specific genomic loci in single cells. Here we apply our recently developed single-molecule super-resolution imaging method called DNA-PAINT (DNA-based Point Accumulation for Imaging in Nanoscale Topography) and Oligopaints (bioinformatically designed, programmable oligonucleotide-based FISH probes) to develop a simple and robust single-cell technique that combines high spatial super-resolution and high multiplexing for interrogating chromosome organization in individual cells. We have achieved 6-color images at <30 nm resolution of megabase-scale ‘domain’ and ‘gap’ regions of the X chromosome in human diploid cells using our spinning disc confocal light path. These preliminary results show a high degree of connectivity between adjacent regions, even at fine scales. This 6-color demonstration of <30 nm resolution chromosome tracing represents a significant advance beyond the previously demonstrated 1–2 color SMLM FISH imaging and an intermediate step towards the ultimate goal of understanding the structure and position of ensemble domains described by chromosomal conformation capture experiments.
- Longzhi Tan
Single-cell chromatin conformation capture by Meta-C enables 3D reconstruction of the diploid human genome
Longzhi Tan, Dong Xing, Chi-Han Chang, Xiaoliang Sunney Xie
Despite recent advances in single-cell chromatin conformation capture (3C, or Hi-C), the 3D structure of the human genome remains elusive because of its diploid nature. Based on our recent whole-genome amplification technology (Chen et al., Science 2017), we developed a novel single-cell 3C method, Meta-C, which detects more chromatin contacts than existing methods. Meta-C allowed us to reconstruct, for the first time, the diploid human genome from a human cell line (GM12878) and from primary cells (PBMCs) at 20-kb resolution. Our results confirmed existing principles of chromatin folding and uncovered new ones.
SESSION II: Applications to Basic Biology and Disease Mechanisms
- Kadir Akdemir
MD Anderson Cancer Center
Not so random – Impact of genome organization on mutational signatures in cancer
Kadir Akdemir and Andrew Futreal
The hierarchical folding of genomic DNA within the nucleus is intimately linked with transcriptional regulation. However, the interplay between chromatin folding and mutational processes is unclear. Here, we sought to understand the effects of three-dimensional genome organization on the distribution of somatic mutations in human cancers, with the aim of elucidating the mechanisms behind different DNA repair and damage processes.
We utilized data from 2850 cancer genomes and analyzed the distribution of more than 60 million somatic mutations associated with 40 different mutational signatures in human cancers. Our analysis revealed a strong correlation between the mutation distribution in human cancers and the spatial organization of the genome. Notably, transcriptionally active domains carry a lower mutation burden than transcriptionally inactive domains. As a result, regional mutation rates differ drastically around the boundaries delineating transcriptionally distinct domains. Interestingly, cancer-associated DNA repair deficiencies can change the mutation distribution; e.g., samples with DNA mismatch repair deficiency (MSI) exhibit a flatter mutation distribution around domain boundaries. Moreover, the unique folding structure of the inactive X-chromosome leads to distinct somatic mutation distributions in females compared to males in various cancer types. Taken together, our analyses reveal fundamental insights about genome architecture and mutational processes in human cancers.
- Jumana AlHaj Abed
Harvard Medical School
Functional characterization of trans homolog pairing in a hybrid cell line
Jumana AlHaj Abed, Jelena Erceg, Anton Goloborodko, Son C. Nguyen, Ruth B. McCole, Wren Saylor, Nezar Abdennur, Geoffrey Fudenberg, Bryan R. Lajoie, Job Dekker, Leonid A. Mirny, Ting (C.-ting) Wu
Gene regulation is dictated by the linear DNA sequence as well as the 3D organization of the genome, the latter involving both cis and trans chromosomal interactions. Although these interactions facilitate a range of biological processes including gene regulation, trans interactions, in particular, are understudied in mammals due to their low abundance.
To better understand trans interactions, we are working with Drosophila, which pairs paternal and maternal homologous chromosomes along their entire lengths in somatic cells throughout development. This unique trans phenomenon, referred to as homolog pairing, has been central to decades’ worth of studies describing locus-specific pairing-mediated trans regulation.
Here, we present our findings from a genome-wide trans-specific Hi-C study utilizing a male, diploid, hybrid cell line generated from F1 embryos derived from a cross between two divergent Drosophila strains. Preliminary studies show homolog pairing correlates with active chromatin. We are now pursuing experiments that will examine whether knocking down a known pairing factor may be reflected in trans contact frequencies across the genome. Additional work will correlate these trans contact data with RNA expression and provide a complementary approach to understanding the structure of paired homologs and how such pairing relates to gene expression.
- Rasim Barutcu
TAD boundary organization at the Firre locus occurs independently of CTCF binding and lncRNA transcription
Rasim Barutcu, Philipp G. Maass, Jordan P. Lewandowski, Catherine L. Weiner, John L. Rinn
The genome is folded into topologically associating domains (TADs), but the mechanisms mediating TAD formation are poorly understood. Long non-coding RNAs (lncRNAs) have been proposed to be involved in TAD formation, but whether they contribute through their transcription or their RNA product remains unknown. Here, we investigated the roles of CTCF, transcription, and the presence of a lncRNA in forming TAD boundaries, using the Firre lncRNA locus as a model. We demonstrate that neither Firre’s deletion, nor the ectopic insertion of its cDNA, nor its induced expression was sufficient to alter TAD organization in a sex- or allele-specific manner. The Firre deletion did, however, disrupt the formation of the inactive X-chromosome “super-loop”. Collectively, our study highlights that CTCF binding, local lncRNA presence and transcription are neither sufficient nor necessary for establishing TADs, thereby suggesting additional mechanisms for TAD formation at the Firre locus.
- Giancarlo Bonora
University of Washington
Orientation-dependent Dxz4 contacts shape the 3D structure of the inactive X chromosome
G. Bonora, X. Deng, H. Fang, V. Ramani, R. Qiu, J. Berletch, G. N. Filippova, Z. Duan, J. Shendure, W.S. Noble, C.M. Disteche
The mammalian inactive X chromosome (Xi) condenses into a bipartite structure with two superdomains of frequent long-range contacts separated by a boundary or hinge region. Using in situ DNase Hi-C in mouse cells with deletions or inversions within the hinge we show that the conserved repeat locus Dxz4 alone is sufficient to maintain the bipartite structure and that Dxz4 orientation controls the distribution of long-range contacts on the Xi. Frequent long-range contacts between Dxz4 and the telomeric superdomain are either lost after its deletion or shifted to the centromeric superdomain after its inversion. This massive reversal in contact distribution is consistent with the reversal of CTCF motif orientation at Dxz4. De-condensation of the Xi after Dxz4 deletion is associated with partial restoration of TADs normally attenuated on the Xi, and with an increase in chromatin accessibility and CTCF binding, but few changes in gene expression, in accordance with multiple epigenetic mechanisms ensuring X silencing. We propose that Dxz4 represents a structural platform for frequent long-range contacts with multiple loci in a direction dictated by the orientation of a bank of CTCF motifs at Dxz4, which may work as a ratchet to form the distinctive bipartite structure of the condensed Xi.
- Hugo Brandao – Stellar Abstract Award*
A Mechanism of Cohesin-Dependent Loop Extrusion Organizes Zygotic Genome Architecture
Hugo B. Brandao, Johanna Gassler, Maxim Imakaev, Ilya M. Flyamer, Sabrina Ladstatter, Wendy A. Bickmore, Jan-Michael Peters, Leonid A. Mirny, Kikue Tachibana-Konwalski
Fertilization triggers assembly of higher-order chromatin structure from a naive genome to generate a totipotent embryo. Chromatin loops and domains are detected in mouse zygotes by single-nucleus Hi-C (snHi-C) but not bulk Hi-C. We resolve this discrepancy by investigating whether a mechanism of cohesin-dependent loop extrusion generates zygotic chromatin conformations. Using snHi-C of mouse knockout embryos, we demonstrate that the zygotic genome folds into loops and domains that depend on Scc1-cohesin and are regulated in size by Wapl. Remarkably, we discovered distinct effects on maternal and paternal chromatin loop sizes, likely reflecting loop extrusion dynamics and epigenetic reprogramming. Polymer simulations based on snHi-C are consistent with a model where cohesin locally compacts chromatin and thus restricts inter-chromosomal interactions by active loop extrusion, whose processivity is controlled by Wapl. Our simulations and experimental data provide evidence that cohesin-dependent loop extrusion organizes mammalian genomes over multiple scales from the one-cell embryo onwards.
- Luca Caputo
Sanford Burnham Prebys Medical Discovery Institute
Transcription Factor-Directed Re-wiring of Chromatin Architecture During Somatic Cell Trans-differentiation
Alessandra Dall’Agnese, Luca Caputo, Chiara Nicoletti, Sole Gatto, Anthony Schmitt, Yarui Diao, Zhen Ye, Mattia Forcato, Ranjan Perera, Silvio Bicciato, Bing Ren, Pier Lorenzo Puri
How tissue-specific transcription factors (TFs) direct higher-order chromatin interactions that drive nuclear reprogramming during somatic cell trans-differentiation is an unsolved issue in regenerative medicine. We exploited the ability of a single TF, MYOD, to convert human fibroblasts into skeletal muscle cells. Integrative analysis of high-resolution Hi-C with MYOD ChIP-seq and RNA-seq revealed that MYOD-directed myogenic conversion of IMR90 fibroblasts entails an extensive alteration of the three-dimensional chromatin structure to enable coordinated repression of pre-existing gene networks and activation of muscle-specific genes. We show that MYOD binding at insulated neighbourhoods (INs), often near CTCF-bound DNA elements, re-wires chromatin interactions between promoters, enhancers and FIREs to orchestrate activation and repression of gene expression. Directing dCAS9 against E-Boxes at regulatory elements in the ITGA7 or TNNT2 loci, prior to MYOD expression, reduced activation of these genes, caused misregulation of genes within the same domain, and prevented interaction of the regulatory elements. Furthermore, we show that continuous expression of exogenous MYOD is necessary to maintain the skeletal muscle genomic architecture and gene expression during the first stage of reprogramming. These data shed light on the mechanism by which a TF helps to reorganize the 3D genome during somatic cell reprogramming.
- Daniel Day
Mutations at insulated neighborhood boundaries accumulate throughout the progression of esophageal adenocarcinoma
Daniel S. Day, Denes Hnisz, Xiaodun Li, Rebecca C. Fitzgerald, Richard A. Young
Insulated neighborhoods are chromosomal loop structures that provide for specific enhancer-gene interactions and are essential for normal gene regulation. Insulated neighborhoods are formed by the interaction of two DNA-bound CTCF proteins and the cohesin complex, and disruption of an insulated neighborhood CTCF boundary site leads to abnormal expression of genes inside and near the boundary site. Recent studies have reported frequent somatic mutations in the CTCF boundary sites of insulated neighborhoods in cancer genomes, but the contribution of these mutations to tumorigenesis is unclear. Using somatic mutations from our cohort of over 300 esophageal adenocarcinoma (EAC) patients and cohesin HiChIP and CTCF ChIP-seq data from EAC tumor cell lines, we identify hundreds of mutations at insulated neighborhood boundaries in EAC that show evidence for driving tumorigenesis. Neighborhood boundary mutations preferentially accumulated during the progression of EAC tumors. We identified many recurrently mutated insulated neighborhood boundaries, and these boundaries were associated with misregulation of genes that comprise cancer-related pathways. This work suggests that insulated neighborhood boundary mutations can be clinically important events. These mutations could be used to identify new target genes for cancer therapeutics, especially for cancers whose driver genes are poorly understood.
- Jelena Erceg – Stellar Abstract Award*
Harvard Medical School
A signature of trans inter-homolog pairing in haplotype-resolved genomes
Jelena Erceg, Jumana AlHaj Abed, Anton Goloborodko, Bryan R. Lajoie, Geoffrey Fudenberg, Nezar Abdennur, Maxim Imakaev, Ruth B. McCole, Son C. Nguyen, Eric F. Joyce, Tharanga Niroshini Senaratne, Mohammed A. Hannan, Guy Nir, Job Dekker, Leonid A. Mirny, Ting (C.-ting) Wu
Nuclear organization is established in part through chromosomal interactions in three-dimensional space, including abundant cis contacts as well as trans contacts. Although trans interactions are implicated in gene regulation, development, and cancer translocations, they are nevertheless much understudied in a genome-wide manner due to their low abundance. Drosophila, however, represents an ideal system for studying trans interactions, as homolog pairing occurs in interphase somatic cells from early embryogenesis to adulthood. Here, we crossed two divergent strains of Drosophila and assessed the time when maternal and paternal genomes come together for the first time during embryogenesis. Upon developing a stringent method, we distinguished cis and trans inter-homolog interactions. Our Hi-C maps showed highly concordant cis contacts between homologs and striking Hi-C signals for homolog pairing that are stronger than signals representing the interaction between two arms of the same chromosome (i.e. Rabl configuration). We will discuss how our Hi-C maps for embryonic pairing compare to maps representing more advanced pairing in newly generated cell lines derived from hybrid embryos and, in addition, how these maps relate to models of homolog pairing. Taken together, this study provides a tool for investigating trans contacts, especially homolog pairing, in haplotype-resolved genomes.
- Ittai Eres – Platform Speaker*
University of Chicago
Interrogating the 3D Structure of the Genome in Human and Chimpanzee Tissues
Ittai Eres, Kevin Luo, Yoav Gilad
Over the last several decades, a growing body of evidence has suggested that variation in gene expression plays a crucial role in both speciation and tissue differentiation. In this work, we probe regulatory divergence between humans and chimps, and across a variety of tissues, by performing Hi-C on induced pluripotent stem cells (iPSCs) from both species. Critically, these same individuals’ iPSCs are then differentiated into cells from all three definitive germ layers, allowing for a deep temporal and developmental understanding of the dynamics of 3-dimensional chromatin folding. By integrating a wide variety of orthogonal data from the same individuals (RNA-seq, methylation, ATAC-seq, etc.), we build a comprehensive picture of gene regulation and how it differs between species and between tissues. Initial analysis of Hi-C data shows that contacts are most dissimilar between humans and chimps on chromosomes with large-scale structural rearrangements between the two species (e.g. chromosomes 2, 16, and 17). The ultimate aim is to determine to what extent the variance between tissues and species seen in gene expression is concomitant with variance seen in Hi-C interaction frequency and a wide variety of other regulatory phenotypes.
- John Froberg
Massachusetts General Hospital
The higher order structure of the X-chromosome and its impact on X-inactivation
John E. Froberg, Andrea Kriz, Teddy Jégu, Jeannie T. Lee
Unlike all other mammalian chromosomes, the two female X-chromosomes have very different epigenetic statuses. One X-chromosome (the Xi) is almost completely transcriptionally inactivated by the non-coding RNA Xist and the other remains active (the Xa). We have found that the higher-order structures of the two X-chromosomes are very different from each other. Like the autosomes, the Xa is organized into topologically associating domains (TADs), megabase-sized regions of the chromosome that highly interact with themselves and are insulated from the rest of the chromosome. TADs and cohesin binding are either absent or highly attenuated on the Xi, suggesting that it adopts a more “randomized” conformation than any other mammalian chromosome. Interestingly, some TADs and some cohesin sites are restored on the Xi following Xist deletion, suggesting that Xist is required to disrupt the domain organization of the Xi and to remove architectural proteins. In addition to the absence of TADs, another unique facet of Xi structure is its organization into two very large “megadomains”, with the tandem repeat DXZ4 at their border. Our current work is focused on the role of this megadomain structure in silencing during X-chromosome inactivation.
- Yao Fu
Oklahoma Medical Research Foundation
Protein-mediated 3D genome architecture in human B cells by HiChIP
Yao Fu, Richard Pelikan, Caleb A. Lareau, Martin J. Aryee, Jennifer Kelly and Patrick M. Gaffney
HiChIP has been shown to identify protein-mediated 3D chromatin looping with higher efficiency and specificity than ChIA-PET. We performed HiChIP assays to map the DNA looping patterns in EBV-transformed human B cell lines originally obtained from systemic lupus erythematosus (SLE) patients. We measured 3D DNA contacts mediated by an essential nuclear protein, CTCF, and an epigenetic histone mark, histone H3 lysine 27 acetylation (H3K27ac). Using 1 to 10 million input cells, we obtained approximately 100M paired-end tags (PETs), of which 30% represent intrachromosomal long-range interactions spanning between 5 kb and 2 Mb. We further investigated the chromatin interactions approximately 185 kb upstream of the tumor necrosis factor alpha inducible protein 3 (TNFAIP3) gene. This non-coding region was previously identified as an SLE and rheumatoid arthritis (RA) susceptibility region. Our H3K27ac HiChIP data showed that this region is enriched with H3K27ac-mediated looping activities. Interestingly, although this region is flanked by the oligodendrocyte transcription factor 3 (OLIG3) and TNFAIP3 genes, we found that the associated SNPs mainly loop to genes further upstream, interleukin 20 receptor alpha (IL20RA) and interferon gamma receptor 1 (IFNGR1). Our data provide new insight into finding direct functional targets associated with GWAS variants that are in non-coding regions.
- Steven Gazal
Harvard T.H. Chan School of Public Health
The functional low-frequency variant architecture of human diseases and complex traits
S. Gazal, P.R. Loh, A. Schoech, B. van de Geijn, S. Sunyaev, A. Gusev, B. Neale, H.K. Finucane, A.L. Price
GWAS have highlighted that common variant heritability is concentrated into tissue-specific non-coding annotations. However, little is known about non-coding variant enrichments in low-frequency architectures or connecting functional enrichments to gene-set discovery strategies. To investigate these issues, we partitioned the heritability of common and low-frequency (0.5% ≤ MAF < 5%) variants in 27 independent complex traits (average N=355K) from UK Biobank across numerous functional annotations.
We determined that non-synonymous variants explain 18% of low-frequency variant heritability (h2lf), vs. 2% of common variant heritability (h2c). We observed analogous differences for coding and UTR variants, and for regions conserved in primates. At the trait level, we determined that tissue-specific annotations extremely enriched in h2c tend to be similarly enriched in h2lf.
Next, we defined a strategy to create disease-informative gene set annotations, which restricts genes to their coding and regulatory elements (connected through Hi-C data) and uses conserved elements and transcription factor binding sites to make these elements more precise. We illustrate the power of this strategy on ExAC genes depleted for loss-of-function variants, for which gene bodies cover 12% of the genome and explain 24% of h2lf, whereas our corresponding disease-informative gene-set annotation covers only 1.4% of the genome but explains 27% of h2lf.
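As a rough illustration of the comparison above, the fold-enrichment implied by each annotation can be computed as the share of h2lf explained divided by the share of the genome covered. This ratio is the usual definition of enrichment in partitioned-heritability analyses, but the abstract does not state it explicitly, so the definition and the function name below are assumptions.

```python
def fold_enrichment(h2_share, genome_share):
    """Share of heritability explained per unit share of genome covered.

    Assumed definition: enrichment = (fraction of h2) / (fraction of genome).
    """
    if genome_share <= 0:
        raise ValueError("genome_share must be positive")
    return h2_share / genome_share


# Percentages quoted in the abstract, written as fractions.
gene_body = fold_enrichment(0.24, 0.12)       # LoF-depleted gene bodies
informative = fold_enrichment(0.27, 0.014)    # disease-informative annotation

print(f"gene bodies: {gene_body:.1f}x")
print(f"disease-informative annotation: {informative:.1f}x")
```

Under this definition, the gene-body annotation comes out at about 2x and the disease-informative annotation at about 19x.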
- Johan Gibcus – Stellar Abstract Award*
University of Massachusetts Medical School
Mitotic chromosomes fold by condensin-dependent helical winding of chromatin loop arrays
Johan H. Gibcus, Kumiko Samejima, Anton Goloborodko, Itaru Samejima, Natalia Naumova, Masato Kanemaki, Linfeng Xie, James R. Paulson, William C. Earnshaw, Leonid A. Mirny, Job Dekker
During mitosis, chromosomes fold into compacted rod shaped structures. We combined imaging and Hi-C of synchronous DT40 cell cultures with polymer simulations to determine how interphase chromosomes are converted into compressed arrays of loops characteristic of mitotic chromosomes. We found that the interphase organization is disassembled within minutes of prophase entry and by late prophase chromosomes are already folded as arrays of consecutive loops. During prometaphase, this array reorganizes to form a helical arrangement of nested loops. Polymer simulations reveal that Hi-C data are inconsistent with solenoidal coiling of the entire chromatid, but instead suggest a centrally located helically twisted axis from which consecutive loops emanate as in a spiral staircase. Chromosomes subsequently shorten through progressive helical winding, with the numbers of loops per turn increasing so that the size of a helical turn grows from around 3 Mb (~40 loops) to ~12 Mb (~150 loops) in fully condensed metaphase chromosomes. Condensin is essential to disassemble the interphase chromatin conformation. Analysis of mutants revealed differing roles for condensin I and II during these processes. Either condensin can mediate formation of loop arrays. However, condensin II was required for helical winding during prometaphase, whereas condensin I modulated the size and arrangement of loops inside the helical turns. These observations identify a mitotic chromosome morphogenesis pathway in which folding of linear loop arrays produces long thin chromosomes during prophase that then shorten by progressive growth of loops and helical winding during prometaphase.
- Richard Gill
Analysis of Heart Disease GWAS SNPs in High-Resolution Hi-C Data from Primary Human Cardiac Cells Reveals Distal Gene Targets with Pathophysiological Relevance
Richard Gill, Baikang Pei, Yi-Hsiang Hsu
Most disease-associated SNPs span non-coding regions, and some reside within regulatory elements. eQTL mapping can help identify regulatory variants’ targets, but cannot distinguish causal variants from proxy SNPs. Evidence of physical interactions between SNPs and target genes from Hi-C data provides more direct evidence of disease-relevant regulatory circuits.
We hypothesized that the cardiac fibroblast represents a heart disease-relevant cellular context, and generated Hi-C data from atrial and ventricular human primary cells. We processed our data using HiC-Pro, and called significant interactions with GOTHiC and HOMER at 1.6-kb resolution. We analyzed gene promoters interacting with 926 heart disease-associated SNPs and their proxies (r² > 0.8).
In atrial and ventricular fibroblasts, GOTHiC identified 6.6×10⁶ and 11.2×10⁶ significant bin-bin interactions, respectively (~18% genic), while HOMER identified 1.5×10⁶ peaks in both regions (~34-38% genic), and these interactions spanned ~15,000 genes. We found 380 unique genes interacting with 9,758 lead and proxy SNPs. These genes were enriched in metabolic disease pathways, as well as cardiovascular system and connective tissue development and function, even after excluding genes nearest to the lead SNPs.
Analysis of disease-associated SNPs and high-resolution Hi-C identified pathophysiologically-relevant genes, which will be prioritized by layering additional functional genomic and expression data.
- Farhad Hormozdiari
Leveraging molecular QTL to understand the genetic architecture of diseases and complex traits
Farhad Hormozdiari, Steven Gazal, Bryce van de Geijn, Hilary Finucane, Chelsea J.-T. Ju, Po-Ru Loh, Armin Schoech, Yakir Reshef, Xuanyao Liu, Luke O’Connor, Alexander Gusev, Eleazar Eskin, Alkes L. Price
There is increasing evidence that many GWAS risk loci are molecular QTL for gene expression (eQTL), histone modification (hQTL), splicing (sQTL), and/or DNA methylation (meQTL). Here, we introduce a new set of functional annotations based on causal posterior probabilities (CPP) of fine-mapped molecular cis-QTL, using data from the GTEx and BLUEPRINT consortia. We show that these annotations are very strongly enriched for disease heritability across 41 independent diseases and complex traits (average N=320K): 5.84x for GTEx eQTL and, in BLUEPRINT, 5.44x for eQTL, 4.27-4.28x for hQTL (H3K27ac and H3K4me1), 3.61x for sQTL and 2.81x for meQTL (all P ≤ 1.39e-10), far higher than enrichments obtained using standard functional annotations that include all significant molecular cis-QTL (1.17-1.80x). eQTL annotations that were obtained by meta-analyzing all 44 GTEx tissues generally performed best, but tissue-specific blood eQTL annotations produced stronger enrichments for autoimmune diseases and blood cell traits and tissue-specific brain eQTL annotations produced stronger enrichments for brain-related diseases and traits, despite high cis-genetic correlations of eQTL effect sizes across tissues. Notably, eQTL annotations restricted to loss-of-function intolerant genes from ExAC were even more strongly enriched for disease heritability (17.09x; vs. 5.84x for all genes; P = 4.90e-17 for difference). All molecular QTL except sQTL remained significantly enriched for disease heritability in a joint analysis conditioned on each other and on a broad set of functional annotations from previous studies, implying that each of these annotations is uniquely informative for disease and complex trait architectures.
- Teddy Jégu
Massachusetts General Hospital
Role of Xist lncRNA in modulation of chromatin accessibility and its impact on 3D structure of the inactive X chromosome
Teddy Jégu, Roy Blum, Chen-Yu Wang, Jeannie T. Lee
X-chromosome inactivation (XCI) is a key process that offsets the unequal gene expression between female (XX) and male (XY) mammals by silencing one X-chromosome in the female embryo. This mechanism generates an active chromosome (Xa) and an inactive X (Xi). XCI is an excellent model to study epigenetic regulation, especially to understand how long noncoding RNAs (lncRNAs) and large-scale chromosomal architecture play a role in gene regulation. Recent chromosome conformation capture approaches and methods for assaying chromatin accessibility genome-wide have revealed that the two X-chromosomes fold differently and that the Xi displays fewer chromatin-accessible regions. The Xi chromosome is devoid of topologically associating domain structures and retains chromatin accessibility only around the promoters of genes that escape XCI and at the binding sites of the architectural protein CTCF, which is known to regulate chromosome topology. However, the molecular mechanisms involved in the modulation of chromatin accessibility during the XCI process remain unclear. Our current work is focused on the role of lncRNAs in the regulation of chromatin accessibility and their implications for Xi topology during XCI maintenance. By performing and integrating ATAC-seq and Hi-C experiments in Xist-ablated cells, we defined different classes of chromatin-accessible regions on the Xi and observed how they respond to Xist deletion. Our data shed light on a new role of Xist lncRNA during the XCI maintenance phase.
- Sohyun Jeong
Gachon University & Lab of genomics and epidemiology, Seoul National University
Evaluation of genetic variation associated with Immunologic component of schizophrenia risk
Sohyun Jeong, Joohon Sung
Schizophrenia carries lifetime morbidity and mortality and imposes a large cost burden on society. Although its impact on society and on individuals is severe, currently available drug therapies fall short of satisfactory efficacy. Inter-individual variation in drug response due to genetic variation is regarded as one reason for this unsatisfactory efficacy. Advanced genome-wide association studies have revealed 30 schizophrenia-associated loci so far. Among those loci, enhancers active in tissues related to immunologic function (CD19 and CD20) suggest a strong genetic etiology of schizophrenia. Recent findings showed that a combination of C4 structural allele and MHC SNP haplotype is associated with schizophrenia risk. These findings are corroborated by previous epidemiologic studies that reported immunologic dysregulation in schizophrenia. However, further research is needed before they can be applied to real-world medical treatment. We aimed to evaluate the genetic component involving immunologic dysregulation in the Korean population. First, we searched for candidate genetic variations related to immunologic dysregulation through a systematic literature review. Second, a public gene database was used to elucidate their practical implications for further research. Finally, the selected genetic variations were searched for and analyzed in the public genomic database of the Family and Twin cohort study.
- Andrea Kriz
Massachusetts General Hospital/Harvard Medical School
Impact of Topologically Associating Domain (TAD) structure on Xist regulation and spreading during X chromosome inactivation
Andrea J. Kriz, John E. Froberg, Jeannie T. Lee
Female mammals must silence one of their two X chromosomes in order to balance dosage of X-linked genes with males. This process, known as X chromosome inactivation (XCI), is mediated by the long noncoding RNA (lncRNA) Xist which spreads across the inactive X and triggers heterochromatin formation along with gene silencing. Xist is in turn regulated by several other lncRNAs located in the same genomic region as Xist, known as the X inactivation center (Xic). Recent chromatin conformation studies revealed that the X chromosome, along with the rest of the mammalian genome, is organized into Topologically Associating Domains (TADs). Intriguingly, although TAD structure disappears during XCI, a sub-TAD chromatin loop containing the promoter of Xist along with its positive regulators Jpx and Ftx is retained on the inactive X. I hypothesize that this Xic sub-TAD chromatin loop is required for proper Xist expression. To test this, I propose to minimally disrupt the Xic sub-TAD loop using CRISPR/Cas9 and study the impact of these mutations on Xist expression during XCI. Outside of the Xic, recent studies suggest that TAD structure may play a role in Xist spreading. Therefore, I also propose to minimally disrupt TADs across the X chromosome using CRISPR/Cas9 and study the impact of these mutations on Xist spreading during XCI.
- Marieke Kuijjer
Dana-Farber Cancer Institute, Harvard T.H. Chan School of Public Health
Understanding tissue-specific gene regulation
Abhijeet Sonawane, John Platig, Maud Fagny, Cho-Yi Chen, Joseph Paulson, Camila Lopes-Ramos, Dawn DeMeo, John Quackenbush, Kimberly Glass, Marieke Kuijjer
Although all human tissues carry out common processes, tissues are distinguished by gene expression patterns, implying that distinct regulatory programs control tissue-specificity. In this study, we investigate gene expression and regulation across 38 tissues profiled in the Genotype-Tissue Expression project. We find that network edges (transcription factor to target gene connections) have higher tissue-specificity than network nodes (genes) and that regulating nodes (transcription factors) are less likely to be expressed in a tissue-specific manner as compared to their targets (genes). Gene set enrichment analysis of network targeting also indicates that regulation of tissue-specific function is largely independent of transcription factor expression. In addition, tissue-specific genes are not highly targeted in their corresponding tissue-network. However, they do assume bottleneck positions due to variability in transcription factor targeting and the influence of non-canonical regulatory interactions. These results suggest that tissue-specificity is driven by context-dependent regulatory paths, providing transcriptional control of tissue-specific processes.
- Caleb Lareau
Interrogation of human hematopoietic traits at single-cell and single-variant resolution
Caleb A. Lareau,* Jacob C. Ulirsch,* Erik L. Bao,* Michael H. Guo, Rany Salem, Christian Benner, Joel N. Hirschhorn, Martin J. Aryee, Jason D. Buenrostro,+ Vijay G. Sankaran+
Two outstanding challenges in the post genome-wide association study (GWAS) era are 1) the precise identification of causal variants from implicated regions and 2) inference of pertinent cell types to heritable phenotypes. In particular, cell-to-cell variation implicit in adult tissues and widespread linkage disequilibrium in the human genome represent significant hurdles to elucidating high-resolution mechanisms of inherited variation. Here, we identify 36,919 unique fine-mapped variants with > 1% causal posterior probability for a GWAS from the UK Biobank (~113,000 individuals) for 16 blood cell traits. Though most putative causal variants fall outside known protein-coding regions, we observe significant enrichments of these loci in accessible chromatin of hematopoietic cell types. Pairing our putative causal variants with single-cell epigenomic measurements for >2,000 hematopoietic progenitor cells, we propose novel approaches for resolving the genetic architecture enrichments at single-cell and single-variant resolutions. Importantly, while we focus on genetic variation in peripheral blood cell traits, the observed enrichments occur in progenitors and precursors that exist within the bone marrow, providing important insight into how blood cell production varies in humans. We observe significant heterogeneity of trait enrichment within immunophenotypically homogeneous progenitor populations, notably common myeloid progenitors and megakaryocyte-erythroid progenitors, for several erythroid-related traits. In total, our mapping of hematopoietic traits at the single-cell and single-variant resolution provides a useful framework for dissecting associations generated by GWAS.
- Erica Larschan
X-marks the spot: An essential transcription factor promotes the specific three-dimensional organization of the dosage compensated Drosophila male X-chromosome
William Jordan, III and Erica N. Larschan
Dosage compensation of X-linked genes is an essential process that equalizes transcript level between males and females. In many species, the dosage compensated X-chromosome has a different three-dimensional organization than the other chromosomes. However, the mechanism by which the specialized three-dimensional organization is generated remains unknown. We have identified an essential transcription factor, CLAMP (Chromatin-linked Adaptor for MSL proteins), that controls the specialized organization of the Drosophila male X-chromosome. Unlike the dosage compensation complex itself, CLAMP recognizes clustered GA-rich binding sequences that are enriched at the boundaries between TADs and promotes three dimensional interactions. Our work provides key insight into how the three-dimensional organization of a specialized domain of coordinated gene regulation is established.
- Jun Liang
Brigham and Women’s Hospital
Epstein-Barr virus drives cancer growth through transcriptional regulation
Over 20% of human malignancies are caused by viruses and other microbes. Enhancers are fundamental determinants of cell fate and are often deregulated in cancer. We have identified super enhancers targeted by key EBV oncoproteins in EBV-transformed B cells. By incorporating ChIA-PET data, we provide a novel view of host-pathogen chromatin interactions, further elucidating how a DNA tumor virus hijacks cellular transcriptional machinery to drive cancer growth. These results highlight the intricate virus-host chromatin interaction in tumor virus oncogenesis.
- Ruth McCole
Harvard Medical School
Ultraconservation of DNA sequence provides a new lens for focusing on chromosome rearrangements in neurodevelopmental disorders and related phenotypes
Ruth B. McCole, Wren Saylor, Claire Redin, Chamith Y. Fonseka, Harrison Brand, Jelena Erceg, Guy Nir, Michael E. Talkowski, and C.-ting Wu.
Ultraconserved elements (UCEs) are regions of the human genome that exhibit extremely high and unexplained levels of DNA sequence conservation. I have discovered that in individuals with neurodevelopmental disorders (NDDs) and related phenotypes, the breakpoints of dosage-balanced chromosome rearrangements, such as inversions and translocations, are non-randomly associated with UCE positions. This suggests that the perturbation of UCE position linearly along the chromosome and potentially in three-dimensional nuclear space, including pairing of UCEs, may play a role in NDD etiology.
More specifically, by analyzing 157 dosage-balanced chromosome rearrangement breakpoints in five studies of individuals with neurodevelopmental phenotypes, I determined that UCEs are significantly enriched near these rearrangement breakpoints (P = 2.2×10⁻¹⁶, observed/expected = 4.80). I replicated this finding in an independent sample of patients with neurodevelopmental and related phenotypes that detailed 453 copy number neutral chromosome rearrangement breakpoints, again finding an enrichment for UCEs near these breakpoints (P = 5.6×10⁻⁸, obs/exp = 2.43).
With these findings in hand, I am beginning to leverage super-resolution and conventional imaging technologies to visualize UCE regions corresponding to those that have been rearranged in patient genomes, with a view to delineating the chromosome structure at these regions and quantifying UCE pairing and clustering within the nucleus.
- Johannes Nuebler
Massachusetts Institute of Technology
Chromatin organization by an interplay of loop extrusion and compartment segregation
Johannes Nuebler, Geoffrey Fudenberg, Maxim Imakaev, Nezar Abdennur, Leonid Mirny
The three-dimensional organization of chromatin in the cell nucleus is highly complex and intimately related to its function. Chromosome conformation capture techniques have revealed two prevalent features of organization, termed compartments and topologically associating domains (TADs). Using polymer simulations, we show how the proposed mechanistic underpinnings, phase segregation for compartments and active loop extrusion for domains, shape chromatin organization. We demonstrate that the interplay of these mechanisms coherently explains several recent experimental perturbations, namely the removal of the cohesin loader Nipbl, removal of the TAD boundary protein CTCF, and removal of the cohesin unloader Wapl.
- Prashanth Rajarajan
Icahn School of Medicine at Mount Sinai
In situ Hi-C of hiPSC model of the brain reveals network of genes associated with noncoding schizophrenia risk variants
Prashanth Rajarajan, Tyler Borrman, Will Liao, Kristen Brennand, Schahram Akbarian
We performed in situ Hi-C on isogenic human induced pluripotent stem cell (hiPSC)-derived neural progenitor cells (NPCs), excitatory neurons, and glia from two male control lines. Global features of higher order chromatin organization in an hiPSC model of the brain emerged, including longer loops/larger topologically associating domains (TADs) in neurons than in other cell types and cell-type-specific loops involving genes relevant to that cell’s biology. Furthermore, by overlaying known schizophrenia (SZ) risk SNPs, we were able to map, in cell-type-specific fashion, the gene targets of noncoding variants, some of which were identified as putative regulatory elements. Then, by epigenomic editing to target the SNP-containing arm using dCas9-VP64/VPR transcriptional activators in NPCs, we show changes in expression of important neural genes located on the distal coding arm of the loop. With such an approach, we hope to expand the SZ risk network of genes beyond those simply adjacent to variants and provide a platform to test loop functionality in terms of gene expression and, eventually, resultant changes in phenotype.
Funding: P.R. is supported by 1F30MH113330-01
- Julia Rogers
Brigham and Women’s Hospital and Harvard Medical School
Dual Readout of Regulatory Information Is a Common Feature of Transcriptional Silencers
Stephen S. Gisselbrecht, Alexandre Palagi, Jesse V Kurland, Julia M Rogers, Hakan Ozadam, Ye Zhan, Job Dekker, Martha L Bulyk
A major challenge in biology is to understand how complex gene expression patterns in organismal development are encoded in the genome. While transcriptional enhancers have been studied extensively, few transcriptional silencers have been identified and they remain poorly understood. Here we used a novel strategy to screen hundreds of sequences for tissue-specific silencer activity in whole Drosophila embryos. Strikingly, 100% of the transcriptional silencers that we found were also active enhancers in other cellular contexts. We discovered more bifunctional cis-regulatory modules (CRMs) than were previously known across all biological systems (e.g., plants, Drosophila, mammals). Testing of regions for which 4C data are available identified a mesodermal silencer which makes mesoderm-specific contacts with the promoters of two genes that are not expressed in the mesoderm. We have now generated HiC data for sorted mesoderm and nonmesodermal cells and are in the process of analyzing the contacts made by all of the bifunctional elements we’ve identified. CRM bifunctionality complicates the understanding of how gene regulation is specified in the genome and how it is read out differently in different cell types. Characterization of bifunctional elements should aid in investigations of how precise gene expression patterns are encoded in the genome.
- Ahilya Sawh
Dynamic Organization of C. elegans Chromosomes During Development by Multiplexed DNA FISH
Ahilya Sawh, Siyuan Wang, Xiaowei Zhuang, and Susan Mango
Chromosomal DNA is spatially arranged in the nucleus at different length scales – from gene clusters, to topologically associating domains (TADs), to separate compartments of active/inactive chromatin – but little is known about how these structures are generated during development. To define chromatin structure in embryogenesis, we have adapted multiplexed DNA FISH (Wang et al. 2016) for embryos and determined the 3D conformation of whole chromosomes from the 2-cell stage to gastrulation. We generated chromosome architecture maps, described the biophysical properties of the chromosome, and showed that domains rearrange during embryogenesis. Surprisingly, in the early embryo, chromosomes are arranged in two mega-domains, distinct from the compartments identified by HiC in differentiated cells (Crane et al. 2015). Heterochromatin is thought to be a highly compacted state, however we find that TADs destined to become heterochromatin are unexpectedly distended in space, while those bearing euchromatic features are compacted. We are currently examining the contributions of the parental germ-line vs. zygotic events to generate and remodel the mega-domain pattern of the early embryo.
- Wren Saylor
Harvard Medical School
Towards a Complete Picture of Sequence Composition and Nuclear Positioning in Ultraconserved Elements (UCEs)
Wren Saylor, Ruth B. McCole, and C.-ting Wu
Ultraconserved elements (UCEs) are genomic sequences that show 100% identity between reference genomes of distantly related species and, as this level of identity is unwarranted for any known function, they have been a longstanding puzzle in the field of genomics. To better understand UCEs, we are exploring the possibility that they constitute a novel type of genetic element that functions to maintain genomic integrity via a homology based pairing mechanism, wherein disruptions of UCE sequence, dosage or pairing lead to loss of fitness. Consistent with this model, we have shown an extreme depletion of UCEs from copy number variation and that such dosage imbalance is highly disfavored in healthy cells as compared to their cancerous counterparts. Here, we examine sequence composition in order to determine whether UCEs bear sequence properties that distinguish them from both their surrounding regions and other genomic elements. In addition to a bioinformatics approach, we have begun imaging UCEs with the intent to assess the frequency of co-localization of UCEs on homologous chromosomes. Overall, our goal is to build a comprehensive characterization of UCEs as genetic elements, gain insight into the mechanisms of ultraconservation, and interrogate our hypothesis that UCEs serve as stewards of genomic integrity.
- Chen-Yu Wang
Massachusetts General Hospital
SMCHD1 assists formation of super-structures on the inactive X chromosome
Chen-Yu Wang, Hsueh-Ping Chu, and Jeannie T. Lee
X chromosome inactivation is a great model to study the regulation of chromosome conformation. This process is controlled by Xist RNA, which triggers gene silencing and structural reorganization of the inactive X chromosome (Xi). Mammalian chromosomes are normally partitioned into regional chromatin structures, including “A/B compartments” and “topologically associated domains.” However, the Xi is reorganized into two “mega-domains,” where these finer structures become undetectable. The mechanism and functional significance of this reorganization are not known. Here, we reveal a function for structural maintenance of chromosomes flexible hinge domain containing 1 (SMCHD1), a protein enriched on the Xi, in assisting the formation of the Xi’s unique structure. SMCHD1 functions as an epigenetic repressor; however, its mechanism of repression remains unclear. Interestingly, the molecular architecture of SMCHD1 resembles structural maintenance of chromosomes proteins, a protein family crucial for chromosome structures. We therefore hypothesize that SMCHD1 may serve as an architectural factor. To test this hypothesis, we performed in situ Hi-C on cells depleted of SMCHD1. Our results suggest a role of SMCHD1 in regulating the topological organization of the Xi.
- Hua-Jun Wu
Dana-Farber Cancer Institute
Topological isolation of protein-coding genes in control of stem cell differentiation and tumorigenesis
Hua-Jun Wu and Franziska Michor
Topologically associating domains (TADs) are constitutive chromosomal structures that constrain transcriptional regulation of the mammalian genome. Despite many investigations, little is known about how different functional gene groups are organized into individual TADs. Here we analyzed Hi-C data of 31 human tissues and cell lines to identify different TAD groups defined by the number of protein-coding genes they contain. We then analyzed the functions of genes belonging to different groups, finding that genes in single-gene TADs, termed topologically isolated genes (TIGs), are uniquely involved in stem cell differentiation and tumorigenesis. We also discovered that TIGs are stably isolated across different cell types, and that the isolation structure is more conserved across species than that of multiple-gene TADs. Furthermore, we observed a mutually exclusive pattern between single-gene TADs and TADs harboring housekeeping genes in their boundaries. These results were consistent when using CTCF/cohesin-mediated loops instead of TADs to define TIGs. Our findings reveal a previously unknown link between distinct chromosomal organization and gene function, and suggest that essential genes were organized into topologically isolated genomic regions during evolution to facilitate the accurate regulation of gene functionality.
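The grouping step described above, sorting TADs by the number of protein-coding genes they contain and reading single-gene TADs as TIG candidates, can be sketched as follows. This is only a minimal illustration: the function name, the toy coordinates, and the rule that a gene must lie entirely inside a TAD are assumptions, not details given in the abstract.

```python
def classify_tads(tads, genes):
    """Group TADs by the number of protein-coding genes they contain.

    tads:  list of (chrom, start, end) tuples
    genes: list of (chrom, start, end, name) tuples
    A gene is assigned to a TAD only if it lies entirely inside it
    (an assumed rule; the abstract does not specify how overlap is handled).
    """
    groups = {}
    for chrom, t_start, t_end in tads:
        inside = [
            name
            for g_chrom, g_start, g_end, name in genes
            if g_chrom == chrom and g_start >= t_start and g_end <= t_end
        ]
        groups.setdefault(len(inside), []).append(((chrom, t_start, t_end), inside))
    return groups


# Toy coordinates and gene names, invented for illustration only.
tads = [("chr1", 0, 1000), ("chr1", 1000, 2500), ("chr2", 0, 800)]
genes = [
    ("chr1", 100, 400, "geneA"),
    ("chr1", 1100, 1300, "geneB"),
    ("chr1", 1500, 2400, "geneC"),
    ("chr2", 900, 1200, "geneD"),
]

groups = classify_tads(tads, genes)
# Single-gene TADs hold the candidate topologically isolated genes (TIGs).
tigs = [names[0] for _, names in groups.get(1, [])]
print(tigs)
```

In the toy data only the first TAD contains exactly one gene, so geneA is the single TIG candidate.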
- G. Gürkan Yardımcı – Stellar Abstract Award*
University of Washington
Dynamics of 3D genome organization during blood cell maturation
Galip Gürkan Yardımcı, Jenny Mao, Choli Lee, Raymond T. Doty, Jay Shendure, Brent L. Wood, Janis L. Abkowitz, William S. Noble, Zhijun Duan
During the development of mature blood cells from hematopoietic stem cells, nuclei are altered dramatically to fulfill specific functions. Nuclei of red blood cells are condensed and finally expelled to maximize hemoglobin carrying ability, whereas nuclei of granulocytes divide into multiple lobes at the end of development. To investigate the effects of these morphological alterations on the organization of chromatin, we performed DNase Hi-C experiments on FACS-isolated differentiating blood cells from human donor bone marrows at multiple stages of development. Both red and white cell lineages maintain well established features of chromatin organization. A/B compartments and topologically associating domains are stably maintained during differentiation, except for compartmental switches around relevant loci such as the beta-globin locus. However, frequently interacting regions (FIREs), enhancer-like regions identified from Hi-C matrices, are organized in a temporal and lineage specific manner around functionally crucial genes, such as the heme-synthesis pathway for red blood cells. Indeed, FIREs are enriched around differentially expressed genes, exhibiting fine scale organization of transcriptional control. Interestingly, nuclear morphological changes cause chromatin condensation, resulting in an enrichment of long range Hi-C interactions. Furthermore, mature granulocytes with lobular nuclei lose their chromosome territories, suggesting random dispersion of chromosomes into nuclear lobes.
- Linying Zhang
Dana-Farber Cancer Institute
Interactions between multiple myeloma cells and bone marrow stromal cells impact epigenetic profiles of multiple myeloma
Linying Zhang, Mehmet K. Samur, Raphael Szalat, Charles B. Epstein, Rao Prabhala, Mariateresa Fulciniti, Nikhil C. Munshi*, Giovanni Parmigiani* *Contributed equally to this work
Multiple myeloma (MM), while in contact with its bone marrow microenvironment, demonstrates distinct characteristics in cell proliferation, migration and drug resistance. Discovering the role of this protective tumor-environment interaction in altering the epigenome of both multiple myeloma and bone marrow stromal cells (BMSC) can provide insights into MM pathogenesis and new targets for treatment. Our mint-ChIP sequencing analysis of two histone modifications, H3K27ac and H3K4me3, revealed 20,000-40,000 significantly differential binding sites (DB) genome-wide in 3 multiple myeloma cell lines (FDR<0.05) co-cultured with stroma. For H3K27ac and H3K4me3 respectively, 168 and 615 differential binding sites with at least 2-fold difference in enrichment were found in common among 3 cell lines (FDR<0.05). Based on functional annotation and GO analysis of H3K27ac DBs, genes potentially under regulation of DBs were involved in cell proliferation and survival pathways (for example, NF-κB signaling pathway, MAPK/ERK pathway and STAT5 pathway). In addition, genes mediating the adhesion of multiple myeloma cells to the bone marrow (for example, NCAM1, integrins), and anti-apoptosis genes (MCL1, HSPs and IGF1R) were also potentially regulated by DBs. In conclusion, we hypothesize that epigenetic alterations of MM, triggered by interactions with BMSCs, stimulate proliferative and anti-apoptotic signaling cascades in multiple myeloma cells.
- Zhaozhong Zhu
Harvard T.H. Chan School of Public Health
Shared Genetic Architecture between Asthma and Allergic Diseases: A Genome-Wide Cross Trait Analysis of 112,551 Individuals from UK Biobank
Zhaozhong Zhu, Phil H. Lee, Mark D. Chaffin, Wonil Chung, Po-Ru Loh, Quan Lu, David C. Christiani, Liming Liang
Clinical and epidemiological data suggest that asthma and allergic diseases are associated and may share a common genetic etiology. We analyzed genome-wide single-nucleotide polymorphism (SNP) data for asthma and allergic diseases in 35,783 cases and 76,768 controls of European ancestry from the UK Biobank. Two publicly available independent genome-wide association studies (GWAS) were used for replication. We found a strong genome-wide genetic correlation between asthma and allergic diseases (rg = 0.75, P = 6.84 × 10^-62). Cross-trait analysis identified 38 genome-wide significant loci, including novel loci such as D2HGDH and GAL2ST2. Computational analysis showed that shared genetic loci are enriched in immune/inflammatory systems and tissues with epithelial cells. Our work identifies common genetic architectures shared between asthma and allergy and will help to advance our understanding of the molecular mechanisms underlying co-morbid asthma and allergic diseases.
SESSION III: Computational Challenges
- Kasper Hansen
Johns Hopkins University
Distance-dependent between-sample normalization and batch effect correction for Hi-C experiments
Kipper Fletez-Brant, David Gorkin, Yunjiang Qiu, Ming Hu, Bing Ren, Kasper D. Hansen
Hi-C data is commonly normalized using single sample processing methods, with focus on spatial comparisons of contact matrices. Here, we focus on comparisons of individual contacts between different samples, and demonstrate the existence of unwanted variation in Hi-C data on multiple biological replicates. This unwanted variation changes across the contact matrix. We present BNBC, a method for normalization and batch correction of Hi-C data. We show it substantially improves comparisons across samples, with the variation explained by batch changing from 57% to 7%, while preserving structural features of the contact matrices.
- Jialiang Huang
Dana Farber Cancer Institute
Dissecting super-enhancer hierarchy based on chromatin interactions
Jialiang Huang, Kailong Li, Wenqing Cai, Xin Liu, Yuannyu Zhang, Stuart H. Orkin, Jian Xu, Guo-Cheng Yuan
Recent studies have highlighted super-enhancers (SEs) as important regulatory elements for gene expression, but their intrinsic properties remain incompletely characterized. Through an integrative analysis of Hi-C and ChIP-seq data, we find that a significant fraction of SEs are hierarchically organized, containing both hub and non-hub enhancers. Hub enhancers share similar histone marks with non-hub enhancers, but are distinctly associated with cohesin and CTCF binding sites and disease-associated genetic variants. Genetic ablation of hub enhancers results in profound defects in gene activation and local chromatin landscape. As such, hub enhancers are the major constituents responsible for SE functional and structural organization.
- Yan Kai
The George Washington University
Predicting CTCF-mediated long-range interactions using genetic and epigenetic features
Yan Kai, Jaclyn Andricovich, Zhouhao Zeng, Alexandros Tzatsos, Weiqun Peng
The CCCTC-binding zinc finger protein (CTCF) and CTCF-mediated interactome have been shown to play important roles in genome organization and gene expression. Although CTCF-mediated long-range interactions are deemed to be largely conserved, we show that they exhibit significant variations across cell-types. We also demonstrate that cell-type-specific interactions are functionally important, as they are linked to genes and super-enhancers uniquely active in individual cell types contributing to cell identity. However, experimental explorations on CTCF-mediated interactome are hampered by high cost or technical challenges. Here we present Lollipop—a machine-learning framework—to predict CTCF-mediated long-range interactions using genetic and epigenetic features. Using ChIA-PET data as benchmark, we demonstrate that Lollipop performs well in predicting CTCF-mediated loops both within and across cell-types, and outperforms previous methods. Moreover, our approach reveals that loop anchors tend to have matching strength in architectural proteins binding, a feature previously under-appreciated in CTCF-mediated chromatin wiring. Our study contributes to understanding the principles underlying CTCF-mediated long-range interactions and 3D genome architecture and their impact on gene expression.
- Kyogo Kawaguchi – Stellar Abstract Award*
Harvard Medical School
Wetting of chromatin string as a model of cell state switching
Kyogo Kawaguchi, Allon M. Klein
A key question in cell fate commitment is how epigenetic states switch discretely and almost irreversibly. Recently, a protein that interacts with heterochromatin was discovered to form liquid droplets by phase-separating from water, provoking the idea that encapsulating specific genome regions in droplets might be the physical basis for creating stable cell states. Here we model this situation as the wetting transition on a string, i.e., a phase separation nucleated on heterochromatin. We show that the order of this transition will depend on the fractal dimension of the chromatin polymer configuration, meaning that stable, switch-like and irreversible state changes can be driven by droplet formation. We hypothesize that the parameters realized in the cell nucleus are within the first-order phase transition regime, fitting with the phenomenological features of cell state switching. We also consider scenarios where random polymers collapse due to the wetting, and where distant heterochromatin regions merge into pre-existing droplets. Implications of the theory and possible experiments are discussed.
- Soohyun Lee
Harvard Medical School
Pairs and Pairix: a standard format and random access tools for chromatin contact lists
Soohyun Lee, Carl Vitzthum, Burak Han Alver*, Peter J. Park* (*Corresponding authors)
At the highest resolution, genomic interactions identified in chromatin conformation capture experiments (e.g., Hi-C) are represented as a list of pairwise contacts. Currently, no standard data format exists for sharing these contact lists across different tools, and no standard tool exists to provide random access and rapid query. We describe a flexible text-based format Pairs along with a new indexing and query tool Pairix developed for contact lists.
The Pairs format (.pairs) is an expandable tab-separated text format compressed using bgzip. The Pairix program is adapted from Tabix to allow indexing per chromosome-pair and to perform 2D queries. A query can be a thousand-fold faster when using an index, for a file that contains a billion entries. Python and R bindings as well as additional utilities for file format conversion and merging are also provided.
The separately available pairsQC package (https://github.com/4dn-dcic/pairsqc) can be used to calculate standard QC metrics and distributions from a .pairs file and to generate html reports. Pairs files can be read by several popular Hi-C analysis tools including Cooler and Juicer to generate aggregated contact matrices. The pairs format specification and Pairix are available on https://github.com/4dn-dcic/pairix.
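As a rough illustration of what a 2D query over a contact list means, the following Python sketch parses a few Pairs-style lines and filters them by a pair of genomic ranges with a plain linear scan. The column order (readID, chr1, pos1, chr2, pos2, strand1, strand2) and the `#` header prefix are assumed from the public format specification, the sample records and function names are made up for illustration, and none of this is the Pairix API. Pairix answers the same kind of query through an index rather than by scanning.

```python
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class Contact(NamedTuple):
    read_id: str
    chr1: str
    pos1: int
    chr2: str
    pos2: int
    strand1: str
    strand2: str


# Hypothetical example records (uncompressed text; real files are bgzip-compressed).
EXAMPLE = (
    "## pairs format v1.0\n"
    "#columns: readID chr1 pos1 chr2 pos2 strand1 strand2\n"
    "r1\tchr1\t10000\tchr1\t20000\t+\t-\n"
    "r2\tchr1\t50000\tchr2\t30000\t+\t+\n"
    "r3\tchr1\t60000\tchr2\t90000\t-\t+\n"
)

Region = Tuple[str, int, int]  # (chromosome, start, end), both ends inclusive


def parse_pairs(text: str) -> Iterator[Contact]:
    """Yield one Contact per data line, skipping '#' header lines."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        yield Contact(f[0], f[1], int(f[2]), f[3], int(f[4]), f[5], f[6])


def query_2d(contacts: Iterable[Contact], r1: Region, r2: Region) -> List[Contact]:
    """Linear-scan 2D query: first mate inside r1 and second mate inside r2."""
    (c1, s1, e1), (c2, s2, e2) = r1, r2
    return [
        c for c in contacts
        if c.chr1 == c1 and s1 <= c.pos1 <= e1
        and c.chr2 == c2 and s2 <= c.pos2 <= e2
    ]


hits = query_2d(parse_pairs(EXAMPLE), ("chr1", 40000, 70000), ("chr2", 0, 50000))
print([c.read_id for c in hits])
```

The scan touches every record, which is exactly the cost an index per chromosome pair is meant to avoid on files with a billion entries.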
- Qunhua Li
Penn State University
Assessing the reproducibility of Hi-C data
Tao Yang, Feipeng Zhang, Galip Gürkan Yardımcı, Fan Song, Ross C. Hardison, William Stafford Noble, Feng Yue, Qunhua Li
Hi-C is a powerful technology for studying genome-wide chromatin interactions. However, current methods for assessing Hi-C data reproducibility can produce misleading results because they ignore spatial features in Hi-C data, such as domain structure and distance dependence. In this talk, I will present HiCRep, a framework for assessing the reproducibility of Hi-C data that systematically accounts for these features.
In particular, we introduce a novel similarity measure, the stratum adjusted correlation coefficient (SCC), for quantifying the similarity between Hi-C interaction matrices. Not only does it provide a statistically sound and reliable evaluation of reproducibility, SCC can also be used to quantify differences between Hi-C contact matrices and to determine the optimal sequencing depth for a desired resolution. The measure consistently shows higher accuracy than existing approaches in distinguishing subtle differences in reproducibility and depicting interrelationships of cell lineages. The proposed measure is straightforward to interpret and easy to compute, making it well-suited for providing standardized, interpretable, automatable, and scalable quality control.
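The idea of a stratum-adjusted correlation can be sketched in a few lines of Python: contacts are grouped by genomic distance (one diagonal of the matrix per stratum), a Pearson correlation is computed within each stratum, and the per-stratum values are combined as a weighted average. The weights used here (stratum size times the product of the two standard deviations) and the function name are assumptions of this sketch, any smoothing step is omitted, and the code should not be read as the HiCRep implementation.

```python
import numpy as np


def stratum_adjusted_correlation(x: np.ndarray, y: np.ndarray, max_offset: int) -> float:
    """Weighted average of per-diagonal Pearson correlations (strata k = 1..max_offset)."""
    num = 0.0
    den = 0.0
    for k in range(1, max_offset + 1):
        a = np.diagonal(x, offset=k).astype(float)
        b = np.diagonal(y, offset=k).astype(float)
        sd_a, sd_b = a.std(), b.std()
        if a.size < 2 or sd_a == 0 or sd_b == 0:
            continue  # correlation is undefined for a constant stratum
        rho_k = np.corrcoef(a, b)[0, 1]
        w_k = a.size * sd_a * sd_b  # assumed weighting, see lead-in
        num += w_k * rho_k
        den += w_k
    return num / den if den > 0 else float("nan")


# Toy data: counts decay with distance; "replicate" adds noise, "unrelated" is redrawn.
rng = np.random.default_rng(0)
idx = np.arange(50)
lam = 100.0 / (1.0 + np.abs(idx[:, None] - idx[None, :]))
base = rng.poisson(lam)
replicate = base + rng.poisson(1.0, size=base.shape)
unrelated = rng.poisson(lam)

scc_rep = stratum_adjusted_correlation(base, replicate, 20)
scc_unrel = stratum_adjusted_correlation(base, unrelated, 20)
print(f"replicate: {scc_rep:.3f}  unrelated: {scc_unrel:.3f}")
```

Because the correlation is taken within each distance stratum, the shared distance decay of the two matrices does not inflate the score; a plain correlation over all entries would be high even for the unrelated pair.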
- Xihao Li
Harvard T.H. Chan School of Public Health
Multivariate mixed models for predicting functional regions in the human genome
Xihao Li*, Godwin Yung*, Hufeng Zhou, Iuliana Ionita-Laza, Xihong Lin
Since the completion of the Human Genome Project, substantial effort has been put into identifying and annotating its functional DNA elements. With no universal definition of what constitutes function, we now have for any genetic variant, whether protein coding or noncoding, a diverse set of functional annotations. In order to obtain a comprehensive picture of the biological relevance of genomic segments, all of the information acquired by the different annotations needs to be taken into account. Current machine-learning methods focus on predictive accuracy of the annotations. However, they seldom take into account correlations between the functional scores. We propose the combined annotation mixed model (CAMM), an unsupervised learning algorithm that integrates multiple annotations. Our model defines functional status as a vector of binary variables, each meant to capture functionality defined by a specific group of annotations, e.g. evolutionary conservation. It also allows for correlations within and between the groups of annotations. Using the EM algorithm, our approach calculates the posterior probability of a genomic position being functional. We compare the prediction performance of CAMM with existing supervised/unsupervised methods for both coding and non-coding variants in ClinVar, GWAS Catalog, eQTL and several other databases.
- Kathleen Metz
University of North Carolina – Chapel Hill
Simulating Hi-C Data for Benchmarking Differential Loop Detection Algorithms
Kathleen S. Metz, Douglas H. Phanstiel
DNA loops connect regulatory regions to genes hundreds of thousands of base pairs away in order to establish proper transcriptional profiles during human development. Over the past decade there have been rapid improvements in DNA sequencing technology and computing power, which allow us to generate high-resolution data and detect these loops. Several methods to detect differential loops have been presented; however, the relative accuracy and sensitivity of these methods are unclear. In order to evaluate and compare new and existing methods, we must first be able to accurately simulate Hi-C contacts with known and controllable structural features.
Using attributes from existing data from two different cell types, we have modeled contact frequency and variance as a function of genomic distance, and used these functions to construct artificial contact matrices and selectively remove loops. These simulations provide the first steps towards a tunable framework for precise evaluation and comparison of algorithms to detect differential 3D chromatin features.
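A minimal Python sketch of this kind of simulation is given below. It is not the authors' model: the power-law decay of the mean with distance, the gamma-Poisson (negative binomial) noise, the fold-change used to plant a loop, and all parameter names are assumptions chosen for illustration, whereas the abstract describes functions estimated from existing data.

```python
import numpy as np


def simulate_contacts(n_bins, alpha=1.0, scale=200.0, dispersion=0.1,
                      loops=(), loop_fold=4.0, seed=0):
    """Draw a symmetric contact matrix whose mean and variance depend on distance.

    mean(d) = scale / (1 + d)**alpha        (assumed distance decay)
    var     = mean + dispersion * mean**2   (gamma-Poisson mixture)
    Each (i, j) in `loops` has its mean multiplied by `loop_fold`.
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(n_bins)
    dist = np.abs(idx[:, None] - idx[None, :])
    mean = scale / (1.0 + dist) ** alpha
    for i, j in loops:
        mean[i, j] *= loop_fold
        mean[j, i] *= loop_fold
    shape = 1.0 / dispersion
    rate = rng.gamma(shape, mean / shape)   # gamma-distributed Poisson rates
    counts = rng.poisson(rate)
    upper = np.triu(counts)                 # keep the upper triangle and mirror it
    return upper + np.triu(counts, 1).T


# Two conditions that differ only by one planted loop between bins 10 and 30.
with_loop = simulate_contacts(40, loops=[(10, 30)], seed=1)
without_loop = simulate_contacts(40, loops=[], seed=1)
print(with_loop[10, 30], without_loop[10, 30])
```

Because the location and strength of every planted loop are known, the output of a differential loop caller on such a pair of matrices can be scored directly against the ground truth.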
- Zhengqing Ouyang
The Jackson Laboratory for Genomic Medicine
Statistical modeling for 3D structure reconstruction from genome-wide chromatin conformation capturing data
Chenchen Zou, Yuping Zhang, Zhengqing Ouyang
Recent chromosome conformation capture technologies (such as Hi-C) have been widely used for 3D characterization of the genome. While massive, complex data sets have been generated using Hi-C and related technologies, few statistical approaches exist for effectively modeling the higher-order structure of the genome. We introduce a model-based approach for reconstructing 3D chromatin structure from Hi-C data. Through analysis of diverse cell types and organisms, we demonstrate accurate and robust reconstruction of 3D chromatin structure at high resolution and genome-wide scale.
- Brian Ross
University of Colorado
Measuring chromosome conformation by fluorescence microscopy without barcoding
Brian C Ross, James Costello
How to directly measure an in-vivo, single-cell chromosome conformation is an outstanding problem in structural biology. Whereas global conformational information can be inferred from DNA-DNA contact frequencies obtained using 3C-derived methods, direct measurements of individual chromosomal positioning and locus interactions using fluorescence microscopy are limited to the very few loci that can be distinguished by color. One possible route to obtaining large-scale conformations directly by microscopy is to label many more loci than can be distinguished by color, and then computationally infer the identity of each imaged locus using the known color ordering and spacing of the labels along the chromosome. Here we report on improvements to one such reconstruction algorithm, and present experimental validation of the method from a 3-color in-situ hybridization (FISH) labeling of 10 loci on a 4 Mb stretch of human chromosome 4. Our results show that we can both generate likely conformations and give unbiased statistical measures of the reconstruction quality.
- Andres Saez
University of South Florida
Gene Network Inference from mRNA Expression Levels
Andres Saez, Dr. Jeffrey Miller
It is possible for us to obtain genome-wide responses from thousands of precise interventional experiments at once. In light of this, it is necessary to develop a computational method which not only allows for analysis of these large data sets, but also is able to inform experimenters of the best way in which to proceed in their experiments. Some models are available, but generally, these are either not amenable to large data sets or are not informative to experimenters. We develop the theory behind a computational framework which meets both of these criteria. We also show some preliminary results on the computational side with both in silico test networks and some biological data sets. Our computational framework is also modular: with some a priori knowledge of the genes involved, it is possible to analyze a particular group of genes without the added overhead of analysis for genes which are not of interest.
- Jiantao Shi
Dana Farber Cancer Institute
DNA Methylation Profiling of cfDNA for Noninvasive Early Cancer Detection
Jiantao Shi, Zachary D. Smith, Hongcang Gu, Julie Donaghey, Kendell Clement, Davide Cacchiarelli, Andreas Gnirke, Alexander Meissner and Franziska Michor
It has been known for many years that the DNA methylation landscape of cancer is characterized by global hypo-methylation and CpG Island (CGI) hyper-methylation. Strikingly, our recent study shows that the DNA methylation landscape of cancer is very similar to that in extraembryonic tissues (Smith and Shi, Nature, 2017). We defined differentially methylated CGIs by comparing DNA methylation of extraembryonic ectoderm (ExE) to Epiblast (Epi). ExE hyper-methylated CGIs could distinguish cancer from normal in the TCGA cohort consisting of 14 cancer types and CLL. It is thus tempting to use this signature for pan-cancer diagnosis. We now aim to create an early diagnostic tool that will enable us to detect individual cancer types based on a DNA methylation signature present in a patient’s cell free DNA (cfDNA) long before clinical presentation. It has been shown that in early stages of cancer, circulating tumor DNA (ctDNA) represents only 0.01-1% of cfDNA in plasma. Diagnosing such cancers requires nearly zero background. However, normal cells acquire low-level methylation (~ 1%) due to stochastic processes, when measured at single CpG sites. We have developed a computational pipeline that could predict early cancer with cfDNA as low as 0.01%, by using DNA methylation haplotypes.
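The benefit of haplotypes over single CpG sites can be made concrete with a back-of-the-envelope calculation. The sketch below assumes, purely for illustration, that stochastic methylation in normal cells affects each CpG independently at the ~1% rate quoted above; real background methylation may be correlated between neighboring sites, and this is not the authors' pipeline.

```python
def background_full_methylation(per_site_rate: float, n_cpgs: int) -> float:
    """Chance that all n_cpgs sites on one read are methylated by background noise,
    assuming independent sites (an illustrative simplification)."""
    return per_site_rate ** n_cpgs


TUMOR_FRACTION = 1e-4  # 0.01% of cfDNA, the lower bound quoted in the text

for k in (1, 2, 3, 4):
    p = background_full_methylation(0.01, k)
    verdict = "below" if p < TUMOR_FRACTION else "not below"
    print(f"{k} CpG(s) per read: background {p:.0e} is {verdict} the tumor fraction")
```

Under this assumption a single CpG has a background of about 10^-2, far above a 0.01% tumor signal, while a fully methylated read spanning three CpGs arises by chance at about 10^-6, which is why reading several CpGs jointly can approach the near-zero background that early detection needs.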
- Abhijeet Sonawane
Brigham and Women’s Hospital
Constructing distal gene regulatory networks using epigenetic data
Abhijeet Sonawane and Kimberly Glass
The biological processes that drive cellular function can be modeled by a complex network of interactions between regulators (transcription factors) and their targets (genes). One critical influence on these interactions is a cell’s “epigenetic state”, or whether the regulatory region of a gene is in an open chromatin region and physically accessible by a transcription factor in that cell. In our analysis we estimate networks between transcription factors and genes based on proximally and distally located open chromatin regions (determined from DNase-I Seq data) in several different types of cells. We then benchmark the accuracy of these networks using independent (ChIP-seq) data. In particular, we develop SPIDER (Seeding PANDA Interactions to Derive Epigenetic Regulation), to effectively integrate epigenetic information into PANDA (Passing Attributes between Networks for Data Assimilation), a message-passing approach to network reconstruction. Using SPIDER, we observe a drastic improvement in network accuracy. Further investigation suggests that the epigenetic state of network interactions is being exploited by SPIDER to eliminate spurious links (false-positives) in the networks. Functional enrichment analysis also demonstrates that SPIDER is highlighting interactions that are biologically relevant to each cell-type. Our algorithm works by highlighting common structures across multi-omics networks, a process we are able to model using an innovative form of network motifs based on consistency loops.
- Richard Tourdot
Dana Farber Cancer Institute
Haplotype-resolved analysis of chromosomal rearrangements in cancer
Richard Tourdot, Cheng-Zhong Zhang
Cancer genomes often contain extensive chromosomal rearrangements, copy-number alterations, and whole-chromosome aneuploidies. Resolving the global organization of chromosomal rearrangements in cancer genomes can provide crucial information on the history of genome evolution and shed light on the mechanisms of these alterations. Although short-read sequencing can probe these alterations locally at individual loci, how these alterations are organized in the rearranged chromosomes cannot be inferred directly. Long-read sequencing can generate linkage information (“phasing”) between multiple adjacent genetic variants. But current technologies, including PacBio Single-Molecule Real-Time (SMRT) sequencing and Oxford Nanopore sequencing, have a high frequency of sequencing errors (10-15%), creating uncertainty in the phasing information. Here we describe a computational method to overcome these errors and accurately infer long-range haplotypes from long reads or long-range linked reads. This method can be further generalized to incorporate read depth information and jointly determine allelic DNA copy number and haplotype. We report a preliminary analysis of the structure of rearranged chromosomes in a breast cancer cell line (HCC1954) by combining allelic copy number with chromosome rearrangement analyses.
- Su Wang
Harvard Medical School
HiNT: a computational method for using Hi-C data to detect copy number variations and translocations
Su Wang, Jennifer Walsh, Dhawal Jain, Soohyun Lee, Burak H Alver, Peter J Park
Chromatin conformation capture with high-throughput sequencing (Hi-C) is designed to profile the three-dimensional conformation of a genome. However, copy number variations (CNVs) and chromosomal rearrangements can confound the interpretation of chromosomal interactions in Hi-C data. Here, we introduce HiNT (Hi-C for copy Number variations and Translocations detection), a computational approach to detect CNVs and translocations from Hi-C data. To detect CNVs, HiNT reduces the two-dimensional interaction map to a one-dimensional coverage profile, using a linear regression model to remove the biases due to GC content, mappability, and the fragment length. A segmentation algorithm is then used to identify CNVs. For detecting translocations, HiNT first estimates the background contact frequency profile by averaging multiple Hi-C maps in normal cells to remove the compartment signals, and then identifies inter-chromosomal pairs that contain regions where contact probabilities are exceptionally high as translocation candidates. Translocation breakpoints are further resolved with single-base pair resolution using non-Hi-C chimeric reads. We report that Hi-C data can be used for detection of CNVs and translocations, and can supplement whole genome sequencing by locating translocation breakpoints in repetitive regions, which are known to be genomic rearrangement hotspots. With these functionalities, HiNT can be applied to quality control of Hi-C data, as well as CNV and translocation identification.
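The CNV step described above (collapse the 2D map to a 1D coverage profile, then regress out known biases) can be sketched as follows. This is only an illustration under stated assumptions: ordinary least squares on raw coverage, synthetic covariates, and a toy diagonal matrix stand in for whatever model and data HiNT actually uses, the function name is hypothetical, and the segmentation step is not shown.

```python
import numpy as np


def coverage_residuals(contact_matrix, gc, mappability, frag_len):
    """Row-sum the contact matrix into a 1D profile and remove the part that a
    linear model (intercept + GC + mappability + fragment length) can explain."""
    coverage = contact_matrix.sum(axis=1).astype(float)
    design = np.column_stack([np.ones_like(coverage), gc, mappability, frag_len])
    coef, *_ = np.linalg.lstsq(design, coverage, rcond=None)
    return coverage - design @ coef


# Synthetic bins with made-up bias covariates.
rng = np.random.default_rng(1)
n = 200
gc = rng.uniform(0.3, 0.6, n)
mappability = rng.uniform(0.5, 1.0, n)
frag_len = rng.uniform(200, 800, n)
expected = 100 + 50 * gc + 30 * mappability + 0.1 * frag_len
expected[80:120] *= 2.0            # simulated copy-number gain in bins 80..119
matrix = np.diag(expected)         # toy matrix whose row sums equal `expected`

residuals = coverage_residuals(matrix, gc, mappability, frag_len)
print(f"gain region: {residuals[80:120].mean():.1f}  elsewhere: {residuals[:80].mean():.1f}")
```

After the bias terms are removed, the duplicated region stands out as a block of positive residuals, which is the kind of signal a segmentation algorithm would then delimit.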
- Zhenjia Wang
University of Virginia
Functional Transcription Factor Prediction using BART
Zhenjia Wang, Chongzhi Zang
Identification of functional transcription factors (TFs) for a set of co-regulated genes is an essential problem in transcriptional regulation studies. Conventional approaches for TF identification such as DNA sequence motif analysis are not able to predict functional binding of specific TFs, and are especially difficult to apply for most TFs that bind DNA at distal enhancer regions. With the advent of the ChIP-seq technique, large amounts of genome-wide TF binding profiles are available in the public domain. Here we present Binding Analysis for Regulation of Transcription (BART), a novel computational method for predicting functional TFs that regulate gene expression in human or mouse genomes. Following a genomic cis-regulatory profile predicted by MARGE, a previously developed semi-supervised learning approach for predicting cis-regulatory profiles from a given gene set, BART predicts TFs whose genomic binding profiles are best associated with the cis-regulatory profile, leveraging 3485 publicly available TF ChIP-seq datasets in human and 3055 in mouse. BART can accurately predict functional TFs from genes differentially expressed upon perturbation of that TF. Comprehensive TF predictions from hundreds of MSigDB gene sets characterize the functional associations across different TFs in the human genome. This study demonstrates the advantage of utilizing public ChIP-seq datasets in functional gene regulation research.
- Bo Zhang – Stellar Abstract Award*
Penn State University
HiCPlus: a deep convolutional neural network for Hi-C interaction matrix enhancement
Yan Zhang, Lin An, Jie Xu, Bo Zhang, W. Jim Zheng, Ming Hu, Jijun Tang, Feng Yue
Hi-C technology is one of the most popular tools for measuring the spatial organization of mammalian genomes. Although an increasing number of Hi-C datasets have been generated in a variety of tissue/cell types, due to high sequencing cost, the resolution of most Hi-C datasets is coarse and cannot be used to infer enhancer-promoter interactions or link disease-related non-coding variants to their target genes. To address this challenge, we develop HiCPlus, a computational approach based on a deep convolutional neural network, to infer high-resolution Hi-C interaction matrices from low-resolution Hi-C data. Through extensive testing, we demonstrate that HiCPlus can impute interaction matrices highly similar to the original ones, while using as few as 1/16 of the total sequencing reads. We observe that Hi-C interaction matrices contain unique local features that are consistent across different cell types, and such features can be effectively captured by the deep learning framework. We further apply HiCPlus to enhance and expand the usability of Hi-C data sets in a variety of tissue and cell types. In summary, our work not only provides a framework to generate high-resolution Hi-C matrices with a fraction of the sequencing cost, but also reveals features underlying the formation of 3D chromatin interactions.
- Yuping Zhang
University of Connecticut
A statistical framework for longitudinal genomic data integration
Recent advances in genomic medicine have resulted in an accumulation of longitudinal genomic data, in which patients are monitored over time, their biological samples are collected at multiple time points, and the corresponding molecular profiles are measured through high-throughput assays. Longitudinal high-dimensional datasets from genomic and biomedical research may have complicated correlation structures or irregular covariance structures. Consequently, such characteristics pose challenges for dimension reduction and feature selection of longitudinal genomic data. We present a new statistical method to integrate multiple sources of information for better knowledge discovery in diverse dynamic biological processes. We demonstrate the utility of our method through simulations and applications to gene expression data of the mammalian cell cycle and longitudinal transcriptional profiling data in response to influenza viral infections.
- Ye Zheng – Stellar Abstract Award*
University of Wisconsin – Madison
Statistical Methods for Profiling 3-Dimensional Chromatin Interactions from Repetitive Regions of Genomes
Ye Zheng, Ferhat Ay, Sunduz Keles
Recently developed chromatin conformation capture-based assays enabled the study of three-dimensional chromosomal architecture in a high throughput fashion. Hi-C, particularly, elucidated genome-wide long-range interactions among loci.
Although the number of statistical analysis methods for Hi-C data is growing rapidly, a key impediment is their inability to accommodate reads aligning to multiple locations, i.e., multi-mapping reads. This is a key shortcoming of current Hi-C analysis pipelines and hinders the comprehensive investigation of both intra-chromosomal and inter-chromosomal interactions involving repetitive regions.
We developed mHi-C, a multi-mapping strategy for Hi-C data that integrates a hierarchical model to probabilistically allocate multi-mapping reads to their most likely genomic interaction positions. The hierarchical model is built on clustering of sequencing reads that represent biological signal and acknowledges the general features of Hi-C data. Application to published Hi-C data with varying sequencing depth illustrates an average 20% increase in the number of usable reads, leading to higher reproducibility of contact matrices, and also demonstrates that a large fraction of novel significant contacts originates from heterochromatin genomic regions. Further analysis of newly detected contacts for potential enhancer-promoter interactions highlights the importance of long-range contacts with repetitive structures. We implemented mHi-C in Python as a flexible and robust pipeline.
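To give a feel for what probabilistic allocation of multi-mapping reads means, the Python sketch below assigns each ambiguous read pair to its candidate bin pairs in proportion to how much support each candidate already has, and iterates until the assignments settle. This is a deliberately simplified EM-style stand-in, not the mHi-C hierarchical model: the pseudocount, the fixed iteration count, the bin names, and the function name are all assumptions of the example.

```python
from typing import Dict, List, Tuple

BinPair = Tuple[str, str]


def allocate_multireads(uni_counts: Dict[BinPair, int],
                        multi_reads: List[List[BinPair]],
                        n_iter: int = 50,
                        pseudocount: float = 0.5) -> List[Dict[BinPair, float]]:
    """Return, for each multi-mapping read, a probability over its candidate bin pairs."""
    candidates = {c for cands in multi_reads for c in cands}
    weight = {c: uni_counts.get(c, 0) + pseudocount for c in candidates}
    posteriors: List[Dict[BinPair, float]] = []
    for _ in range(n_iter):
        # E-step: split each read across its candidates in proportion to current weights.
        posteriors = []
        for cands in multi_reads:
            total = sum(weight[c] for c in cands)
            posteriors.append({c: weight[c] / total for c in cands})
        # M-step: weight = uniquely mapped support + fractional multi-read support.
        weight = {c: uni_counts.get(c, 0) + pseudocount for c in candidates}
        for post in posteriors:
            for c, p in post.items():
                weight[c] += p
    return posteriors


# Hypothetical bins: A-B is well supported by uniquely mapped reads, A-C is not.
uni = {("binA", "binB"): 30, ("binA", "binC"): 2}
ambiguous = [[("binA", "binB"), ("binA", "binC")]] * 3
result = allocate_multireads(uni, ambiguous)
print({pair: round(p, 3) for pair, p in result[0].items()})
```

Reads whose best candidate clears a probability threshold could then be added to the contact matrix, which is one way ambiguous reads might become usable rather than being discarded.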