Past Bioinformatics Core Forums:
Watch archived presentations
11/10/09 - Fritz Roth, "A Systems Genetics Sampler"
9/22/09 - Curtis Huttenhower, "Data mining for functional genomics and metagenomics"
Tuesday November 10, 2009
FXB G12, 12-1:30pm
Fritz Roth - Associate Professor,
Tutor in Biochemical Sciences
Department of Biological Chemistry and Molecular Pharmacology
Harvard Medical School
"A Systems Genetics Sampler"
The talk will survey several topics:
- A quick update on large-scale quantitative function annotation for human genes;
- How a computational analysis of human 5'UTR introns led us to find that mRNAs encoding mitochondrial proteins use a non-canonical mRNA export pathway;
- Identifying synergistic drug combinations by mining genetic interaction networks; and
- Barcode Fusion Genetics (BFG), a new technology that identifies genetic interactions via large-scale parallel sequencing.
Friday October 23, 2009
FXB G11, 12-1:30pm
Susanna-Assunta Sansone - Coordinator, with
Philippe Rocca-Serra, Technical Coordinator &
Eamonn Maguire, Software Engineer
The European Bioinformatics Institute
Cambridge, UK
"Standards and Infrastructure for Managing Experimental Metadata"
The presentation has a two-fold objective: to highlight the role of community-defined synergistic standards and to introduce the development of the Investigation, Study and Assay (ISA) Infrastructure. The infrastructure promotes and enables uptake of the standards by providing a set of freely available tools and a database that assist in reporting and managing experimental metadata from a variety of multi-omics studies. The ISA infrastructure's components are based on the ISA-Tab format, are designed for local use, and can work independently or as a unified system:
- ISAcreatorConfig, for curators or power users to regulate the fields displayed in the ISAcreator, for example by declaring certain fields mandatory (http://www.mibbi.org) or mandating the use of a specific set of ontology terms (http://www.obofoundry.org);
- ISAcreator, a 'user-friendly' editor with which experimentalists can construct reports, edit experimental metadata and ultimately validate it based on the configuration specified;
- The BioInvestigation Index, a relational database for storing and browsing the experimental metadata (an example is running as public prototype at: http://www.ebi.ac.uk/bioinvindex);
- ISAconverter, to transform ISA-Tab metadata into formats for submission to ArrayExpress (MAGE-Tab), PRIDE (Pride-xml) or the European Nucleotide Read Archive (SRAxml).
Field D*, Sansone SA*, Collis A*, ... Rocca-Serra P et al. 'Omics data sharing. Science 9 Oct 2009. Vol. 326, no. 5950, pp. 234-236. http://biosharing.org
ISA software and contact: http://isatab.sourceforge.net
This work is supported by funds from the EU (CarcinoGenomics, NuGO), EMBL-EBI, UK's NERC-NEBC and the BBSRC.
September 22, 2009
FXB G12, 12-1:30pm
Curtis Huttenhower, Ph.D. - Assistant Professor
Computational Biology and Bioinformatics
Department of Biostatistics
Harvard School of Public Health
"Data mining for functional genomics and metagenomics"
Bioinformatics in the context of public health is needed at a wide range of biological scales: molecular data describing cellular function, population studies incorporating genomic data, and the systems biology tying together these extremes. At all of these levels, the scale of available data is large; public repositories of genomic data currently contain billions of experimental results from a variety of assays. While modern search engines have organized the size and heterogeneity of other complex systems such as the Internet, it remains an open question how machine learning can be used to mine large genomic data collections for answers to specific biological questions.
Curtis will discuss two algorithmic approaches to large scale human genomic data integration, both of which leverage tens of thousands of datasets to predict interaction networks, disease linkages, and regulatory modules. He will also present preliminary results applying this methodology to study genetic and epigenetic variation in a ~1,000-subject colorectal cancer cohort. Finally, he will briefly discuss data integration in the context of metagenomics, the study of uncultured microorganisms from environmental samples. This emerging data-rich field presents a unique opportunity to bring large scale data integration to bear, particularly in the context of human microflora and their impact on health within hosts and across populations.
May 19, 2009
Department of Oncology
Harvard Medical School
Dana-Farber Cancer Institute
"Breast Tumor Evolution"
Breast tumors are heterogeneous and composed of a variety of cell types with distinct genetic, epigenetic, and phenotypic profiles. The molecular basis underlying this intra-tumoral heterogeneity is poorly defined. Models that attempt to explain this include genetic and epigenetic diversity and stem cell-like characteristics combined with environmental selection for the most favorable phenotypes. These ideas have been investigated for a long time both in human tumors and in various model systems, leading to the accumulation of numerous findings that are used to support one or the other. Increasing data suggest that the cancer stem cell phenotype may just be a consequence of genetic and epigenetic events that occur in tumor cells and that it may change as tumors evolve. This high degree of intra-tumoral heterogeneity poses a challenge for efficient cancer therapy and prevention of disease progression. Identifying the dependency of distinct tumor cell subpopulations on specific signaling pathways, and developing combinations of agents that selectively target each of these, is likely to lead to improved clinical management of cancer patients.
- Kornelia Polyak, Michail Shipitsin, Noga Qimron, Lauren L. Campbell
MARCH 17, 2009
Chief Scientific Officer, New England Biolabs
"The Genomics of Restriction and Modification"
With more than 900 bacterial and archaeal genomes completely sequenced and the total sequence content of GenBank still growing exponentially, we can now gain some impression of the distribution of RM systems in the real world. This has been accomplished by using computational analysis of these sequences to find genes or remnants of genes that show clear similarity to known restriction systems in REBASE. This approach works well in identifying Type I and III systems, whose sequences are well conserved. For the Type II systems, the V and C genes that accompany these systems are easily identified, as are the methyltransferase genes. However, the R genes are only detectable when they match known R genes of identical or closely related specificity. New R genes show up only as genes lying close to an M gene and themselves having no similarity to any other genes in GenBank.
Surprisingly, these RM systems, or the relics of them, are much more abundant than might have been guessed from the classical biochemical screening of strains in the laboratory. In particular, Type I systems are widely distributed in Nature and many instances of solitary specificity subunits are found. More than 400 potential Type III and 700 Type IV systems are found, and on average about 4 DNA methyltransferase genes are found per genome. Apparently solitary M genes, in which the R gene is either missing or non-functional, seem quite common. However, our ability to identify M genes accurately is made difficult by the presence of conserved motifs in genes that methylate molecules other than DNA. Analyses of the many environmental samples now appearing in GenBank suggest that the rate of evolution of both M and R genes is quite high and confirm previous findings that the direct cloning of intact RM systems into E. coli is quite difficult with current technology. Importantly, there is little reason to think that our current collection of more than 270 Type II specificities is more than a small sample of the specificities present in Nature.
New methods for predicting active restriction enzymes will be discussed as well as some new experimental approaches to testing the computational predictions.
FEBRUARY 24, 2009
Gabor T. Marth, D.Sc. - Assistant Professor
Department of Biology
Boston College
"Informatics Tools for Next-generation Sequence Analysis"
Next-generation sequencing technologies are now capable of producing tens of gigabases of useful data per machine run. This vast throughput led to the sequencing of several complete individual human genomes, and the 1000 Genomes Project is sequencing thousands more individuals. The primary utility of these datasets is to discover single-nucleotide polymorphisms (SNPs) and short insertion-deletions (INDELs) at single-base-pair resolution, and to map out structural variations (e.g. translocations, inversions) and copy number changes (e.g. deletions, duplications).
Current throughput is high enough, and cost low enough, to sequence smaller genomes with a fraction of a machine run’s worth of data. This enables whole-genome mutational profiling of model organisms and of pathogenic eukaryotes that are inaccessible with traditional genetics. Whole-genome mammalian resequencing at high (>25X) read coverage is still too costly for routine sequencing of thousands of samples, e.g. in a case-control association study. Various DNA capture methods now offer an alternative solution for mass-scale resequencing of targeted gene regions.
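To make the coverage figure above concrete, mean read coverage is commonly approximated as (number of reads x read length) / genome size. The following is a minimal Python sketch of that arithmetic. The function names, the 3.1 Gb genome size and the 36 bp read length are illustrative assumptions, not figures from the talk.

```python
import math


def expected_coverage(n_reads: int, read_length_bp: int, genome_size_bp: int) -> float:
    """Mean coverage, approximated as n_reads * read_length / genome_size."""
    if n_reads < 0 or read_length_bp <= 0 or genome_size_bp <= 0:
        raise ValueError("read count must be >= 0; lengths and sizes must be > 0")
    return n_reads * read_length_bp / genome_size_bp


def reads_needed(coverage: float, genome_size_bp: int, read_length_bp: int) -> int:
    """Smallest read count whose approximate mean coverage reaches `coverage`."""
    if coverage <= 0 or read_length_bp <= 0 or genome_size_bp <= 0:
        raise ValueError("coverage, lengths and sizes must be > 0")
    return math.ceil(coverage * genome_size_bp / read_length_bp)


if __name__ == "__main__":
    # Assumed values for illustration only: ~3.1 Gb genome, 36 bp reads.
    n = reads_needed(25, 3_100_000_000, 36)
    print(f"reads needed for 25X: {n:,}")
```

Under these assumptions the required read count runs into the billions per sample, which is one way to see why targeted capture of selected regions is attractive for large cohorts.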
Because of the swift evolution of sequencing technologies and the rapid scale-up in data throughput, software tools for next-generation sequence analysis are still in a state of flux. As next-generation human resequencing becomes more routine, there is a growing need for efficient software and well-defined analysis pipelines. We developed a complete suite of software tools for mammalian-scale variation discovery. (1) Our read mapper/aligner program, MOSAIK, works with either single-end or paired fragment-end reads from 454, Illumina, AB, and Helicos machines. (2) Our Bayesian SNP / short-INDEL discovery program, GIGABAYES, now has algorithms for accurate individual genotype calling based on the aligned reads. (3) We developed a new program, SPANNER, for detecting structural variation events from paired-end read map positions, and quantifying copy number from the depth of read coverage. (4) Our alignment viewer program, EAGLEVIEW, aids visual data validation and hypothesis generation.
This pipeline was applied for SNP allele calling and SV/CNV detection in the multi-individual human genome resequencing data generated by the 1000 Genomes Project, including exon capture data collected for ~1,000 human genes. Our current work focuses on developing methods for highly accurate mutational profiling applications, and on tailoring our tools for the analysis of expressed sequences in transcriptome sequencing projects.
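The general idea of estimating copy number from read depth, mentioned in the abstract above, can be sketched in a few lines of Python. This is a simplified illustration and not the algorithm used by SPANNER. It assumes read depth has already been counted per fixed-size window, that most windows are at normal copy number (so the median depth is a usable baseline), and that the baseline ploidy is 2. The function name and the sample depths are hypothetical.

```python
from statistics import median
from typing import List, Sequence


def copy_number_from_depth(window_depths: Sequence[float], ploidy: int = 2) -> List[int]:
    """Estimate an integer copy number per window from read depth.

    Each window's depth is scaled by the median depth (taken as the depth
    of a normal, `ploidy`-copy region) and rounded to the nearest integer.
    """
    if not window_depths:
        return []
    baseline = median(window_depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive to serve as a baseline")
    return [round(ploidy * depth / baseline) for depth in window_depths]


if __name__ == "__main__":
    # Hypothetical per-window depths: one window at half depth, one at double depth.
    depths = [30, 31, 29, 15, 60, 30, 30]
    print(copy_number_from_depth(depths))
```

A production tool would also need to correct for GC content and mappability and to segment adjacent windows, none of which is attempted here.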
December 16, 2008
Department of Biostatistics
Department of Environmental Health
“Delineation of Perturbed Biological Systems that Govern Hepatotoxic Potential”
Exposure to hepatotoxicants, either from the environment, idiosyncratic drug responses or toxic doses of a chemical agent, is a major concern to human health and puts the public at risk. Genomics has recently been used in an attempt to evaluate how environmental stressors affect cellular/tissue function and how changes in gene expression may relate to adverse effects. We used a compendium of microarray gene expression data, derived from exposure of rats to hepatotoxicants, to identify a subset of genes that are preferentially perturbed by the toxic insult. Using a variety of bioinformatics tools, computational algorithms and statistical methodologies, we were able to discern key biological processes and molecular pathways that predict necrosis, and presumably govern the toxic responses in rat livers. In addition, we were able to glean a more informative biological interpretation of surrogate (blood) genomic indicators in rats that conferred hepatotoxicity and permitted extrapolation to humans presented with acetaminophen intoxication or who exhibited an adverse response to supratherapeutic amounts of the analgesic/antipyretic medication.
OCTOBER 21, 2008
"Using Functional Information in Published Text to Interpret and Predict Genome-Wide Association Results"
SEPTEMBER 15, 2008 - First Forum!
"Bioinformatics and Research at HSPH"
Department of Biostatistics, HSPH
"When is Bioinformatics more than just useful?"
Bioinformatics, HSPH and DFCI
"Integrative Approaches to Understanding Human Disease"
HSPH and DFCI
Bioinformatics, Department of Biostatistics, HSPH
"A Genomic View of Epigenetic Regulation"