Current Bioinformatics Training at HSPH
2008 Training and Lecture Series
Countway Library Offers Bioinformatics Classes and Resources
Pathway analysis software is now also licensed at Countway.
HSPH has a floating license for GeneGo's Metacore software, which can be obtained by contacting Deanne Taylor. HSPH users can also access the Countway Library's pathway analysis resources, new for 2008.
Countway offers five GeneGo Metacore licenses, two Ingenuity IPA licenses, and access to ExPlain from BioBase. Training for all of these is available through the classes listed at the link below. Anyone at HSPH is welcome to request an account on any of these systems.
https://www.countway.harvard.edu/classes#bioinfo
Genome Wide Association Study (GWAS) Lecture Series
Hosted by the HSPH Program in Quantitative Genomics, Bioinformatics Core, and the Program in Molecular & Genetic Epidemiology
This lecture series presents recent developments in genome-wide association studies. The lectures are delivered weekly by a distinguished group of researchers between April 30 and May 30. Topics include copy number variation, platforms, analysis methods, and germline and somatic GWAS. Everyone is welcome to attend. Abstracts and talk titles are forthcoming.
Speakers, Dates and Locations
Opening Lectures and Reception
April 30, 2008
Kresge G1, 1:00-3:00 PM
Wine Reception Following at 3:00 PM
- Dr. David Altshuler, Harvard Medical School, and Medical and Population Genetics Program, The Broad Institute
"Genomic Variation and the Inherited Basis of Common Disease"
- Dr. Goncalo Abecasis, Department of Biostatistics and Center for Statistical Genetics, The University of Michigan
"Adventures in Gene-Mapping: Design and Analysis of Genome Wide Association Scans."
Watch This Lecture!
May 2, 2008
Kresge G2, 3:30-5:30 PM
- Dr. Charles Lee, Department of Pathology, Brigham & Women’s Hospital and Harvard Medical School
"Copy Number Variation in the Human Genome"
- Dr. Steven McCarroll, Medical and Population Genetics Program, The Broad Institute
"Inherited Variation in Human Genome Structure"
Watch This Lecture!
May 9, 2008
Kresge G1, 3:30-5:30 PM
- Dr. George Church, Department of Genetics and the Center for Computational Genetics, Harvard Medical School
"Sequencing for Coding/Regulatory Causative Allele Associations and Personal Genomics"
- Dr. David Christiani, Departments of Environmental Health and Epidemiology, Harvard School of Public Health and Department of Medicine, Harvard Medical School
"A Genome-Wide Analysis of Survival in Early-Stage Non-Small Cell Lung Cancer"
Watch This Lecture!
May 16, 2008
Kresge G1, 3:30-5:30 PM
- Dr. Mark Daly, Harvard Medical School, and Center for Human Genetic Research, Massachusetts General Hospital and the Broad Institute
"Understanding Crohn's Disease: Progress Through GWAS"
- Dr. Hongyu Zhao, Division of Biostatistics, Yale School of Public Health
"Incorporating Prior Biological Knowledge in Genome-Wide Association Studies"
Watch This Lecture!
May 23, 2008
Kresge G2, 3:30-5:30 PM
- Dr. David Hunter, Department of Epidemiology, Harvard School of Public Health, and Channing Laboratory, Harvard Medical School
"Results from GWAS of Breast and Prostate Cancer"
- Dr. Daniel Schaid, Department of Health Sciences Research, Mayo Clinic
"Genome-Wide Association Studies: Implications for Personalized Medicine and Family Disease Risks"
May 30, 2008
Kresge G1, 3:30-5:30 PM
- Dr. Stacey Gabriel, Genetic Analysis Platform and National Center for Genotyping and Analysis, The Broad Institute
"Approaches for Whole Genome Analysis of Human Genetic Variation"
- Dr. Matthew Meyerson, Department of Pathology and Medical Oncology/Molecular and Cellular, The Dana-Farber Cancer Institute and Harvard Medical School
"Genomic Alterations of Human Cancer"
Abstracts
Genomic Variation and the Inherited Basis of Common Disease
David Altshuler MD, PhD
Associate Professor of Genetics and Medicine
Massachusetts General Hospital and Harvard Medical School
Director, Program in Medical and Population Genetics
Broad Institute of Harvard and Massachusetts Institute of Technology
Despite great progress in medical science, we have limited knowledge of the determinants of disease risk in vivo in the human population, and of targets for prevention and treatment. As family history is a strong and largely unexplained risk factor for human diseases, and given rapid development of methods for studying human genome variation, human genetics offers a promising approach to expand our knowledge of human disease mechanisms.
Until recently, studies of human genetics were limited to family-based linkage studies and to candidate gene association studies, neither of which has proven generally successful in studies of complex, common diseases. While linkage studies were genome-wide, and thus could discover novel information about disease mechanisms, association studies were previously limited to preconceived hypotheses about which genes might be responsible.
Our lab has worked to make possible systematic studies of common genetic variants for association to disease. We played leadership roles in the SNP Consortium and HapMap Projects, and in the development of affordable genome-wide genotyping technologies to enable whole-genome association studies in large clinical samples. We have developed statistical methods and standards to support robust and reproducible gene discovery from population-based association studies.
In the past year, whole genome association studies have uncovered novel and reproducible SNP associations for a wide variety of common diseases. We have contributed to gene discovery in type 2 diabetes, hyperlipidemia, prostate cancer, age-related macular degeneration, rheumatoid arthritis, and systemic lupus erythematosus. We are pursuing these new leads to gain insight into disease mechanisms, and to develop new targets for therapeutic intervention and disease prevention.
Representative publications:
1. Diabetes Genetics Initiative of Broad, Lund and Novartis (2007) “Genome-Wide Association Analysis Identifies Loci for Type 2 Diabetes and Triglyceride Levels” Science 316:1331-6.
2. Haiman C, Patterson N, Freedman M, Myers S, Waliszewska A, Neubauer J, Tandon A, Schirmer C, McDonald G, Pike M, Stram D, Marchand L, Kolonel L, Frasco M, Wong D, Pooler LC, Ardlie K, Oakley-Girvan I, Whittemore A, Cooney K, John E, Ingles S, Altshuler D, Henderson E, Reich D. (2007) “Multiple regions within 8q24 independently affect risk for prostate cancer” Nature Genetics 39:638-44.
3. The International HapMap Consortium (2005) “A haplotype map of the human genome” Nature 437:1299-320.
Adventures in Gene-Mapping: Design and Analysis of Genome Wide Association Scans
Goncalo Abecasis, PhD
Department of Biostatistics and Center for Statistical Genetics
The University of Michigan
Enabled by rapid advances in genotyping technologies, genome-wide association studies have begun to identify susceptibility genes for complex traits and diseases. These studies involve the characterization and analysis of very high-resolution SNP genotype data for hundreds or thousands of individuals and pose several interesting statistical questions at the design, implementation and analysis stage. I will review some of the challenges we have encountered in executing our own scans, trying to pair important theoretical questions with practical examples. Many of my examples will focus on the challenge of imputing missing genotypes. Despite continuing improvements in SNP genotyping technologies, most genome-wide association studies only directly genotype a subset of all existing SNPs. I will review computationally efficient approaches for estimating unmeasured genotypes and evaluating the association between these unmeasured genotypes and relevant traits. These approaches all rely on the intuition that even apparently unrelated individuals will share stretches of chromosome that include many SNPs. Once one of these stretches has been characterized in detail in a few individuals, the alleles it contains can be imputed in other carriers, with different degrees of accuracy. I illustrate the performance of the method and its potential utility using data from ongoing genome-wide association scans, including both scans that examine samples of related individuals and scans that examine samples of apparently unrelated individuals. I also examine the performance of these approaches in different populations, using data from the Human Genome Diversity Panel.
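The intuition behind imputation in the abstract above can be illustrated with a toy sketch. This is not the speaker's method: practical imputation tools use probabilistic models and report uncertainty, whereas the code below simply copies untyped alleles from the single reference haplotype that best matches at the typed SNPs. The function name, the 0/1 allele coding, and the small reference panel are all hypothetical and chosen only for illustration.

```python
from typing import List, Optional

# Alleles are coded 0/1; None marks a SNP that was not directly genotyped.
Haplotype = List[Optional[int]]


def impute_from_reference(study: Haplotype, reference: List[List[int]]) -> List[int]:
    """Fill untyped SNPs by copying from the best-matching reference haplotype."""
    typed = [i for i, allele in enumerate(study) if allele is not None]

    def mismatches(ref: List[int]) -> int:
        # Count disagreements at the positions the study actually genotyped.
        return sum(ref[i] != study[i] for i in typed)

    # The reference haplotype sharing the longest agreement stands in for a shared stretch.
    best = min(reference, key=mismatches)
    return [best[i] if allele is None else allele for i, allele in enumerate(study)]


# Hypothetical, fully characterized reference haplotypes (six SNPs each).
reference_panel = [
    [0, 0, 1, 1, 0, 1],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 1, 0, 0, 1],
]
# Hypothetical study haplotype typed at SNPs 0, 2 and 4 only.
study_haplotype = [1, None, 0, None, 1, None]

imputed = impute_from_reference(study_haplotype, reference_panel)
print(imputed)
```

Copying from a single best match ignores recombination and gives no measure of confidence, which is why the abstract speaks of imputation "with different degrees of accuracy".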
Copy Number Variation in the Human Genome
Charles Lee, PhD
Director of Cytogenetics for the Harvard Cancer Center
Assistant Professor in Pathology at Harvard Medical School
Associate Faculty Member of the Broad Institute
The major source of genetic variation in humans was long thought to be in the form of single nucleotide polymorphisms (SNPs) that account for 0.1% of the human genome. This led to the widely accepted notion that any two individual humans have DNA sequences that are 99.9% identical. However, studies over the past four years clearly demonstrate that the human genome also harbors thousands of structural variants in the form of DNA gains and losses (collectively termed copy number variants or CNVs) as well as balanced chromosomal rearrangements such as insertions, inversions, and translocations. Our group has been involved in the identification and characterization of human CNVs for accurate clinical genetic diagnostic testing as well as for disease association studies. Most recently, we constructed a CNV-enriched array platform and used this array to reveal that 1,020 of 2,191 common human CNV loci were actually smaller than previously thought. In addition, approximately 8% of the CNV regions observed in multiple individuals exhibited profiles consistent with complex genomic architecture.
Studies of CNVs in different human populations and different model organisms are also important as they not only provide insights into genome evolution, but also provide a foundation for studying the functional significance of CNVs, especially for CNV loci that are directly relevant to human disease. Large-scale, human whole-genome sequencing efforts are also underway to provide a wealth of information in this regard.
Inherited Variation in Human Genome Structure
Steven McCarroll, PhD
Medical and Population Genetics Program, The Broad Institute
Testing the hypothesis that structural variation influences human phenotypes is a key challenge for human genetics in the coming years. The ability to assess copy-number variation in disease has been limited by the lack of techniques for accurately measuring the copy-number state of each CNV in each patient, and the lack of basic enabling knowledge about the locations and allele frequencies of most of the copy number polymorphisms (CNPs) that segregate in the human population. I'll describe work in which we developed a new generation of hybrid oligonucleotide microarrays (now widely available as the SNP 6.0 array) able to accurately analyze SNPs and CNVs simultaneously across the genome, then used the arrays to make a high-resolution map of the CNVs that segregate at an appreciable frequency in the human population. Our findings directly challenge current models in the field about the extent, size, and population-genetic properties of human copy-number variation, and about how relationships between CNVs and phenotypes can be discovered in genome-wide studies. I'll also describe a new analytical framework for whole-genome association studies for copy-number variation.
Sequencing for Coding/Regulatory Causative Allele Associations and Personal Genomics
George Church, PhD
Department of Genetics and the Center for Computational Genetics, Harvard Medical School
Association studies relying on linkage (e.g. LD & admixture) increased dramatically in 2007 but so far yield mainly weak predictive power and few causative alleles. Many researchers hope that this will soon be remedied by sequencing implicated regions only or repeating the studies with genome-wide sequencing. The Personal Genome Project (see Personalgenomes.org, Polonator.org) has been developing and integrating assays for coding variants (~1% of the genome), regulatory variants (RNA quantitation by sequencing) via personal stem cells, environmental factors (microbiome and VDJ-ome), and broad sets of medical and non-medical traits.
A Genome-Wide Analysis of Survival in Early-Stage Non-Small Cell Lung Cancer
David Christiani, PhD
Departments of Environmental Health and Epidemiology, Harvard School of Public Health and Department of Medicine, Harvard Medical School
Background: Lung cancer, of which 85% is non-small cell (NSCLC), is the leading cause of cancer-related death in the United States. We used genome-wide analysis of tumor tissue to investigate whether single nucleotide polymorphisms (SNPs) in tumor are prognostic factors in early stage NSCLC.
Methods: 100 early stage NSCLC cases from Massachusetts General Hospital (MGH) were used as a discovery set and 89 NSCLC cases collected by the National Institute of Occupational Health, Norway were used as a validation set. DNA was extracted from flash frozen lung tissue with at least 70% tumor cellularity. Genome-wide genotyping was done using the Affymetrix® 250K Nsp GeneChip®. Copy numbers were inferred using dChip software. Cox models were used to screen and to validate significant SNPs associated with the overall survival.
Findings: Copy number gains in chromosomes 3q, 5p and 8q were observed in both MGH and Norwegian cohorts. The top 50 SNPs associated with overall survival in the MGH cohort (p ≤ 2.5×10⁻⁴) were selected and examined using the Norwegian cohort. Five of the top 50 SNPs were validated in the Norwegian cohort with false discovery rate less than 0.05 (p ≤ 0.01) and all five were located in known genes. The numbers of risk alleles of the five SNPs showed a cumulative effect on overall survival (p for trend: 3.80×10⁻¹² and 2.48×10⁻⁷ for MGH and Norwegian cohorts, respectively).
Interpretation: Five SNPs were identified that may be prognostic of overall survival in early stage NSCLC.
Collaborators and Co-authors:
Yen-Tsung Huang, Rebecca S. Heist, Lucian R. Chirieac, Xihong Lin, Vidar Skaug, Aage Haugen, Michael C. Wu, Zhaoxi Wang, Li Su, Kofi Asomaning
Department of Epidemiology (Y-T Huang MD MPH, Prof D C Christiani MD MPH), Department of Biostatistics (Prof X Lin PhD, M C Wu AM) and Department of Environmental Health (Y-T Huang MD MPH, R S Heist MD MPH, Z Wang MD PhD, L Su MSc, K Asomaning MD MPH, Prof D C Christiani MD MPH), Harvard School of Public Health, Boston, MA, USA; Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA (L R Chirieac MD); Cancer Center (R S Heist MD MPH) and Pulmonary and Critical Care Unit (Prof D C Christiani MD MPH), Massachusetts General Hospital, Boston, MA, USA; Section of Toxicology, Department of Biological and Chemical Working Environment, National Institute of Occupational Health, Oslo, Norway (V Skaug MD, A Haugen PhD).
Understanding Crohn's Disease: Progress Through GWAS
Mark Daly, PhD
Harvard Medical School
The Center for Human Genetic Research, Massachusetts General Hospital
The Broad Institute
Genome-wide association studies have, for many common diseases and medical phenotypes, provided the first wave of validated genetic discoveries. Perhaps the clearest example of this has taken place in Crohn's disease, where multiple studies have identified more than ten novel and replicated genetic associations, providing new insights into the true pathogenesis of disease. Here I will review progress leading to GWAS, highlight successful studies in Crohn's, describe a recent international meta-analysis identifying more than thirty genetic components of Crohn's risk, and detail some of the important functional insights that are emerging from this information.
Incorporating Prior Biological Knowledge in Genome-Wide Association Studies
Hongyu Zhao, PhD
Division of Biostatistics, Yale School of Public Health
The last three years have seen great successes in many Genome-Wide Association Studies (GWAS) which have identified numerous genetic variants underlying complex traits. The analysis and interpretation of data from GWAS presents great statistical and computational challenges, especially after the initial discoveries of variants carrying relatively large effects. Although various statistical approaches have been or are being developed to better analyze GWAS data, it has become apparent that the incorporation of information from prior studies and other sources is indispensable. In this presentation, we discuss our recently developed statistical methods and bioinformatics tools that are designed to more effectively integrate diverse types of prior biological information in analyzing GWAS data. The usefulness of these methods will be illustrated through their applications to some recent large scale GWAS data. This is joint work with David Ballard, Judy Cho, Valentin Dinu, Ji Young Lee, Iryna Lobach, Perry Miller, and Ning Sun.
Results from GWAS of Breast and Prostate Cancer
David Hunter, PhD
Department of Epidemiology, Harvard School of Public Health, and the Channing Laboratory, Brigham and Women's Hospital/Harvard Medical School
The cataloguing of human genes and determination of common genetic variation in the human genome presents the major challenge of determining how inherited genetic variation affects disease risk, assessing the proportion of specific diseases associated with particular genotypes, and understanding how these genotypes interact with environmental and lifestyle factors in disease causation. The advent of the capacity to perform Genome-Wide Association Studies has led to a rapid series of findings relating common inherited variation with risk of common cancers and other diseases and phenotypes. These associations should lead to new mechanistic insights, and have the potential to offer individuals cancer risk assessment. GWAS performed on prostate cancer have led to the identification of common polymorphisms in the 8q24 region, HNF1B, MSMB, JAZF1 and several other genes and regions associated with risk of prostate cancer. Interestingly, some of these genes have also been implicated as diabetes susceptibility loci. In breast cancer, associations with common polymorphisms in FGFR2 and several other genes have been replicated. Translation of these findings into public health and clinical practice is complex, and made more complex by the sheer number of new findings. The new technologies that permit genome-wide assessment of common genetic variation in research studies also permit the determination of these genotypes in individual consumers at low cost per genotype. The responsible incorporation of these new technologies into medical practice poses unprecedented challenges to our conventional models of evaluation of risk assessment tools in the population and the clinic.
Genome-Wide Association Studies: Implications for Personalized Medicine and Family Disease Risks
Daniel Schaid, PhD
Department of Health Sciences Research, Division of Biostatistics, The Mayo Clinic
Genome-wide association studies (GWAS) based on single-nucleotide polymorphisms (SNPs) have rapidly provided genetic clues for common diseases. Almost 100 loci for approximately 40 common diseases have been robustly identified and replicated over the past two years. Most GWAS have been powered to detect common alleles with genotype relative risks of 1.2-1.5, so it is not surprising that this range of allelic effect size is commonly reported. Perhaps greater surprises are that few risk alleles involve previously suspected genes and many risk alleles are in regions without known genes. Furthermore, the number of replicable independent risk alleles for some diseases supports a polygenic basis of common diseases. While some investigators have cautiously proceeded to use GWAS results to delve into finer mapping and resequencing, and to guide functional studies, others are developing panels of SNPs for commercial direct-to-consumer genomic tests (at least 27 companies are selling genetic tests through the internet). Beyond the obvious concerns of employment and health care discrimination, moving too early into SNP-screening for disease can potentially jeopardize personalized medicine, causing both the general public and health care providers to distrust genomic-based personalized medicine before it has achieved sufficient scrutiny and robustness. Although some simulation studies support the notion that genomic screening for polygenic risk alleles will advance personalized medicine, little information has been gleaned from GWAS. Using results from recent prostate cancer GWAS as a prototype, the polygenic basis of common disease will be explored in terms of the distribution of risk alleles in the general population and in families, and in terms of sensitivity-specificity ROC curves for disease screening.
Prostate cancer is a good model, because of the large number of replicated GWAS, and prostate-specific antigen is a well-characterized biomarker for prostate cancer that serves as an ROC benchmark. Further issues surrounding the polygenic nature of disease will be discussed, such as modeling and selecting SNP risk alleles from GWAS. Finally, GWAS results will be used to reflect on the familial aggregation and genetic epidemiology of prostate cancer.
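As a rough illustration of the sensitivity-specificity ROC idea mentioned in this abstract, the sketch below treats the number of risk alleles a person carries as a screening score and computes the ROC curve and its area. The allele counts, function names, and the "count at or above a threshold" screening rule are hypothetical assumptions for illustration only and do not come from the talk or from any GWAS.

```python
from typing import List, Tuple


def roc_points(cases: List[int], controls: List[int]) -> List[Tuple[float, float]]:
    """Return (false positive rate, sensitivity) for each 'count >= t' screening rule."""
    thresholds = sorted(set(cases) | set(controls), reverse=True)
    points = [(0.0, 0.0)]  # a threshold above every score flags no one
    for t in thresholds:
        sensitivity = sum(c >= t for c in cases) / len(cases)
        false_positive_rate = sum(c >= t for c in controls) / len(controls)
        points.append((false_positive_rate, sensitivity))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical risk allele counts per person (made-up numbers).
case_counts = [3, 4, 5, 5, 6]
control_counts = [2, 3, 3, 4, 5]

curve = roc_points(case_counts, control_counts)
print(round(auc(curve), 3))
```

An area of 0.5 corresponds to a score that does not separate cases from controls, and 1.0 to perfect separation. The same calculation applied to a biomarker would give the kind of benchmark comparison the abstract describes.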
Approaches for Whole Genome Analysis of Human Genetic Variation
Stacey Gabriel, PhD
Genetic Analysis Platform and National Center for Genotyping and Analysis, The Broad Institute
Whole genome scanning, capturing the vast majority of common variation, has become a routine practice. This is the result of dramatically advanced genotyping methods and the HapMap resource. Current content and approaches to using these tools will be discussed. Beyond the current tools, which are well powered for only common variation, the use of high throughput sequencing to discover the full range of human genetic variation is also becoming more achievable. Large-scale human re-sequencing will supplement the current approaches and tools, expanding the power and scope of whole genome studies.
Genomic Alterations of Human Cancer
Matthew Meyerson, PhD
Department of Pathology and Medical Oncology/Molecular and Cellular, The Dana-Farber Cancer Institute and Harvard Medical School
Cancer is a disease of the genome. High-throughput genome analysis tools now enable the detection of somatic alterations in cancer cells including point mutations, copy number alterations, translocations, and infections.
To find copy number alterations, we have now analyzed over 2,600 cancer samples with arrays representing 250,000 mapped single nucleotide polymorphisms (SNPs), or most recently, over 1.8 million mapped probe sets. Major discoveries include the identification of lineage-specific amplification of the NKX2-1 gene in human lung adenocarcinoma (Weir et al., 2007).
To find mutations, we are performing systematic sequencing of selected gene families. Here, major discoveries include mutation of the epidermal growth factor receptor tyrosine kinase gene, EGFR, in human lung adenocarcinomas (Paez et al., 2004), and the fibroblast growth factor receptor 2 gene, FGFR2, in endometrial carcinomas (Dutt et al., 2008). EGFR and FGFR inhibitors respectively kill cells bearing these mutations.
To further systematic genome discovery in well-annotated cancers, the National Institutes of Health have established "The Cancer Genome Atlas" project. This consortium has now completed preliminary analyses of 206 glioblastoma tumor DNA and RNA specimens. Preliminary results of these studies will also be reported.
Dutt A et al. Drug-sensitive FGFR2 mutations in endometrial carcinoma. Proc Natl Acad Sci USA 2008: in press.
Paez JG et al. EGFR mutations in lung cancer: correlation with clinical response to gefitinib therapy. Science. 2004;304(5676):1497-500.
Weir BA et al. Characterizing the cancer genome in lung adenocarcinoma. Nature. 2007 Dec 6;450(7171):893-8.
