Past Working Group Series


2013-2014



Working Group Co-organizers: Bjarni Vilhjalmsson & Sasha Gusev

For all local students, postdocs, and faculty - The PQG is continuing a less formal working group seminar to provide the opportunity to present and participate in the discussion of works in progress focused on the methods and analysis of high-dimensional data in genetics and genomics. This year we have a renewed focus on the work of young investigators. With the current enthusiastic speakers who have already volunteered, we look forward to an exciting seminar series for 2013-2014.

Tuesday, April 22, 2014
12:30-2:30, Building 2-Room 426

Kaitlin Samocha
Analytic and Translational Genetics Unit | MGH

Identification of a Set of Highly Constrained Genes from Exome Sequencing Data

A major challenge of medical genetics is to determine which variants, if any, contribute to disease in a patient. While variants can be prioritized based on their predicted deleteriousness, information about the disrupted gene can also be used to highlight those variants that are more likely to contribute to disease. For example, damaging variants in genes expressed in the relevant tissue might be prioritized over variants in genes that are not expressed in the tissue. Another potential way to prioritize variants is by the evolutionary constraint of the gene.

We developed a sequence-context-based model of de novo variation to create per-gene probabilities of synonymous, missense, and loss-of-function mutations. We noticed a high correlation (0.94) between the probability of a synonymous mutation in a gene and the number of rare, synonymous variants identified in that same gene using the NHLBI’s Exome Sequencing Project data (evs.gs.washington.edu). We predicted the number of variants that we would expect to see in the dataset and, in order to quantify deviations from those expected values, created a Z score of the chi-squared difference between the observed and expected variation. While the distribution of these Z scores for the synonymous variants was normal, there was a marked shift in the missense distribution towards having fewer variants than predicted.

We identified a list of excessively constrained genes representing roughly 5% of all genes. This set of genes identified as excessively constrained showed enrichment for entries in the Online Mendelian Inheritance in Man (OMIM) database and, in particular, for those with a dominant inheritance pattern. Using published data, we found that de novo loss-of-function variants identified in patients with autism and intellectual disability were in a constrained gene more often than expected (p < 0.0001 for both). This trend did not hold for those genes with a de novo loss-of-function variant in a control (p = 0.66), indicating that this approach can effectively prioritize genes in which mutations can strongly predispose to disease.


Tuesday, February 25, 2014
12:30-2:30, Building 2-Room 426

Stephan Ripke
Analytic and Translational Genetics Unit | MGH

Reality at Last: Psychiatric Genomics Consortium Quadruples Schizophrenia GWAS Sample Size

Genome-wide association studies have been a principal study design in human genetics for almost ten years now. They have been performed on almost all known heritable human traits and diseases with the hope of shedding light on the biology of so-called complex genetic diseases like diabetes mellitus, hypertension and hyperlipidemia. The psychiatric field, which has traditionally lacked biologic instruments for medical research, was especially enthusiastic about this new technology. Still, early analyses did not bear fruit, causing many to question or abandon this approach.

Genome-wide association studies are now clearly one of the most successful genetic analysis methods in medical research. Results from these studies have led to many known and possibly novel drug targets and have deciphered many new biological pathways. Interestingly, psychiatric phenotypes are in no way inferior to somatic phenotypes. Yet the path to this success was more arduous than expected.

In this talk I will present lessons learned during the last ten years of GWAS. I will give an overview of this journey, highlighting early successes from somatic diseases (e.g. Crohn’s disease) and later but no less striking successes from psychiatric diseases, schizophrenia in particular.

I will then attempt to translate these experiences into expectations and recommendations for future work in the field of genetics and in the field of psychiatric genetics specifically.

 


Tuesday, November 19, 2013
12:30-2:30, Building 2-Room 426

Gosia Trynka
The Raychaudhuri Lab | BWH | HMS

Analyses assessing enrichment of GWAS variants for non-coding annotations in the genome are upwardly biased


Interpreting the function of variants associated with complex traits is challenging, particularly because only a small proportion of the variants map to gene-coding regions. The activity of genomic regions can be inferred from whole-genome assays for histone modifications, chromatin accessibility or specific transcription factors. Such maps are now becoming available for hundreds of cell types and tissues.


A test for enrichment within chromatin annotations has become a widely applied approach to infer functions of trait-associated variants. However, there are currently no standards for carrying out such an analysis correctly. Typically, enrichment analysis takes associated variants and quantifies their overlap with chromatin annotations. The observed enrichment is then compared to a null distribution of SNPs sampled from the whole genome based on different parameters. Most frequently the null is defined by random sets of SNPs, or by sets matched for proximity to TSS and minor allele frequency. The definition of the null distribution strongly influences the significance of the observed results.
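The simplest version of the enrichment test described above, with the null defined by random SNP sets, can be sketched in Python as follows. The function name, argument names, and default number of draws are assumptions made for illustration. The sketch does no matching on TSS proximity or allele frequency, so it shows the general procedure rather than a recommended null.

```python
import random


def empirical_enrichment_p(associated, background, annotated,
                           n_draws=1000, seed=0):
    """Empirical p-value for overlap between SNPs and an annotation.

    associated: list of trait-associated SNP identifiers.
    background: list of all SNP identifiers available for sampling.
    annotated:  set of SNP identifiers that fall inside the annotation.
    Returns (observed_overlap, p_value).
    """
    rng = random.Random(seed)
    observed = sum(1 for snp in associated if snp in annotated)
    k = len(associated)
    hits = 0
    for _ in range(n_draws):
        # Null draw: k SNPs sampled uniformly, with no matching at all.
        sample = rng.sample(background, k)
        overlap = sum(1 for snp in sample if snp in annotated)
        if overlap >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_draws + 1)
```

Swapping the uniform `rng.sample` call for a sampler that respects matching parameters would change only the null draw, which is the step whose definition drives the significance of the result.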


We tested different sets of matching parameters and defined those that are essential to sufficiently control for type 1 error. With simulations we show that the currently employed matching methods are strongly biased towards false positive results. Additionally, we develop an independent method that does not rely on matching and instead takes into account local complex correlation of genomic annotations present at the associated loci.

 


Tuesday, November 5, 2013
12:30-2:30, Building 2-Room 426

David Golan
Ph.D. candidate, Rosset Lab, Tel-Aviv University

Accurate Estimation of Heritability in Case-Control GWAS


Linear mixed effects models have recently gained popularity as the method of choice for estimating heritability from GWAS data, i.e. quantifying how much of the variability of a phenotype can be explained by the genotyped SNPs. However, most of the interesting diseases and disorders studied are rare (typically affecting <1% of the population), and so the proportion of cases in a study is usually considerably higher than the proportion of cases in the population. This over-representation of cases invalidates several key assumptions of linear mixed models, e.g. the normality and independence of the random effects. Ignoring these problems results in shrunken estimates of heritability. We propose an alternative approach for estimating heritability, related to the well-known method of Haseman and Elston. We derive the relationship between the genetic similarity and the phenotypic similarity of any two individuals as a function of the heritability, while explicitly conditioning on the fact that both individuals were selected for the study. Our method then entails regressing the pairwise phenotypic similarities on the pairwise genetic similarities and using the slope to obtain an estimate of the heritability. We show, using simulations, that our method yields unbiased estimates which are considerably more accurate than the current state-of-the-art methodology. Applying our method to several well-studied GWAS yields heritability estimates which are considerably higher than previously published results.
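The regression step described above can be illustrated with the classical Haseman-Elston cross-product form for a quantitative trait: for standardized phenotypes, the expected product of two individuals' values is the heritability times their genetic similarity, so the slope of products on similarities estimates heritability. The Python sketch below is that textbook version only. It does not include the conditioning on study selection that the talk describes, and the function and argument names are hypothetical.

```python
import math


def he_regression_slope(phenotypes, kinship):
    """Slope of pairwise phenotype products on pairwise genetic similarity.

    phenotypes: list of n trait values (standardized inside the function).
    kinship:    n x n list of lists of genetic similarities.
    Classical Haseman-Elston style estimate; no ascertainment correction.
    """
    n = len(phenotypes)
    mean = sum(phenotypes) / n
    var = sum((y - mean) ** 2 for y in phenotypes) / n
    if var == 0:
        raise ValueError("phenotype has zero variance")
    z = [(y - mean) / math.sqrt(var) for y in phenotypes]

    # One (similarity, product) point per unordered pair of individuals.
    xs, ys = [], []
    for i in range(n):
        for j in range(i + 1, n):
            xs.append(kinship[i][j])
            ys.append(z[i] * z[j])

    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("genetic similarities do not vary across pairs")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx
```

With very few individuals the slope can fall outside the interval from 0 to 1, so the sketch is meant to show the mechanics of the regression rather than to produce usable estimates.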


 



Tuesday, October 22, 2013
12:30-2:30, Building 2-Room 426

Saurabh Ghosh
Professor
Human Genetics Unit
Indian Statistical Institute, Kolkata

Integrating Multiple Phenotypes For Association Mapping


Most clinical end-point traits are governed by a set of quantitative and qualitative precursors and a single precursor is unlikely to explain the variation in the end-point trait completely. Thus, it may be a prudent strategy to analyze a multivariate phenotype vector possibly comprising both quantitative as well as qualitative precursors for association mapping of a clinical end-point trait. The major statistical challenge in the analyses of multivariate phenotypes lies in the modelling of the vector of phenotypes, particularly in the presence of both quantitative and binary traits in the multivariate phenotype vector.

For population-based data, we propose a novel Binomial regression approach that models the likelihood of the number of minor alleles at a SNP conditional on the vector of multivariate phenotype using a logistic link function. For family-based data comprising informative trios, we propose a logistic regression method that models the transmission probability of a marker allele from a heterozygous parent conditioned on the multivariate phenotype vector and the allele transmitted by the other parent. In both the approaches, the test for association is based on all the regression coefficients.

We carry out extensive simulations under a wide spectrum of genetic models and probability distributions of the multivariate phenotype vector to evaluate the powers of our test procedures. We apply the proposed population-based method to analyze a multivariate phenotype comprising homocysteine levels, Vitamin B12 levels and affection status in a study on Coronary Artery Disease and the family-based method to analyze a vector of four endophenotypes associated with alcoholism: the maximum number of drinks in a 24 hour period, Beta 2 EEG Waves, externalizing symptoms and the COGA diagnosis trait in the Collaborative Study on the Genetics of Alcoholism (COGA) project.

 



Tuesday, September 24, 2013
12:30-2:30, Building 2-Room 426

Ron Do
Research Fellow in Genetics
Harvard Medical School

Human Genetics Approaches to Understand if Triglyceride-Rich Lipoproteins Cause Coronary Artery Disease


Plasma triglycerides are transported in specific lipoproteins; in observational epidemiologic studies, increased triglyceride levels correlate with higher risk for coronary artery disease (CAD). However, it is unclear whether this association reflects causal processes. Genetic information can help assess causality, and can be useful in dissecting the influences of correlated measures such as triglycerides, low-density lipoprotein cholesterol (LDL-C), and high-density lipoprotein cholesterol (HDL-C). Here, we present results from two studies that help us understand if triglyceride-rich lipoproteins cause coronary artery disease. In the first study, we used 185 common polymorphisms recently mapped for plasma lipid traits (P < 5 x 10^-8 for each) to examine the role of triglycerides on risk for CAD. First, we highlight loci associated with both LDL-C and triglycerides, and show that the direction and magnitude of both are factors in determining risk for CAD. Second, we consider loci with a strong magnitude of association with triglycerides but a minimal one with LDL-C, and show that these loci are also associated with CAD. Finally, in a model accounting for effects on LDL-C and/or HDL-C, a polymorphism’s strength of effect on triglycerides is correlated with the magnitude of its effect on CAD risk. In the second study, we sequenced the exomes of 1,027 individuals with myocardial infarction (MI) at an early age, compared their sequences with 946 older individuals without MI, and validated initial associations using statistical imputation, genotyping, and re-sequencing. In follow-up sequencing of > 10,000 individuals, we observed that rare non-synonymous mutations in the APOA5 gene were more frequent in early-onset MI patients than in MI-free controls at an exome-wide significance threshold (APOA5, 1.4% versus 0.6%, odds ratio = 2.2, P = 5 x 10^-7). Carriers had higher plasma triglyceride concentrations compared to non-carriers (median in carriers was 167 mg/dl versus 104 mg/dl for non-carriers, P = 0.007). These results suggest that triglyceride-rich lipoproteins may causally influence risk for CAD and that novel therapeutic approaches targeted to TG-rich lipoproteins might be expected to reduce risk of CAD.

 


2012-2013

Working Group Organizer: Hugues Aschard
Postdoc Committee: Bjarni Vilhjalmsson, Jin Zhou, Megha Padi


Tuesday, April 2, 2013
12:30-2:30, Building 2, Room 426

Nikolaos A. Patsopoulos, MD, PhD
Instructor in Neurology
Brigham & Women's Hospital
Harvard Medical School

The genetics of common variation in Multiple Sclerosis

Multiple Sclerosis is a neurodegenerative disease with strong evidence of genetic predisposition. In this PQG I'll present current and ongoing efforts of the International Multiple Sclerosis Genetics Consortium (IMSGC) to identify and fine-map genetic effects in large-scale data sets. The focus will be on design, on addressing technical issues (e.g. complicated sample structure), and on the application of methods (e.g. control for population stratification).


Tuesday, March 5, 2013
12:30-2:30, Building 2, Room 426

Tune H. Pers
Research Fellow in Pediatrics
Children's Hospital Boston
Harvard Medical School

Predicting pathways and selecting likely causal genes from a height genome-wide association study by integrative network-based analysis

In genome-wide association (GWA) studies many loci have no single obviously causal gene; the challenge for moving from association to novel biological insight is therefore to identify which gene at each locus most likely explains the association. Another challenge is to identify whether associated loci coalesce onto biological pathways. A key first step is to use computational approaches to prioritize genes within loci as most likely to be biologically relevant, and to assess whether genes in associated loci enrich for particular gene sets. Our approach is based on expression patterns derived from a heterogeneous panel of 80,000 expression arrays that predict functions for genes. The predicted functions include predicted protein-protein interactions, predicted phenotypic consequences, predicted Reactome pathway members, predicted KEGG pathway members, and predicted Gene Ontology term membership. We subsequently used these predicted functions to systematically identify the most likely causal gene(s) at a given locus. We evaluate our method using the GIANT consortium's GWA meta-analyses for human height. For human height GWA loci, our integrative approach performs consistently better in predicting likely causal genes than approaches based on one or two data types only. As unbiased benchmarks, we tested for enrichment of genes that recently have been shown to be differentially expressed in rodent growth plates, and for genes that have associated missense variants. Based on these and other benchmarks, this approach outperforms methods using fewer data types. We have developed an unbiased computational approach that integrates a variety of data types and performs well in prioritizing potentially causal genes from GWA data.

 


Tuesday, December 4, 2012
12:30-1:30, Building 2, Room 426

Paz Polak
The Broad Institute
BWH/HMS, Sunyaev Lab

Revealing the factors that shape the regional mutation rates in melanoma

Recent advances in high-throughput sequencing technology enable us to identify cancer-specific mutations at any position in a patient’s genome, allowing dozens of studies to reveal a rich world of mutation patterns. We have been studying how the density of mutations changes along the genome in different cancer types. We used epigenetic data from several normal cell types, recently released by the ENCODE project consortium, to predict the variation in cancer-specific mutation densities along human chromosomes. For prediction, we used three different methods: Random Forest; Poisson regression with elastic net regularization, which is a compromise between lasso and ridge regression; and Generalized Additive Models (GAM), which smooth the predictors and deal with the need to linearize their relations. Overall, we were able to explain up to 80% of the variation in melanoma genomes, showing that chromatin structure is involved in establishing mutation patterns in cancer.



Tuesday, November 27, 2012
12:30-1:30, Building 2, Room 426

Sara Lindstrom
Research Scientist
Department of Epidemiology

Deep targeted sequencing of 12 breast cancer loci in 4,700 women across four different ethnicities

Genome-wide association studies (GWAS) have identified multiple genetic loci associated with breast cancer risk. However, the underlying genetic structure in these regions is not fully understood and it is likely that the index GWAS signal originates from one or more yet unidentified causal variants within the region. We used next-generation sequencing to characterize 12 GWAS-discovered breast cancer loci in a total of 2,300 breast cancer cases and 2,300 controls across four ethnic populations. Region intervals spanned between 46kb and 973kb. In total we hybrid-captured and sequenced 5.5Mb. On average, we were able to capture 82% of the non-repetitive sequence in the targeted regions, and the average fraction of captured bases sequenced with a depth >20x was more than 95%. After single nucleotide variant (SNV) calling and quality control, a total of 138,898 SNVs remained for analysis. 87.4% of those had a minor allele frequency less than 0.5% and 50.2% were private mutations. Across all regions, a total of 81 SNVs showed evidence for association with breast cancer (P<0.001). However, we did not find unequivocal association evidence supporting a causal role for any individual SNV. These results illustrate the challenges facing post-GWAS fine-mapping studies.

 


Tuesday, October 2, 2012
12:30-2:00, Building 2, Room 426

Hugues Aschard
Postdoc of Peter Kraft

Exploring the effect of gene-gene and gene-environment interactions in breast-cancer risk prediction

Genome-wide association scans have identified scores of common genetic variants associated with the risk of complex diseases in recent years. However, their aggregate effects on risk beyond traditional factors remain uncertain. Recent papers reported that the addition of information on these genetic variants to existing risk models for common diseases can moderately improve discrimination and prediction in estimating that risk. Here we explore the extent to which the consideration of interaction between genetic variants (GxG) and interaction between genetic variants and established clinical factors (GxE) can add to these risk-assessment models. Using data from the Nurses’ Health Study, we derive the predictive ability of 15 independent published risk alleles in 1145 breast cancer cases and 1142 matched controls. We evaluate the extent to which single nucleotide polymorphisms (SNPs) can discriminate breast cancer cases from controls in the whole sample and in strata defined by non-genetic risk factors. We then conduct a simulation study to explore the potential improvement in discrimination if complex GxG and GxE interactions exist and are known.


 

2011-2012 Working Groups


Working Group Organizers: Elizabeth Schifano & Monica Ter-Minassian

For all local students, postdocs, and faculty - The PQG is continuing a less formal working group seminar to provide the opportunity to present and participate in the discussion of works in progress focused on the methods and analysis of high-dimensional data in genetics and genomics. This year we have a renewed focus on the work of young investigators. With the current enthusiastic speakers who have already volunteered, we look forward to an exciting seminar series for 2011-2012.



Tuesday, March 27, 2012
12:30-2:00, Building 2-Room 426


Aedin Culhane, Ph.D.
Research Scientist
Department of Biostatistics
Dana-Farber Cancer Institute

New biclustering approaches for pan-cancer analysis of a large compendium of genomics data

Though it is widely recognized that molecular traits such as p53 mutation and deregulation of the cell cycle span many types of cancer, cancer is conventionally classified by anatomical site, and studies of cancer generally analyze the disease within these boundaries. There is a growing recognition that histological and molecular heterogeneity cross anatomical boundaries; for example, clear cell ovarian cancer is closer to clear cell renal cancer than it is to other ovarian cancers. New pan-cancer analysis approaches are required to discover the molecular taxonomy of cancer upon which personalized cancer medicine can be built. I will describe a new biclustering approach that we have developed to enable analysis of a large compendium of cancer genomics data, and the pan-cancer molecular traits we have discovered.

 


 

Tuesday, February 28, 2012
12:30-2:00, Building 2-Room 426


Peggy Lai, M.D.
Post-doctoral research fellow in Environmental Health
(Mentors: David Christiani and Winston Hide)

Using a gene expression signature to understand endotoxin related chronic obstructive lung disease

A quarter of patients with chronic obstructive pulmonary disease (COPD) in the United States have never smoked. Endotoxin is a common and poorly recognized environmental exposure that may cause non-tobacco-related COPD. In recent studies, chronic inhalational endotoxin has been associated with the development of obstructive lung disease in both murine models and human epidemiologic studies of occupationally exposed populations. Despite this, endotoxin-related lung disease has not been well studied. Three distinct murine models of chronic endotoxin exposure have been developed in different laboratories with use of microarrays to characterize global gene expression. We identified a common 101-gene signature for recurrent endotoxin exposure that is biologically interesting at the gene level and that, at the pathway level, demonstrated increasing numbers of inflammation-related genes with longer periods of endotoxin exposure. This is surprising in light of the well-known phenomenon of endotoxin tolerance, and suggests that dysregulated inflammation may play a role in endotoxin-related lung disease. The focus of this talk is on the application of biostatistical and bioinformatics methods to a clinically relevant but poorly understood disease.

 


Tuesday, January 31, 2012
12:30-2:00, Building 2-Room 426


Lin Li, Ph.D.
Postdoctoral Research Fellow
Department of Biostatistics
Harvard School of Public Health

Incorporating gene expression information in SNP set association testing

Increasing evidence suggests that single nucleotide polymorphisms (SNPs) associated with complex traits are likely to be expression quantitative trait loci (eQTLs). It is of interest to utilize eQTL information for understanding the genetic basis underlying complex traits. On the other hand, SNP set association testing, especially region-based testing, can harness information from SNPs in linkage disequilibrium and lead to increased power. We are interested in incorporating gene expression information in SNP set association testing. Specifically, we employ kernel machine tests using weighted kernels, where the predefined weights are based on eQTL signals. We also consider an adaptive test that is robustly powerful regardless of whether the expression signals are relevant to the trait of interest. As an application, we analyze data from an asthma study using our proposed methods.
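To make the idea of a weighted kernel concrete, the Python sketch below computes a variance-component style score statistic Q = r' (G W G') r for a weighted linear kernel, where r holds residuals from an intercept-only null model, G is the genotype matrix, and W is a diagonal matrix of per-SNP weights. It is a generic illustration and not the speakers' implementation: the function name is hypothetical, the weights are assumed to have been derived beforehand from eQTL signals, covariates are omitted, and no p-value is computed.

```python
def weighted_kernel_score(phenotypes, genotypes, weights):
    """Return Q = r' (G W G') r for a weighted linear kernel.

    phenotypes: list of n trait values.
    genotypes:  n x p list of lists of minor-allele counts.
    weights:    p non-negative per-SNP weights (assumed to come from
                eQTL signals; how they are derived is not shown here).
    r is the residual from an intercept-only null model (no covariates).
    """
    n = len(phenotypes)
    mean = sum(phenotypes) / n
    residuals = [y - mean for y in phenotypes]
    q = 0.0
    for j, w in enumerate(weights):
        # Per-SNP score: inner product of residuals with genotype column j.
        score_j = sum(residuals[i] * genotypes[i][j] for i in range(n))
        q += w * score_j ** 2
    return q
```

Because Q is a weighted sum of squared per-SNP scores, raising the weight of a SNP with a strong expression signal increases its contribution to the region-level statistic.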

(Joint work with Prof. Xihong Lin and Prof. Liming Liang)

 



Tuesday, December 13, 2011
12:30-2:00, Building 2-Room 426


Chen-yu Liu
Research Fellow, Department of Environmental Health
Harvard School of Public Health

A pilot study of residential petrochemical exposures and DNA methylation changes for leukemia risk

We conducted a pilot study to assess the associations of genome-wide methylation pattern changes in peripheral blood and residential petrochemical exposure on 30 childhood acute lymphoblastic leukemia (ALL) cases and controls, as an exploratory extension of a population-based case-control study [1-3]. Higher concentrations of selected PAHs and VOCs in the vicinity of petrochemical industries in the study area have been reported, compared to those in industrialized communities of the U.S. [4, 5]. Hypermethylation/hypomethylation of specific genes involved in carcinogen metabolism and exposure to several PAH and VOC carcinogens may increase childhood ALL risk. We used geographic information system tools to estimate individual-level exposure by accounting for subjects' mobility, length of stay at each residence, distance to petrochemical plant(s), and monthly prevailing wind direction. DNA methylation profiles were obtained by using the Illumina Infinium HumanMethylation27 BeadArray. Differential methylation levels between ALL cases and controls were found in genes involved in inflammatory response, lymphocyte differentiation/activation, and apoptosis. Similar methylation pattern changes were observed for petrochemical exposure-correlated effects. Seventy percent of genes overlapped between the petrochemical exposure-correlated changes and the leukemia-associated changes. The results do not remain significant after adjusting for multiple comparisons.

References:

  1. Liu, C.Y., et al., Maternal and offspring genetic variants of AKR1C3 and the risk of childhood leukemia. Carcinogenesis, 2008. 29(5): p. 984-90.
  2. Liu, C.Y., et al., Cured meat, vegetables, and bean-curd foods in relation to childhood acute leukemia risk: A population based case-control study. BMC Cancer, 2009. 9(1): p. 15.
  3. Yu, C.L., et al., Residential exposure to petrochemicals and the risk of leukemia: using geographic information system tools to estimate individual-level residential exposure. Am J Epidemiol, 2006. 164(3): p. 200-7.
  4. Lee, C., The characteristics of ambient volatile organic compounds and their carcinogenesis effects in the vicinity of one petrochemical industry in Taiwan. (In Chinese). Taipei, Taiwan, Republic of China: National Science Council. 1995.
  5. Lee, C. and G. Tsai, The characteristics of ambient polycyclic aromatic hydrocarbons and their carcinogenesis effects in the vicinity of one petrochemical industry in Taiwan. (In Chinese). Taipei, Taiwan, Republic of China: National Science Council. 1994.

Tuesday, November 22, 2011
12:30-2:00, Building 2-Room 426
a pizza lunch will be provided

Cristian Tomasetti
Postdoc with Giovanni Parmigiani
Dana-Farber Cancer Institute and Harvard School of Public Health

Mathematical Modeling of Random Genetic Mutations for a General Class of Tumor Growth Curves with Applications

Various attempts to study random genetic mutations have been made in the mathematical modeling literature. An important goal has been the estimation of the probability for a specific mutation to be present in a tumor of a given size and the related estimate for the expected number of mutants, if mutants are indeed present. Previous mathematical models addressing these questions have, however, shown two potential limits: the tumor was assumed to be homogeneous and, on average, growing exponentially. In this talk a few recent results will be presented where the heterogeneity of the tumor cell population is taken into account, and the exponential growth of cancer has been replaced by other, arguably more realistic types of tumor growth. Various applications of this mathematical framework to chronic myeloid leukemia and gastrointestinal stromal tumor will be introduced.


Tuesday, October 25, 2011
12:30-2:00, Building 2-Room 426
a pizza lunch will be provided

Noah Zaitlen
Postdoc with Alkes Price

Components of Heritability in An Icelandic Cohort

The combined genotypes and genealogy of 38,167 Icelanders provide the opportunity to examine the contribution of different components of genetic variation to the heritability of complex phenotypes. In this work we focus on both parent-of-origin effects and the contribution of typed versus untyped variants. Genetic variation can alter phenotype in a parent-of-origin specific manner, for example, via imprinting. Kong et al. [1] identified three Type 2 Diabetes (T2D) variants including rs2334499, which is protective for T2D when inherited maternally, confers risk when inherited paternally, and lies in an imprinted region of the genome. The genealogy provides a means of resolving not only highly accurate phasing, but also the parent-of-origin of each typed variant. Using this information we develop methods to examine the total contribution of parent-of-origin effects, as well as the differences between paternal and maternal contributions to the heritability of complex phenotypes. We find that variation of height shows little evidence of contribution from parent-of-origin effects, while T2D shows highly significant evidence (p-value < 1.9x10^-5), suggesting that many more variants such as rs2334499 remain to be found. In addition to parent-of-origin effects, the unique long-range phasing of the deCode data provides a means of efficiently estimating the fraction of the genome shared identical by descent (IBD) as well as identical by state (IBS). As has been recently demonstrated [2], IBS sharing can be used to estimate the contribution to phenotype of genotyped SNPs and SNPs in linkage disequilibrium (LD) with genotyped SNPs. IBD sharing estimates the total narrow-sense heritability of the phenotype, that is, the additive contribution of all SNPs. Since both are readily available in these data, we are able to compare the IBD-based estimates to the IBS-based estimates and show that for all phenotypes examined the majority of heritability is well captured by the genotyped SNPs. We conclude with a discussion of confounding effects in mixed model estimates of heritability such as those provided by GCTA, showing that population structure can lead to biased estimates of heritability even when corrected for by principal component adjustments.

[1] Kong et al. Parental origin of sequence variants associated with complex diseases. Nature 2009.
[2] Yang et al. Genome partitioning of genetic variation for complex traits using common SNPs. Nat Genet 2011.

 



Tuesday, September 27, 2011
12:30-2:00, Building 2-Room 426
a pizza lunch will be provided

Kimberly Glass

Postdoc with GC Yuan & John Quackenbush

Passing Messages Between Data Types to Refine Predicted Network Interactions

Gene regulatory network reconstruction is a fundamental question in computational biology. Although it is recognized that there are significant limitations when using individual datasets to reconstruct a regulatory network, it remains a major challenge to integrate multiple data sources effectively. We propose a message-passing model that utilizes multiple sources of regulatory information to predict regulatory relationships. A major advantage of our method is that the local connections are iteratively refined by exchanging information within each gene's functional neighborhood, thus increasing both numerical efficiency and biological accuracy. Using yeast as a model system, we demonstrate that this integrative method reconstructs the regulatory network more accurately than other widely accepted network reconstruction algorithms. In addition, our method is amenable to the future inclusion of data reflecting many different types of regulatory mechanisms, including protein complexes and protein-protein interactions, the binding of transcription factors to promoters, and epigenetic marks residing in the control regions of genes.

 

2010-2011 Working Groups


Tuesday, May 3, 2011
12:30-2:00, Building 2-Room 426


Sarah Fortune, MD
Assistant Professor of Immunology and Infectious Diseases
Department of Immunology and Infectious Diseases

Time for change: Using whole genome sequencing data to define the mutation rate of Mycobacterium tuberculosis during the course of infection

Mycobacterium tuberculosis (Mtb) poses a global health catastrophe that has been compounded by the emergence of highly drug resistant Mtb strains. In Mtb, all drug resistances are the result of chromosomal mutations and depend on the bacterium's capacity for mutation during the course of infection. We have used whole genome sequencing of bacterial isolates to derive estimates of the mutation rate of Mtb in the infected host in order to better understand the occurrence of drug resistance. Our data suggest that Mtb acquires mutations at roughly the same rate over time, irrespective of the organism's replicative state. However, the mutation spectrum in Mtb strains from animals with active and latent disease reflects different mutational pressures in different disease states. From these data, we are developing a mutational clock for Mtb in the infected host, which we are validating through WGS of human isolates from defined transmission chains.




Tuesday, April 5, 2011
12:30-2:00, Building 2-Room 426


Melissa Merritt, Ph.D.
Postdoctoral Research Fellow
Department of Epidemiology, Harvard School of Public Health
Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute

Cell-of-origin effect on tubo-ovarian tumor phenotype

The examination of gynecologic tissues with putative precursor lesions for ovarian tumors in clinical samples has identified a variety of suggested precursors for ovarian cancer. These observational studies have guided us to design experiments to address whether cells-of-origin influence the phenotype of tubo-ovarian cancer. To do this, we established paired normal human ovarian and fallopian tube epithelial cells in culture from two donors without cancer, and induced carcinogenic transformation of cells by the sequential introduction of hTERT, SV40 and H-Ras. The gene expression profiles of the normal and transformed cells were examined using HG-U133 Plus 2 microarrays (Affymetrix), which revealed a 'cell-of-origin' gene signature that distinguished ovarian (OV) vs fallopian tube (FT) epithelial cells from the same patient. Among the most significant genes, we selected a subset that were over-expressed in either OV or FT origin cultured cells and used these to identify two subpopulations (OV-like versus FT-like) among human ovarian tumors in two publicly available gene expression datasets. In this analysis, FT-like tumors were predominantly composed of serous high grade cancers and were associated with significantly shorter disease-free survival. In contrast, OV-like tumors were associated with a better prognosis. We have previously demonstrated that cell-of-origin plays a role in determining the phenotype of breast tumors. These results provide further support for the hypothesis that cell-of-origin strongly influences tumor phenotype. We suggest that the most aggressive subtype of tubo-ovarian cancers may originate in the fallopian tube. The results from these studies may allow an improved, cell-of-origin based molecular classification of ovarian cancers and a better understanding of the pathogenesis of Müllerian tumors.


Tuesday, March 1, 2011
12:30-2:00, Building 2-Room 426


Monica Ter-Minassian, Sc.D.
Postdoctoral Fellow at HSPH (Environmental Health Dept.) and DFCI (Medical Oncology Dept.)

Genetic variability in tobacco-specific nitrosamine NNK to NNAL metabolism

The nitrosamine NNK, 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanone, is known to be one of the most potent tobacco carcinogens, particularly for lung adenocarcinoma. Recently, NNK urinary metabolites, total NNAL, 4-(methylnitrosamino)-1-(3-pyridyl)-1-butanol and its glucuronides, have been shown to be good predictors of lung cancer risk, years prior to diagnosis. We sought to determine if several genetic polymorphisms significantly contributed to the wide range of inter-individual variability observed in total NNAL output. The study subjects were derived from the Harvard/Massachusetts General Hospital lung cancer case-control study. We analyzed 87 self-described smokers (35 lung cancer cases and 52 controls), with urine samples collected at time of diagnosis (1992-1996). We tested 77 tagging SNPs in 16 genes related to the metabolism of NNK to total NNAL, derived from prior GWAS genotyping on these subjects. Using a weighted case status least squares regression, we tested for the association of each SNP with square-root (sqrt) transformed total NNAL (pmol per mg creatinine), controlling for age, sex, sqrt pack-years and sqrt nicotine (ng per mg creatinine). Three HSD11B1 SNPs and AKR1C4 rs7083869 were significantly associated with decreasing total NNAL levels. HSD11B1 and AKR1C4 enzymes are carbonyl reductases directly involved in the single step reduction of NNK to NNAL. The HSD11B1 SNPs may be correlated with the functional variant rs13306401, and the AKR1C4 SNP is correlated with the enzyme-activity-reducing variant rs17134592, L311V.

 

Nicola Segata, Ph.D.
Postdoctoral Fellow
Department of Biostatistics, HSPH


Metagenomic Biomarker Discovery and the Human Microbiome

Metagenomics has provided a new avenue for biomarker discovery, since changes in the composition and functional activity of microbial communities can provide insight into the ecological differences among communities or provide diagnostic or prognostic power when applied to the human microbiome. We propose the LDA Effect Size (LEfSe) algorithm to discover and explain microbial and functional biomarkers in the human microbiota and other microbiomes. LEfSe determines the features (organisms, clades, OTUs, genes, or functions) most likely to explain differences between classes by coupling standard tests for statistical significance with additional tests encoding biological consistency and biological relevance. We demonstrate this method to be effective in mining human microbiomes for metagenomic biomarkers associated with mucosal tissues and with different levels of aerobic metabolism. Similarly, when applied to 16S rRNA gene data describing a murine ulcerative colitis gut community, LEfSe confirms the key role played by Bifidobacterium and suggests the involvement of additional clades including Metascardovia. Finally, we provide characterizations of microbial functional activity from metagenomic community sequencing, comparing environmental bacterial and viral microbiomes. A comparison of LEfSe with existing microbial biomarker discovery methods and with standard statistical approaches (including an evaluation incorporating synthetic data) highlights a lower false discovery rate, consistent ranking of biomarker relevance, and concise representations of taxonomic and functional shifts in microbial communities associated with environmental conditions or disease phenotypes.

 


 

Tuesday, February 1, 2011
12:30-2:00, Building 2-Room 426


Gabriel Altshuler
Postdoctoral Associate, HSPH

Pathway Fingerprinting: A Functional Framework for Multi-platform Integration

No objective, broadly applicable approach exists to compare molecular profiles, yet public repositories of gene expression arrays, RNASeq, GWAS, CAGE, epigenetic marks and ChIPseq data represent major resources for discovery. There is a pressing need for an objective, biologically interpretable, functional abstraction of the comprehensive sample space of cellular genome, transcriptome and regulome data to compare these experiments within a consistent global framework. Pathway profiling significantly outperforms gene-based comparisons for cross-platform analysis by exploiting the fact that molecular profiles reflect activity of genes acting in concert across pathways. We have developed the 'pathway fingerprint', a mapping of any molecular profile to a fixed set of known pathways. Each fingerprint generated is standardized relative to the full public data corpus, so it is directly comparable to any other, ensuring broad applicability. The method has been successfully piloted for the analysis of expression data. We performed a pathway fingerprint meta-analysis to establish and verify the stem cell pluripotency pathway signature. This was used to classify cell types and successfully identify additional pluripotent arrays from the GEO expression database. Moving away from small-scale comparisons and towards a literature-wide study of pluripotency has offered a means to resolve an ongoing debate over the pluripotent potential of testis-derived stem cells. We are now in the process of expanding the utility of the pathway fingerprint to GWAS, RNASeq, and epigenetic data to provide a unified method for functional integration within and between these experimental platforms.


Levi Waldron
Research Fellow, HSPH & DFCI

Whole-genome expression profiling of degraded tumor RNA from the NHS/HPFS colorectal cancer cohorts

Over 20 million formalin-fixed, paraffin-embedded (FFPE) tissue blocks per year are routinely stored for cancer patients in the United States, and in particular, are available for cancer patients from long-term epidemiological studies including the Nurses' Health Study and Health Professionals Follow-up Study (NHS/HPFS). Traditional microarray technology requires high-quality RNA from fresh-frozen tissues, and is not appropriate for gene expression profiling of these tissues. The recently introduced cDNA-mediated Annealing, Selection, extension, and Ligation (DASL) microarray assay by Illumina enables whole-genome expression profiling from degraded, clinical FFPE tissues. We present a case study involving more than 1,000 patients from the NHS/HPFS colorectal cancer cohort, demonstrating an end-to-end pipeline for quality control, normalization, and application of the data to tumor subtyping and differential gene expression. We conclude that this technology differs in important ways from the traditional microarray, discuss some of the challenges and opportunities associated with its adoption, and present initial results for the subtyping of colorectal cancer.

 


Tuesday, December 7, 2010
12:30-2:00, Building 2-Room 426


Elaine Hoffman, Ph.D.
Research Scientist
Department of Biostatistics
Harvard School of Public Health


Applications of Path Analysis & Structural Equation Models in Environmental Health Studies

Path analysis and structural equation models will be presented in the context of environmental health. The Bangladesh-Arsenic project, Italian-Manganese project, and some of my other projects will be used as examples. The statistical application of path analysis and structural equation models and some of the obstacles I have encountered also will be discussed. Path analysis is a statistical model that can account for complex relationships among variables and correlated variables. It is often used when there are suspected causal relationships. Structural equation models are a class of covariance structure models that simultaneously model multiple surrogates of both exposure and outcome. Both path and structural equation models are often shown as path diagrams. These models have been extensively used in the social sciences, and more recently are beginning to be used in environmental epidemiology. This presentation is a Work-in-Progress, with many of the statistical ideas discussed not having been published or completed yet. Software packages will be discussed.




Tuesday, November 2, 2010
12:30-2:00, Building 2-Room 426


Benjamin Haibe-Kains, Ph.D.
Research Fellow
Department of Biostatistics & Computational Biology
Dana-Farber Cancer Institute


Breast Cancer Molecular Subtypes: A Three-gene Model for a Translation Into Clinic

Background: Gene expression studies have well established that breast cancer (BC), in addition to being clinically diverse, is also a molecular heterogeneous disease. The early studies classified BC into at least three clinically relevant molecular subtypes: basal-like, HER2-enriched, and luminal tumors, with each subtype exhibiting different prognosis and response to therapies. Demonstration of the molecular heterogeneity within BC has changed the way clinicians perceive the disease and has a dramatic impact on the design of new clinical trials.

During the last decade, several methods have been proposed to classify breast cancers into their corresponding molecular subtypes using gene expression. Three versions of a "Single Sample Predictor" (SSP), based on hierarchical clustering and a nearest centroid classifier using different sets of "intrinsic" genes, have been proposed; however, this method of classification has been shown to be highly unreliable by many investigators. We have previously reported a "Subtype Clustering Model" (SCM), which uses a mixture of three Gaussians with ER, HER2 and proliferation-related gene modules, to estimate probabilities of belonging to each BC molecular subtype. This model has improved classification stability; however, such a method is not easily implemented in the clinic.

In this work we developed a novel SCM-based 3-gene classifier for molecular subtyping and we evaluated its concordance, robustness and prognostic value compared with five existing classifiers.

Materials and Methods: We refined the SCM to its simplest form, a classification model that uses only the 3 genes reported to be the main discriminators of the molecular subtypes: ER, HER2 and AURKA (proliferation). We evaluated the concordance and robustness of five previously described classifiers (three SSPs and two SCMs) and the new 3-gene SCM, using gene expression and clinical data from a large compendium of publicly available BC datasets comprising 5,113 primary breast tumors. Clinical relevance was determined from survival analysis of a subset consisting of 1,318 untreated node-negative patients.

Results: SCM-based classifiers, including our 3-gene model, were significantly more robust than all the SSPs (prediction strength > 0.8, p < 0.001); notably, the 3-gene SCM was the most robust classifier for molecular subtypes. Although all models were concordant (Cramer's V = 0.54-0.81, p < 0.001), with basal-like subtype being particularly well defined (median Cramer's V of 0.8), SCMs yielded stronger concordance than SSP models. Overall, SCMs were also more consistent with traditional clinical variables (ER, HER2 status by IHC/FISH and histological grade). All classifications yielded significant and independent prognostic value.

Conclusions: We found significant disparities in robustness of BC molecular subtype classification models. SCMs outperformed SSPs and consistently identified molecular subtypes in numerous datasets derived using various microarray technologies and conducted by different laboratories. Compared with existing models, we propose that a 3-gene SCM-based model is the most reliable and its simplicity could be a significant step towards translation of BC molecular subtyping into the clinic.

Keywords: molecular subtypes, gene expression, clustering, classification, robustness, concordance, prognosis
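
The concordance statistic quoted in the Results above, Cramer's V, can be illustrated with a short sketch. This is only an illustration of the standard formula, V = sqrt(chi2 / (n * (min(rows, cols) - 1))), applied to a cross-tabulation of the subtype calls made by two classifiers; the example tables are hypothetical and are not data from the study, and the function assumes every row and column of the table has at least one count.

```python
import math

def cramers_v(table):
    """Cramer's V for an r x c contingency table (list of rows of counts).

    Assumes every row total and column total is non-zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Pearson chi-squared statistic: sum of (observed - expected)^2 / expected.
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    k = min(len(row_totals), len(col_totals)) - 1
    return math.sqrt(chi2 / (n * k))

# Hypothetical cross-tabulations of two classifiers' subtype calls
# (rows: classifier A, columns: classifier B).
perfect_agreement = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
no_association = [[5, 5], [5, 5]]
```

Values close to 1 indicate that the two classifiers assign tumors to subtypes almost identically; values close to 0 indicate that their calls are essentially unrelated.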


 

Tuesday, October 5, 2010
12:30-2:00, Building 2-Room 426


Pinaki Sarder, Ph.D.
Research Fellow
Department of Biostatistics
Harvard School of Public Health

Functional understanding of microbial communities using experimental data integration

To understand the functional and metabolic activities of microbes and microbial communities, it is critical to link genes and proteins to their biological roles. This encompasses both their biochemical activities and the processes and pathways in which they are used by the cell. This problem is typically approached by transferring knowledge to newly sequenced genomes by relying on sequence similarity. This can be a difficult process involving sparse knowledge and the propagation of error, and even the best-studied organisms' genomes are only partially characterized. To mitigate these issues, we have developed a data integration method, TaFTan, that leverages all experimental results available from multiple model systems to identify potential functional roles for genes in a new organism. The performance of TaFTan's genome-wide functional network prediction was evaluated using ~300 experimental datasets from 20 model organisms. This evaluation study demonstrated that TaFTan is able to significantly improve individual organisms' inferred functional networks by transferring knowledge from other experimentally characterized systems.



Tuesday, September 21, 2010
12:30-2:00, Building 2-Room 426 (Biostats Conference Room)

X. Shirley Liu, Ph.D.
Associate Professor of Biostatistics
Departments of Biostatistics and Computational Biology
Harvard School of Public Health & Dana-Farber Cancer Institute

Computational Genomics of Gene Regulation

High throughput genomics technologies brought a paradigm shift to gene regulation studies, but they also created challenges for data analysis. In this talk, I will highlight two studies conducted in my lab to show how computational and statistical algorithms can help remove the noise in the data, provide informative results, and help design efficient experiments. One study is a model-based analysis of tiling arrays for ChIP-chip peak calling, and the other uses the dynamics of H3K4me2 nucleosomes to infer the in vivo transcription factors and their binding sites driving a biological process.

 

2009-2010 Working Groups



Tuesday, April 27, 2010
12:30-2:00 PM
KRESGE G2

Robert Wright, M.D., M.P.H.
Associate Professor
Department of Environmental Health, HSPH
Department of Pediatrics, HMS

A Framework for Measuring Gene-Environment Interactions in Children

This talk will discuss developmental issues unique to children whose consideration can improve estimates of gene-environment interaction, including child-specific pitfalls of case-control and family-based association designs.

 



Tuesday, March 23, 2010
12:30-2:00 PM
Kresge 201

Marianne Wessling-Resnick, Ph.D.
Professor of Nutritional Biochemistry
Dept of Genetics and Complex Diseases
Director of the PhD Program in Biological Sciences in Public Health
Harvard School of Public Health

Chemical Genetics of Iron Transport

Chemical genetics is an emerging field that takes advantage of combinatorial chemical and small molecule libraries to dissect complex biological processes. Small molecules can act very fast, can be very specific, and can help to distinguish the temporal order of molecular steps and the hierarchical regulation of biological processes. Because small molecules can alter the function of a specific gene product, they can be used in a manner analogous to the use of inducible dominant or homozygous recessive genetic mutations. A large body of biochemical literature is based on the past use of small molecule antagonists that were employed in "reverse chemical genetics" approaches to conditionally eliminate protein function, and on that basis to subsequently identify the target, its mechanism of action, and its regulation. Thus, ouabain helped to define the catalytic cycle of the NaK-ATPase, cytochalasin B was instrumental in defining the molecular basis for insulin's action to stimulate glucose uptake, and analogs of amiloride were used to purify and define the epithelial Na channel. There is a need to develop "forward chemical genetics" in order to discover small molecules that partner with key elements in a pathway of interest. Our goals are to discover small molecule inhibitors of iron transport using chemical genetics and to use these reagents to advance our understanding of the factors, mechanisms, and regulation of different pathways of iron metabolism.


Patrick Loerch
Research Fellow, Computational Biology
Merck

Using Networks to Integrate Diverse -omics Datasets and Identify Disease Pathways

The dramatic increase in the application of omics-based platforms (microarrays, GWAS, proteomics, deep sequencing, metabolomics, etc.) in biological research has resulted in the generation of vast public and private data repositories spanning a wide array of diseases, tissues and species. The challenge facing researchers today, commonly referred to as integrative genomics, is figuring out how to integrate, analyze and interpret all of this information within the context of a well-defined biological question. One biological question of particular importance is the identification of genes/proteins that contribute to, and/or are altered as a result of, the onset of a specific disease. As opposed to the gene-centric approach to integrative genomics, which looks for genes that are associated with a disease across multiple omics platforms, we have developed a pathway-centric approach. We hypothesize that the onset of disease involves the altered regulation of specific pathways, which can be triggered by any number of genes or proteins, so long as the end result is the same. Working under this hypothesis, we have developed a network-based approach to integrating various omics datasets with the aim of identifying disease pathways. This approach also allows us to take into account a number of biological realities, such as the regulation of pathway members at various states (from transcription to translation) and the fact that the regulation of some pathway members will simply not be observable through omics technologies. Here we will present the development of this methodology within the context of identifying disease pathways, discuss a specific application/validation of the method, and describe ongoing efforts to further refine this approach.




Tuesday, February 23, 2010

12:30-2:00 PM
Kresge 201

Curtis Huttenhower, Ph.D.
Assistant Professor of Computational Biology and Bioinformatics
Department of Biostatistics
Harvard School of Public Health

Scalable Data Mining for Functional Genomics and Metagenomics

The average human body contains over ten times as many microbial cells as "human" cells. These microbial communities are usually beneficial, but their dysfunction has been linked to conditions ranging from obesity to antibiotic resistant infections. The recent dramatic reduction in the cost of DNA sequencing has opened up several exciting new ways in which we can explore how the human microflora vary across populations and how they can be manipulated to improve human health.

Biological network integration and mining algorithms provide a means of assembling the entire body of cultured microbial genomic data, understanding it from a systems level, and applying it to the study of uncharacterized species and communities. We compare unsupervised and supervised Bayesian approaches to biological network integration; this process provides maps of functional activity and genomewide interactomes in over 100 areas of cellular biology, using information from ~5,000 genome-scale experiments pertaining to 13 microbial species. In combination with graph alignment, these network manipulation tools provide a means for analyzing the functional activity unique to particular pathogens, transferring putative functional annotations to uncharacterized organisms, and potentially inferring interactomes using weighted network integration for metagenomic communities.

 

Ed Silverman, M.D., Ph.D.
Associate Professor of Medicine
Harvard Medical School
Associate Physician, Brigham and Women's Hospital

Genetic Epidemiology of Chronic Obstructive Pulmonary Disease

Genetic factors are likely important determinants of chronic obstructive pulmonary disease (COPD) susceptibility. Genome-wide association analysis has recently been performed in COPD, and several regions of highly significant association on chromosomes 15 and 4 have been identified. Because COPD is a heterogeneous disease, genetic studies have focused on the identification of distinct subgroups of COPD subjects (COPD subtypes) as well as disease-related conditions (COPD-related phenotypes). Genetic association studies combined with chest CT scan analysis have the potential to lead to substantial new insights into COPD pathogenesis, which could provide important pathways to develop new treatments for COPD.

 


Tuesday, January 26, 2010
12:30-2:00 PM
Kresge 201

John Quackenbush
Professor of Computational Biology and Bioinformatics
HSPH & DFCI

Network and State Space Models: Science and Science Fiction Approaches to Cell Fate Predictions

Two trends are driving innovation and discovery in biological sciences: technologies that allow holistic surveys of genes, proteins, and metabolites and a realization that biological processes are driven by complex networks of interacting biological molecules. However, there is a gap between the gene lists emerging from genome sequencing projects and the network diagrams that are essential if we are to understand the link between genotype and phenotype. 'Omic technologies such as DNA microarrays were once heralded as providing a window into those networks, but so far their success has been limited, in large part because the high-dimensional data they produce cannot be fully constrained by the limited number of measurements and in part because the data themselves represent only a small part of the complete story. To circumvent these limitations, we have developed methods that combine 'omic data with other sources of information in an effort to leverage, more completely, the compendium of information that we have been able to amass. Here we will present a number of approaches we have developed, including an integrated database that collects clinical, research, and public domain data and synthesizes it to drive discovery and an application of seeded Bayesian Network analysis applied to gene expression data that deduces predictive models of network response. Looking forward, we will examine more abstract state-space models that may have potential to lead us to a more general predictive, theoretical biology.

Miguel Camargo
Merck Research Laboratories

"Pathway based analysis of whole genome siRNA screens and de novo pathway identification"

A prevailing model of Alzheimer's Disease etiology is the progressive aggregation of toxic Ab (Abeta) peptides in the brain. Ab is produced as a result of proteolytic processing of the amyloid precursor protein (APP). Cleavage of APP by beta- and gamma-secretases results in the production of Ab, and hence several drug discovery efforts are aimed at finding either beta- or gamma-secretase inhibitors. However, development of small molecules targeting either of these enzymes has proven to be challenging. To overcome the limitations of developing beta- or gamma-secretase inhibitors, we performed large-scale siRNA screens to identify novel regulators of APP processing that could represent more tractable drug targets. The screen measures the production of four APP proteolytic products: the non-amyloidogenic peptide (sAPPa) and the amyloidogenic pathway peptides (Ab40, Ab42, and sAPPb). We introduce a novel analysis method that scores the overall effect of individual pathways on the processing of the APP protein. This method takes into account all genes in the pathway, thus allowing for small effects to be considered, and introduces the concept of scoring 'pathways' as opposed to individual genes as a way of guarding against false positive hits. Using this method, we identified novel and distinct pathways that regulate processing of APP into either amyloidogenic or non-amyloidogenic peptides, respectively. We will also highlight how to leverage biological network data in combination with siRNA screens to identify novel pathways.
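
The idea of scoring a whole pathway rather than individual genes can be sketched as follows. The abstract does not specify the scoring formula, so this sketch substitutes a generic Stouffer-style combination of per-gene z-scores as a stand-in; it is not the method presented in the talk. The gene names, z-scores, and the `pathway_score` helper are all hypothetical.

```python
import math

def pathway_score(gene_z, pathway_genes):
    """Combine per-gene screen z-scores into one pathway-level score.

    Stouffer-style combination: sum of z-scores divided by sqrt(number of
    genes). This is an assumed stand-in, not the scoring used in the talk.
    Genes missing from the screen are skipped; returns None if none remain.
    """
    zs = [gene_z[g] for g in pathway_genes if g in gene_z]
    if not zs:
        return None
    return sum(zs) / math.sqrt(len(zs))

# Hypothetical per-gene z-scores from a screen: each gene has a modest
# effect that would not stand out on its own.
gene_z = {"GENE_A": 1.0, "GENE_B": 1.0, "GENE_C": 1.0, "GENE_D": 1.0}

score = pathway_score(gene_z, ["GENE_A", "GENE_B", "GENE_C", "GENE_D"])
```

In this toy case, four genes that each have a z-score of 1.0 combine to a pathway score of 2.0, which illustrates how consistent small effects across a pathway can add up while a single outlying gene contributes comparatively less.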


Tuesday, December 15, 2009
12:30-2:00 PM
Biostatistics Conference Room (Building 2, Room 426)

Oliver Hofmann, Ph.D.
Research Associate in Biostatistics

Combining curated and in silico interaction data in network analysis

Integrating biological data obtained from multiple high throughput platforms is an area of active research. While it is possible to normalize for technical differences, laboratory effects and other artifacts, the problem of merging data from different biological samples is still mostly unsolved. Standard methods include rank-based analysis of biological features (mRNA abundance, peptide counts etc.) rather than absolute measurements, as well as higher levels of abstraction such as Gene Set Enrichment Analysis or the identification of Metagenes. By comparing biological systems at the network level, an additional layer of data abstraction is added, allowing for the identification of connected network areas contributing to a phenotype of interest in heterogeneous samples or between studies. Three different examples highlight current limitations of data availability and quality, identifying possible future methods for even higher levels of data abstraction to analyze complex systems.
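
The rank-based analysis mentioned above can be illustrated with a minimal sketch: replacing absolute measurements by within-sample ranks makes samples measured on different scales directly comparable. The measurement values below are hypothetical, and the tie-handling rule (average rank) is one common convention assumed here, not something stated in the abstract.

```python
def ranks(values):
    """Return 1-based ranks of the values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of values tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = average_rank
        pos = end + 1
    return result

# Hypothetical measurements of the same three features on two platforms
# with very different absolute scales.
platform_a = [10.0, 200.0, 30.0]
platform_b = [0.1, 2.0, 0.3]
```

Here both platforms yield the same rank vector even though their raw values differ by orders of magnitude, which is the property that makes rank-based comparison attractive across platforms.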

Stalo Karageorgi, M.S.
Doctoral Student in Environmental Health

Polymorphisms in HSD17b2 and HSD17b4 and endometrial cancer risk

Hydroxysteroid dehydrogenase 17b (HSD17b) genes encode enzymes that control the last step in estrogen biosynthesis. The isoenzymes HSD17b2 and HSD17b4 in the uterus preferentially catalyze the conversion of estradiol, the most potent and active form of estrogen, to estrone, the inactive form of estrogen. Endometrial carcinoma is a disease strongly linked to the imbalance between the hormones estrogen and progesterone. We hypothesized that variation in single nucleotide polymorphisms (SNPs) in the genes HSD17b2 and HSD17b4 may alter enzyme activity, estradiol levels and risk of disease. Pairwise tagging SNPs were selected from the HapMap CEU database to capture all known common (MAF >0.05) genetic variation in the gene region with an r² of at least 0.8. SNPs were genotyped in participants in the nested case-control studies in the Nurses' Health Study (NHS) (cases=544, controls=1296) and the Women's Health Study (WHS) (cases=130, controls=389), who provided a blood or cheek cell sample. The association between SNPs and endometrial cancer was examined using conditional logistic regression to estimate odds ratios and 95% confidence intervals adjusted for known risk factors. We additionally investigated whether SNPs are predictive of plasma estradiol and estrone levels in the NHS using linear regression. This is the first study to report on genetic variation in HSD17b2 and HSD17b4 in relation to endometrial cancer.


Tuesday, November 17, 2009
12:30-2:00 PM
Biostatistics Conference Room (Building 2, Room 426)

Zhaoxi (Michael) Wang, M.D., Ph.D.
Research Scientist, Environmental & Occupational Medicine and Epidemiology Program

"Mitochondrial Variations in NSCLC by Microarray-based Resequencing"

Mutations in the human mitochondrial genome (mtDNA) have long been suspected to play an important role in the development of cancer. Although most cancer cells harbor mtDNA mutations, whether such mutations are associated with the clinical prognosis of cancer remains unclear. In this study, we resequenced the entire mitochondrial genomes of tumor tissue from a population of 249 Korean non-small cell lung cancer (NSCLC) patients using the Affymetrix GeneChip Human Mitochondrial Resequencing Array 2.0 (Santa Clara, CA). In early stage (stage I/II) NSCLC, patients with haplogroup D4 had the worst clinical prognosis. Interestingly, haplogroup D4 was previously reported as a marker for extreme longevity in the Japanese population.


Alkes Price
Assistant Professor of Statistical Genetics, HSPH

"Effects of Cis and Trans Family Heritability on Single-tissue and Cross-tissue Gene Expression Regulation"

Family heritability is a useful approach for understanding the genetic basis of gene expression. Heritability analyses can evaluate the contribution of both cis and trans regulation by considering genetic relatedness either genome-wide (trans), or at the genomic location proximal to the expressed gene (cis). We used gene expression data from blood and adipose tissue cohorts to estimate the contribution of cis and trans regulation to heritable variation in gene expression. We estimate that cis regulation contributes 45±7% of heritability in blood expression and 30±3% of heritability in adipose tissue expression, with the difference entirely attributable to greater trans effects in adipose tissue. We also conducted a cross-tissue analysis to investigate regulation that is shared across tissues. Strikingly, we observed that cross-tissue regulation is dominated by cis effects. These analyses point to a greater contribution for cis regulation than previous admixture-based analyses. This divergence would be consistent with a substantial role for epigenetic regulation, whose effects are included in heritability analyses but excluded in admixture analyses. Our results have implications for understanding the causes of "missing heritability" in genetic association studies.

 


Tuesday, September 22, 2009
12:30-2:00 PM
Building 2, Room 426 - Biostatistics Conference Room

Meet & Greet, Monica Ter-Minassian practice talk on Genetic Risk Factors of Neuroendocrine Tumor

 

Please feel free to contact us with any comments or questions at: sandelma@hsph.harvard.edu