PQG Seminar Series
2009-2010
Fall seminars will be held from 12:00-1:30 PM in FXB G13. Spring seminars will be held from 12:30-2:00 PM in Kresge G2.
The Program in Quantitative Genomics at the Harvard School of Public Health will again host a regular monthly seminar series starting in the fall. The seminar series alternates on a bi-weekly basis with the Bioinformatics Forum.
The mission of the HSPH PQG is to improve health through the interdisciplinary study of genetics, behavior, environment and medicine, by developing and applying quantitative methods, especially for high-dimensional data, and by training quantitative genomic scientists.
Upcoming Seminar
Tuesday, December 1, 2009
12:00-1:30, FXB G13
A pizza lunch will be provided.
David Reich, Ph.D.
Associate Professor
Harvard Medical School, Department of Genetics
Associate Member, Broad Institute
"Reconstructing Indian Population History and Implications for South Asian Gene Discovery"
India has been underrepresented in genome-wide surveys of human variation. Here I describe an analysis of patterns of variation in 25 diverse Indian groups to provide strong evidence for two ancient populations, genetically divergent, that are ancestral to most groups today. One, the "Ancestral North Indians" (ANI), is genetically close to Middle Easterners, Central Asians, and Europeans, while the other, the "Ancestral South Indians" (ASI), is not close to any group outside the subcontinent. By introducing methods that can estimate ancestry without accurate ancestral populations, we show that ANI ancestry ranges from 39-71%, and is higher in traditionally upper-caste groups and Indo-European speakers. Groups with only ASI ancestry may no longer exist in mainland India. However, the Andamanese are an ASI-related group without ANI-related ancestry, showing that the peopling of the islands must have occurred before ANI-ASI gene flow on the mainland. Allele frequency differences between groups in India are larger than in Europe, which we show reflects strong founder effects whose genetic signatures have been maintained for thousands of years due to endogamy. There are two key medical implications. First, our observations predict that there will be an excess of recessive diseases in India, different in each group, whose risk variants should be easy to identify using standard genetic methods. Second, the genetic risk factors that are only present in the ASI will be very difficult to discover without building specific genetic variation resources for South Asians.
Additional Seminar Dates
*** New room and time in the spring ***
February 2, 2010
12:30-2:00 PM, Kresge G2
Shamil Sunyaev, Brigham & Women's Hospital, Harvard Medical School
March 2, 2010
12:30-2:00 PM, Kresge G2
Hakon Hakonarson, Children's Hospital of Philadelphia
April 6, 2010
12:30-2:00 PM, Kresge G2
Nick Patterson - The Broad Institute
May 4, 2010
12:30-2:00 PM, Kresge G2
Tianxi Cai - Harvard School of Public Health
Past Seminars
Tuesday, November 3, 2009
12:00-1:30, FXB G13
Giovanni Parmigiani, Ph.D.
Chair, Department of Biostatistics & Computational Biology, DFCI
Professor, Department of Biostatistics, HSPH
"Cross-study Differential Gene Expression"
In this lecture I will present statistical issues associated with combining microarray data across studies. I will focus on the role of hierarchical Bayesian models in constructing useful rules to shrink across both genes and studies, and to classify genes based on the patterns of concordance across studies. I will describe in detail a model we call XDE, and evaluate its performance in a comprehensive fashion, using both artificial data and a "split sample" validation approach that provides an agnostic assessment of the model's behavior not only under the null hypothesis but also under realistic alternatives. Compared to a more direct combination of t- or SAM-statistics, the 1 - AUC values for the Bayesian model are roughly half of the corresponding values for the t- and SAM-statistics. In small studies, XDE generally outperforms other methods when evaluated by AUC, FDR, and MDR across a range of simulation parameters, and this difference diminishes for larger sample sizes in the individual studies. Finally, I will illustrate our model using four breast cancer studies employing different technologies (cDNA and Affymetrix) to estimate differential expression in estrogen receptor positive tumors versus negative ones. Software and data for reproducing our analysis are publicly available.
A technical report can be obtained from: http://www.bepress.com/jhubiostat/paper158/
Tuesday, October 6, 2009
12:00-1:30, FXB G13
Peter Park, Ph.D.
Assistant Professor of Pediatrics
HMS Center for Biomedical Informatics
Children's Hospital Informatics Program
"ChIP-sequencing: Data Analysis and Applications"
ChIP-seq combines chromatin immunoprecipitation (ChIP) with next-generation sequencing to identify protein-DNA interactions on a genome-wide scale. After a brief introduction to next-generation sequencing, a number of practical issues in analysis of ChIP-seq data will be discussed, including experimental design, detection of binding sites, and determination of whether a sufficient depth of sequencing has been achieved. Application of ChIP-seq to the study of X-chromosome dosage compensation in Drosophila and nucleosome positioning will be described. If time allows, updates from the Cancer Genome Atlas and the model organism ENCODE projects will be given.
Tuesday, September 29, 2009
12:00-1:30, FXB G13
Mitchell Gail, M.D., Ph.D.
Senior Investigator, Biostatistics Branch
Division of Cancer Epidemiology and Genetics
National Cancer Institute
"The value of adding single nucleotide polymorphism data to a model that predicts breast cancer risk"
Seven single nucleotide polymorphisms (SNPs) have recently been confirmed to be associated with breast cancer. I assessed the value of adding these SNPs to the Breast Cancer Risk Assessment Tool (BCRAT), which is based on ages at menarche and at first live birth, family history of breast cancer, and history of breast biopsy examinations. The model with these SNPs (BCRATplus7) had an area under the receiver operating characteristic curve (AUC) of 0.632, compared to 0.607 for BCRAT. This improvement is less than that from adding mammographic density to BCRAT. I also assessed how much BCRATplus7 reduced expected losses in deciding whether a woman should take tamoxifen to prevent breast cancer and in deciding whether a woman should have a mammogram. In addition, I examined whether BCRATplus7 was more effective than BCRAT in allocating a scarce public health resource, such as access to mammography, based on ranking women on their breast cancer risk and allocating the resource to those at highest risk. In none of these applications did BCRATplus7 perform substantially better than BCRAT. A cross-classification of risk by the two models indicated that some women would change risk categories, depending on the risk threshold, if BCRATplus7 were used instead of BCRAT, but it is not known if BCRATplus7 is well calibrated. These results were hardly changed if three additional very recently identified SNPs were added. I conclude that the available SNPs do not improve the performance of models to estimate breast cancer risk enough to warrant their use outside the research setting.
