Professor of Biostatistics
Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute
My research focuses on computational models of transcription and epigenetic regulation, built by integrating genome-wide ChIP-chip / ChIP-seq data, nucleosome occupancy and histone modification profiles, gene expression microarray / RNA-sequencing data, and genomic sequence and conservation. To this end, we focus on the following areas of study:
First, we developed a number of widely used and cited algorithms for transcription factor motif discovery from promoters of gene expression clusters (BioProspector, Liu et al, PSB 2001, cited by 419), ChIP-chip data (MDscan, Liu et al, Nature Biotech 2002), gene expression perturbation experiments (Motif Regressor, Conlon et al, PNAS 2003), and comparative genomics (CompareProspector, Liu et al, Genome Res 2004). These tools are also available through a web server (Liu et al, NAR 2004). We are currently developing a tool to find sequence motifs from ChIP-seq and nucleosome sequencing data.
Second, we design computational algorithms for ChIP-chip and ChIP-seq data analysis. We published the third-ever ChIP-chip paper, which was the first to integrate ChIP-chip with transcription factor motif discovery (Lieb et al, Nat Genet 2001). We also published the second-ever genome-wide ChIP-chip study in human, which was the first to systematically study genome-wide enhancer binding function (Carroll et al, Nat Genet 2006). We developed widely used ChIP-chip analysis algorithms (MAT, Johnson et al, PNAS 2006; MA2C, Song et al, Genome Biol 2007), a ChIP-seq analysis algorithm (MACS, Zhang et al, Genome Biol 2008), and a downstream analysis pipeline (CEAS, Ji et al, NAR 2006). I also coordinated a study in the NIH ENCODE consortium (involving all the ChIP-chip pioneers) to systematically evaluate the effect of DNA amplification methods, tiling array platforms and analysis algorithms on ChIP-chip results (Johnson et al, Genome Res 2008). I am currently organizing a similar study in the NIH modENCODE/ENCODE consortia comparing different algorithms and high-throughput sequencing platforms.
My third area of interest is epigenetic regulation. We published the first high-throughput nucleosome positioning study in human (Ozsolak et al, Nat Biotech 2007). We also designed algorithms to identify positioned nucleosomes from nucleosome-resolution histone modification ChIP-seq data (Zhang et al, BMC Genomics 2008) and used nucleosome positioning and histone modifications to predict miRNA promoters (Ozsolak et al, Genes Dev, in press). I am currently working on finding differential nucleosome positions between conditions, and on comparing intrinsic nucleosome positioning with in vivo positioning. I am also a co-PI for the worm chromatin group in the NIH modENCODE consortium.
Finally, we integrate these tools and ChIP-chip/ChIP-seq data to study transcriptional and epigenetic regulatory networks in cancer, aging, diabetes and stem cell differentiation. Through genome-wide ChIP-chip analysis, we discovered important transcriptional and epigenetic regulation mechanisms of estrogen receptor (ER) in breast cancer (Carroll et al, Cell 2005; Carroll et al, Nat Genet 2006), androgen receptor (AR) in prostate cancer (Wang et al, Mol Cell 2006), ER and AR’s collaborating transcription factor and chromatin remodeler FoxA1 in breast and prostate cancers (Lupien et al, Cell 2008), and peroxisome proliferator-activated receptor gamma (PPARγ) in adipose tissues (Lefterova et al, Genes Dev 2008). We are currently directing bench and computational experiments to identify the epigenetic signature and infer the transcription regulatory network of hormone-independent breast and prostate cancers.
BA Biochemistry, BA Computer Science, 1997, Smith College
PhD Biomedical Informatics, PhD minor Computer Science, 2002, Stanford University