Department of Biostatistics
Big Data Seminar

2014 - 2015

Organizers: Marieke Kuijjer and Godwin Yung

Schedule: Mondays, 12:30-2:00 p.m.
HSPH2, Room 426 (unless otherwise notified)

Seminar Description
This working seminar focuses on statistical and computational methods for analyzing big data. Big data arise from a wide range of studies in health science research, such as genetics and genomics, environmental health research, comparative effectiveness research, electronic medical records, neuroscience, and social networks. We discuss recent developments in statistical and computational methodology for analyzing big data and health science applications where big data arise. The goal of this seminar is to exchange ideas and stimulate more quantitative research in this challenging and important area.

October 6

John Quackenbush, Ph.D.
Professor of Computational Biology and Bioinformatics, Harvard School of Public Health / Dana-Farber Cancer Institute

"Taming the Big Data Dragon"
ABSTRACT: Nearly every major scientific revolution in history has been driven by one thing: data. Today, the availability of Big Data from a wide variety of sources is transforming health and biomedical research into an information science, where discovery is driven by our ability to effectively collect, manage, analyze, and interpret data. New technologies are providing abundance levels of thousands of proteins, population levels of thousands of microbial species, expression measures for tens of thousands of genes, information on patterns of genetic variation at millions of locations across the genome, and quantitative imaging data, all on the same biological sample. These omic data can be linked to vast quantities of clinical metadata, allowing us to search for complex patterns that correlate with meaningful health and medical endpoints. Environmental sampling and satellite data can be cross-referenced with health claims information and Internet searches to provide insights into the impact of atmospheric pollution on human health. Anonymized data from cell-phone records and text messages can be tied to health outcomes data, helping us explore disease transmission networks. Realizing the full potential of Big Data will require that we develop new analytical methods to address a number of fundamental issues and that we develop new ways of integrating, comparing, and synthesizing information to leverage the volume, variety, and velocity of Big Data. Using experiences derived from our work, I will present some examples that highlight the challenges and opportunities that present themselves in today's data-rich environment.

October 20

Ray Liu, Ph.D.
Head of Analytical Innovation and Consultation group, Takeda Pharmaceutical Company, Japan

"Quantitative Considerations on Big Data for Pharmaceutical R&D"
ABSTRACT: With advances in technology platforms, researchers in pharmaceutical R&D have in recent years encountered data that are large in both size and dimension. In addition, data are increasingly generated from novel sources. These bigger and more diverse data pose special challenges and opportunities for pharmaceutical statisticians. For example, how can we extract information from unstructured text data appearing in medical records? How can we integrate information from various data sources to generate novel knowledge and increase the chance of regulatory approval? How can we identify associations between clinical outcomes and genomic markers in high-dimensional data with small sample sizes? In this talk we will share hands-on experience gained from handling data in such situations, focusing on applying quantitative thinking to impact pharmaceutical R&D.

November 3

Jennifer Listgarten, Ph.D.
Researcher, Microsoft Research New England

"Talk Title TBD"
ABSTRACT: None Given

November 17

To Be Announced

"Talk Title TBD"
ABSTRACT: None Given

December 1

Denis Agniel, Ph.D.
Research Associate in Biomedical Informatics, Center for Biomedical Informatics at Countway, Harvard Medical School

"Talk Title TBD"
ABSTRACT: None Given

December 15

Nicholas Horton, Sc.D.
Professor, Department of Mathematics and Statistics, Amherst College

"Building Precursors to the Analysis of Big Data: Guidelines for Undergraduate Programs in Statistics"
ABSTRACT: None Given

February 9 (Kresge G2)

Andreas Matern
Vice President, Disruptive Innovation at Thomson Reuters

"Talk Title TBD"
ABSTRACT: None Given

February 23 (Kresge G2)

JP Onnela, Ph.D.
Assistant Professor, Department of Biostatistics, Harvard School of Public Health

"Talk Title TBD"
ABSTRACT: None Given

March 9

Peter J. Park, Ph.D.
Associate Professor, Harvard Medical School Center for Biomedical Informatics

"Structural Alterations in Cancer Genomes (how we analyzed 100TB of data and lived to tell about it)"
ABSTRACT: It is now possible to generate whole-genome sequencing data for a patient at an affordable cost, and the amount of publicly available data continues to grow rapidly. I will give an overview of the computational methods we use to identify various structural alterations in cancer genomes. I will also describe the challenges associated with large-scale data management and analysis: at ~150GB raw data per patient, some of our analyses involve >100TB of raw data. I will also share my thoughts on the role of statisticians in these genomics projects.

March 30 (Kresge G2)

Paul McDonagh, Ph.D.
Director, Computational Biology, Biogen Idec

"Talk Title TBD"
ABSTRACT: None Given

April 13 (Kresge G2)

Michelle Girvan, Ph.D.
Associate Professor, Department of Physics and the Institute for Physical Science and Technology (IPST), University of Maryland

"Talk Title TBD"
ABSTRACT: None Given

April 27 (Kresge G2)

Miguel Hernán, M.D., MPH, Dr.P.H.
Professor of Epidemiology, Departments of Epidemiology and Biostatistics, Harvard School of Public Health

"Talk Title TBD"
ABSTRACT: None Given

May 11 (Kresge G2)

Jeffrey Leek, Ph.D.
Associate Professor, Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health

"Talk Title TBD"
ABSTRACT: None Given

Maintained by the Biostatistics Webmaster
Last Update: October 24, 2014