Department of Biostatistics
Big Data Seminar

2015 - 2016

Organizers: Sheila Gaynor

Schedule: Mondays, 12:30-1:30 p.m.
FXB G13 (unless otherwise notified)

Seminar Description
This working seminar focuses on statistical and computational methods for analyzing big data. Big data arise from a wide range of studies in health science research, such as genetics and genomics, environmental health research, comparative effectiveness research, electronic medical records, neuroscience, and social networks. We discuss recent developments in statistical and computational methodology for analyzing big data and health science applications where big data arise. The goal of this seminar is to exchange ideas and stimulate more quantitative research in this challenging and important area.

October 5 (Kresge G1 - Snyder Auditorium)

Hadley Wickham, Ph.D.
Assistant Professor of Statistics, Rice University, and Chief Scientist, RStudio

"Pure, Predictable, Pipeable: Creating Fluent Interfaces with R"
ABSTRACT: A fluent interface lets you easily express yourself in code. Over time, a fluent interface retreats to your subconscious. You don't need to bring it to mind; the code just flows out of your fingers. I strive for this fluency in all the packages I write, and while I don't always succeed, I think I've learned some valuable lessons along the way.

In this talk, I'll discuss three guidelines that make it easier to develop fluent interfaces: make functions pure, predictable, and pipeable.
This talk will help you make best use of my recent packages, and teach you how to apply the same principles to make your own code easier to use.

October 19 (SPH2, Room 426)

Weihua An, Ph.D.
Assistant Professor, Departments of Sociology and Statistics, Indiana University - Bloomington

"Estimating ERGMs on Large Networks"
ABSTRACT: The exponential random graph model (ERGM) has become a standard statistical tool for modeling social networks. In particular, the ERGM provides great flexibility to account for covariate effects on tie formation as well as endogenous network formation processes (e.g., reciprocity and transitivity). However, due to its reliance on Markov chain Monte Carlo, it is difficult to estimate ERGMs on large networks (e.g., networks composed of hundreds of nodes and edges). This paper describes several methods to address the computational challenges in estimating ERGMs on large networks and compares their advantages and disadvantages. The paper also uses a school friendship network to demonstrate selected methods.

November 2

Michael Kosorok, Ph.D.
W. R. Kenan, Jr. Distinguished Professor and Chair, Department of Biostatistics, and Professor, Department of Statistics and Operations Research, University of North Carolina at Chapel Hill

"Precision Medicine and Machine Learning"
ABSTRACT: There has recently been an explosion of interest and activity in personalized medicine. However, the goal of personalized medicine, wherein treatments are targeted to take into account patient heterogeneity, has been a focus of medicine for centuries. Precision medicine, on the other hand, is a much more recent refinement which seeks to develop personalized medicine that is empirically based, scientifically rigorous, and reproducible. In this presentation, we describe several new machine learning developments which advance this quest through discovering individualized treatment rules based on patient-level features. Regression and classification are useful statistical tools for estimating such rules based on either observational data or data from a randomized trial, and machine learning approaches can help with this because of their ability to artfully handle high-dimensional feature spaces with potentially complex interactions. For the multiple decision setting, reinforcement learning, which is similar to but different from regression, is necessary to properly account for delayed effects. There are several other intriguing nonstandard machine learning tools which can also greatly facilitate discovery of treatment rules. One of these is outcome weighted learning, or O-learning, which directly estimates the decision rules without requiring regression modeling and is thus robust to model misspecification. Several clinical examples illustrating these approaches will also be given.

November 16

Caroline Buckee, Ph.D.
Associate Director, Center for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health

"Using Mobile Phone Data to Estimate Human Migration and the Spread of Infectious Disease"
ABSTRACT: Until recently, estimating the dynamics of human populations was almost impossible at the scale of populations in low-income regions. However, the rapid adoption of mobile technologies, particularly among vulnerable and hard-to-reach populations, has led to enormous amounts of information on the location and mobility patterns of millions of individuals in the most remote parts of the world. We use this data to parameterize human mobility, and combine our estimates with infectious disease epidemiological models to predict how people spread diseases. I will discuss the application of this new approach to infectious disease epidemiology, as well as some of the limitations of the methods and future directions of the field.

November 30

Franziska Michor, Ph.D.
Professor of Computational Biology, Department of Biostatistics, Harvard T.H. Chan School of Public Health / Dana-Farber Cancer Institute

"Talk Title TBA"
ABSTRACT: None Given

December 14

Anthony Philippakis, M.D., Ph.D.
Cardiologist, Brigham and Women's Hospital
Research Scientist, Broad Institute of MIT and Harvard
Venture Partner, Google Ventures

"Talk Title TBA"
ABSTRACT: None Given

February 8

To Be Announced

"Talk Title TBD"
ABSTRACT: None Given

February 22

Robert Schapire, Ph.D.
Principal Researcher, Microsoft Research (NYC Lab)

"Talk Title TBD"
ABSTRACT: None Given

March 7

Ashish Jha, Ph.D.
K.T. Li Professor of International Health, Director, Harvard Global Health Institute, Department of Health Policy and Management, Harvard T.H. Chan School of Public Health

"Talk Title TBD"
ABSTRACT: None Given

March 21

Jianying Hu, Ph.D.
Distinguished Research Staff Member and Senior Manager, Health Informatics Research, Thomas J. Watson Research Center, IBM Research

"Talk Title TBD"
ABSTRACT: None Given

April 4

Zoran B. Djordjevic, Ph.D.
Continuing Education/Special Program Instructor, Harvard University / Senior Enterprise Architect, NTT Data, Inc.

"Talk Title TBD"
ABSTRACT: None Given

April 18

Sahand N. Negahban, Ph.D.
Assistant Professor, Statistics Department, Yale University

"Talk Title TBD"
ABSTRACT: None Given

May 2

To Be Announced

"Talk Title TBD"
ABSTRACT: None Given

Maintained by the Biostatistics Webmaster
Last Update: November 10, 2015