Tracking air pollution from power plants: Mapping regulations to populations

Faculty Mentor: Cory Zigler
Postdoc Mentor: Chanmin Kim
2017 Program Participants: David Angeles, Alexandra Carruthers Ferrero, Jovaniel Rodriguez Maldonado

Due to the well-established link between exposure to air pollution and human health, regulations to limit population exposure to particulate air pollution are estimated to account for over half of the monetized benefits (and nearly half of the costs) of all federal regulations. Many controversial and high-stakes regulations target harmful emissions from power plants, a major source of particulate pollution, but statistical and data-based methods to evaluate the effectiveness of these regulations are lacking. A main challenge in evaluating such regulations is that air pollution moves through the atmosphere: intervening to reduce emissions at a power plant in Ohio can affect the air that people breathe in Boston. This project will combine time-varying data on emissions from over 1,000 US power plants with measures of air pollution (and population health) at tens of thousands of US zip codes. The goal is to learn the network that describes how regulatory interventions at specific power plants spill across the country to affect the air people breathe. Such information can provide empirical support for regulatory decision making, which has historically relied on non-statistical, non-data-based methods.
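One simple way to picture the "network" described above is a source-receptor matrix B. In this matrix, B[j, i] is the change in pollution at zip code j per unit of emissions at power plant i.

The sketch below is a minimal illustration under a strong, hypothetical assumption: zip-code pollution is a linear, same-period function of plant emissions plus noise, so B can be estimated by ordinary least squares from time-varying data. Real atmospheric transport depends on weather, distance, and time lags, and the project's actual methods may differ. The function names, variable names, and simulated data are illustrative only.

```python
import numpy as np

def estimate_transfer_matrix(emissions, pollution):
    """Least-squares estimate of a plant-to-zip-code transfer matrix.

    Hypothetical linear model: pollution is approximately emissions @ B.T.
    emissions: shape (T, n_plants), emissions at each plant in each period.
    pollution: shape (T, n_zips), pollution at each zip code in each period.
    Returns B with shape (n_zips, n_plants).
    """
    # Solve for all zip codes at once; lstsq returns shape (n_plants, n_zips).
    coef, *_ = np.linalg.lstsq(emissions, pollution, rcond=None)
    return coef.T

def predicted_change(B, plant_index, emission_change):
    # Change in pollution at every zip code if one plant's emissions shift
    # by `emission_change` units while all other plants stay fixed.
    return B[:, plant_index] * emission_change

# Simulated (not real) data: 200 periods, 3 plants, 4 zip codes.
rng = np.random.default_rng(0)
true_B = rng.uniform(0.0, 1.0, size=(4, 3))
emissions = rng.uniform(0.0, 10.0, size=(200, 3))
pollution = emissions @ true_B.T + rng.normal(0.0, 0.1, size=(200, 4))

B_hat = estimate_transfer_matrix(emissions, pollution)
print(np.round(B_hat, 2))
print(predicted_change(B_hat, plant_index=0, emission_change=-2.0))
```

Under this linear assumption, the estimated effect of cutting emissions at one plant is that plant's column of B, scaled by the size of the cut. This is a toy version of the spillover question the project asks.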

Smartphone-Based Digital Phenotyping

Faculty Mentor: Jukka-Pekka Onnela
Postdoc Mentor: Patrick Staples
2017 Program Participants: Danielle Baldwin, Reibin Hiraldo, Silvio Martinez

Digital phenotyping is the moment-by-moment quantification of the individual-level human phenotype in situ using data from personal digital devices such as smartphones. Beiwe is our platform for gathering, storing, and analyzing digital phenotyping data in a wide range of ongoing studies. These include studies of patients with depression, schizophrenia, ALS, eating disorders, and PTSD, and of patients undergoing spinal surgery or neurosurgery. The scale and fine-grained detail of these data have enormous potential for prediction and classification. However, data quality can vary widely across patients, and ground-truth validation data are difficult to come by. For this project, we will gather our own digital phenotyping data and learn statistical and programming tools in Python to visualize data quality and find predictors of daily activities.
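A common first data-quality summary is coverage: how much of each day a sensor actually recorded. The sketch below assumes a hypothetical list of sensor timestamps, such as GPS or accelerometer readings exported from a study. For each day, it counts the fraction of the 24 hourly windows that contain at least one reading, and it reports days with no data at all as zero. It is not part of Beiwe's own tooling, and the hourly window and the 0.5 threshold are arbitrary illustrative choices.

```python
from collections import defaultdict
from datetime import datetime, timedelta

HOURS_PER_DAY = 24

def daily_coverage(timestamps):
    """Fraction of each day's hourly windows containing at least one reading.

    timestamps: iterable of datetime objects from one participant's sensor.
    Returns {date: fraction}, including days with no readings between the
    first and last observed day.
    """
    hours_seen = defaultdict(set)
    for ts in timestamps:
        hours_seen[ts.date()].add(ts.hour)
    if not hours_seen:
        return {}
    coverage = {}
    day, last = min(hours_seen), max(hours_seen)
    while day <= last:
        # .get avoids creating entries for days with no readings.
        coverage[day] = len(hours_seen.get(day, ())) / HOURS_PER_DAY
        day += timedelta(days=1)
    return coverage

# Hypothetical readings: every 10 minutes on June 1, none on June 2,
# and only between midnight and 6 a.m. on June 3.
readings = [datetime(2017, 6, 1) + timedelta(minutes=10 * k) for k in range(144)]
readings += [datetime(2017, 6, 3) + timedelta(minutes=10 * k) for k in range(36)]

coverage = daily_coverage(readings)
low_quality_days = [day for day, frac in coverage.items() if frac < 0.5]
print(coverage)
print(low_quality_days)
```

Plotting these per-day fractions for each participant gives a quick view of where data are missing before any prediction is attempted.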

Machine Learning for Health Outcomes Prediction

Faculty Mentor: Sherri Rose
Graduate Student Mentor: Savannah Bergquist
2017 Program Participants: Alicia Dominguez, Julia Thome, Tyler Vu

The introduction of machine learning approaches for prediction in health research has the potential to improve predictive accuracy and provide new insights. Historically, these prediction questions have been addressed using parametric regression. Machine learning methods aim to learn flexible relationships from the data, often making fewer assumptions than standard parametric regression techniques. Ensembling allows researchers to combine multiple algorithms into a single prediction function that can perform better than any one of them alone. In this project, students will explore publicly available health data sets and implement machine learning and ensembling algorithms for prediction using existing R packages.
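To make the ensembling idea concrete, the sketch below follows the general stacking recipe used by R packages such as SuperLearner. First, obtain cross-validated predictions from several candidate learners. Then choose non-negative weights that sum to one and minimize cross-validated squared error.

The sketch differs from a real package in several ways:
- It is written in plain NumPy rather than R.
- It uses three deliberately simple learners: linear regression, k-nearest neighbors, and the overall mean.
- It finds the weights by a coarse grid search instead of a proper constrained optimizer.
- It runs on simulated data, and all names are illustrative.

```python
import numpy as np

def fit_predict_ols(X_train, y_train, X_test):
    # Linear regression with an intercept, fit by least squares.
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

def fit_predict_knn(X_train, y_train, X_test, k=10):
    # Average outcome of the k nearest training points (Euclidean distance).
    dist = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)

def fit_predict_mean(X_train, y_train, X_test):
    # Ignore the covariates and predict the training mean.
    return np.full(len(X_test), y_train.mean())

LEARNERS = [fit_predict_ols, fit_predict_knn, fit_predict_mean]

def cross_validated_predictions(X, y, n_folds=5, seed=0):
    # Column j holds learner j's predictions for each observation, made by a
    # model that never saw that observation during fitting.
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds
    Z = np.zeros((len(y), len(LEARNERS)))
    for f in range(n_folds):
        train, test = folds != f, folds == f
        for j, learner in enumerate(LEARNERS):
            Z[test, j] = learner(X[train], y[train], X[test])
    return Z

def simplex_weights(Z, y, n_grid=51):
    # Coarse grid search over non-negative weights (w0, w1, w2) summing to 1.
    grid = np.linspace(0.0, 1.0, n_grid)
    best_w, best_mse = None, np.inf
    for w0 in grid:
        for w1 in grid:
            if w0 + w1 > 1.0 + 1e-12:
                continue
            w = np.array([w0, w1, max(0.0, 1.0 - w0 - w1)])
            mse = np.mean((Z @ w - y) ** 2)
            if mse < best_mse:
                best_w, best_mse = w, mse
    return best_w

# Simulated data with a nonlinear signal that a straight line fits poorly.
rng = np.random.default_rng(1)
X = rng.uniform(0.0, 6.0, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(0.0, 0.3, size=300)

Z = cross_validated_predictions(X, y)
weights = simplex_weights(Z, y)
single_mse = np.mean((Z - y[:, None]) ** 2, axis=0)
ensemble_mse = np.mean((Z @ weights - y) ** 2)
print("weights (ols, knn, mean):", np.round(weights, 2))
print("CV MSE per learner:", np.round(single_mse, 3), "ensemble:", round(ensemble_mse, 3))
```

Each single learner corresponds to a corner of the weight grid. As a result, the chosen ensemble never has higher cross-validated error than the best single learner on these data. Whether it also predicts better on new data still has to be checked with held-out data.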

Estimating and Understanding Gene Regulatory Networks

Faculty Mentor: John Quackenbush
Postdoc Mentor: John Platig
2017 Program Participants: Andrea Ovalle, Ula Widocki

Different physical states, or phenotypes, are often characterized based on differentially expressed genes. But gene expression in a cell is controlled through complex regulatory processes that involve regulatory genes (transcription factors, or TFs) activating or deactivating the expression of other genes. We developed an algorithm, PANDA, that models gene regulation as a communication process between TFs and their targets. By estimating networks separately for healthy and disease populations and comparing those networks, we can gain insight into the processes that drive changes from health to disease. Students will spend the first few days learning to program in R and will be introduced to gene regulatory networks and the pandaR package. They will then be given the opportunity to model regulatory networks in a disease model and explore their properties.
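The comparison step can start as simply as subtracting edge weights. The sketch below assumes two hypothetical TF-by-gene weight matrices, estimated separately in healthy and disease samples, with rows and columns in the same order. It ranks TFs by the change in their summed edge weights, a rough out-degree or "targeting" summary. The matrices are toy values, not output from PANDA or pandaR, and this summary is only one of many ways to compare networks.

```python
import numpy as np

def differential_targeting(W_healthy, W_disease, tf_names):
    """Rank TFs by the change in their summed edge weights (disease minus healthy).

    W_healthy, W_disease: arrays of shape (n_tfs, n_genes) with the same TFs
    (rows) and target genes (columns) in the same order.
    Returns (ranking, edge_diff), where ranking is a list of (tf, change)
    sorted by absolute change, largest first.
    """
    if W_healthy.shape != W_disease.shape:
        raise ValueError("networks must have the same TFs and genes")
    edge_diff = W_disease - W_healthy      # per-edge change
    tf_change = edge_diff.sum(axis=1)      # per-TF change in summed edge weight
    order = np.argsort(-np.abs(tf_change), kind="stable")
    ranking = [(tf_names[i], float(tf_change[i])) for i in order]
    return ranking, edge_diff

# Toy edge weights for 3 hypothetical TFs and 4 genes (not real PANDA output).
tf_names = ["TF_A", "TF_B", "TF_C"]
W_healthy = np.array([
    [0.9, 0.1, 0.0, 0.4],
    [0.2, 0.8, 0.3, 0.1],
    [0.5, 0.5, 0.5, 0.5],
])
W_disease = W_healthy.copy()
W_disease[1] += [0.5, 0.4, 0.6, 0.5]   # TF_B gains regulatory influence
W_disease[2, 0] -= 0.5                 # TF_C loses one strong edge

ranking, edge_diff = differential_targeting(W_healthy, W_disease, tf_names)
for tf, change in ranking:
    print(f"{tf}: {change:+.2f}")
```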

Controversy in Pharmacogenomics

Faculty Mentor: Rafa Irizarry
Graduate Student Mentor: Sheila Gaynor
2017 Program Participants: Jace Gilbert, Jeff Joseph, Daniel Meza

In 2012, two studies (Garnett et al. and Barretina et al.) attempted to correlate large numbers of gene expression, mutation, and copy number measurements in hundreds of cancer cell lines with sensitivities to panels of anticancer drugs. Their goal was to find genes or mutations that might indicate cancers vulnerable to specific drugs. However, a subsequent study (Haibe-Kains et al. 2013) that attempted to replicate the initial findings found major inconsistencies between the results of the two studies. We will review the papers, download the data, and analyze it ourselves to form our own conclusions.
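One way to start checking consistency between the two data sets is to ask, for each drug measured in both, whether cell lines are ranked similarly by their sensitivity. The sketch below assumes each study's data have been reshaped into a nested dictionary of the form {drug: {cell_line: sensitivity}}. For each drug, it computes a Spearman rank correlation over the cell lines the two studies share. The drug names, cell line names, and values are invented, and the simple ranking assumes no tied values.

```python
import numpy as np

def spearman(a, b):
    # Spearman correlation as the Pearson correlation of ranks (assumes no ties).
    ra = np.argsort(np.argsort(a))
    rb = np.argsort(np.argsort(b))
    return float(np.corrcoef(ra, rb)[0, 1])

def concordance_by_drug(study_a, study_b, min_shared=3):
    """For each drug in both studies, correlate sensitivities over shared cell lines.

    study_a, study_b: hypothetical {drug: {cell_line: sensitivity}} dictionaries.
    Returns {drug: (n_shared_cell_lines, spearman_rho)}.
    """
    results = {}
    for drug in sorted(set(study_a) & set(study_b)):
        shared = sorted(set(study_a[drug]) & set(study_b[drug]))
        if len(shared) < min_shared:
            continue  # too few overlapping cell lines to say anything
        a = np.array([study_a[drug][c] for c in shared])
        b = np.array([study_b[drug][c] for c in shared])
        results[drug] = (len(shared), spearman(a, b))
    return results

# Invented values for illustration only.
study_a = {"drug_X": {"c1": 0.1, "c2": 0.4, "c3": 0.7, "c4": 0.9},
           "drug_Y": {"c1": 0.2, "c2": 0.8, "c3": 0.5}}
study_b = {"drug_X": {"c1": 0.2, "c2": 0.3, "c3": 0.6, "c4": 0.95, "c5": 0.1},
           "drug_Y": {"c1": 0.9, "c2": 0.1, "c3": 0.4}}

results = concordance_by_drug(study_a, study_b)
for drug, (n, rho) in results.items():
    print(f"{drug}: {n} shared cell lines, Spearman rho = {rho:+.2f}")
```

With real data, the choice of sensitivity summary (for example IC50 versus area under the dose-response curve) and how ties and missing values are handled can change the picture. Those choices are worth examining explicitly.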