current fellows (2017-2018)
Academic advisor: Francesca Dominici
Beau is a first-year student in the Biostatistics PhD program. His current research interest is machine learning methods in biological or healthcare contexts. The majority of his semester has been spent focusing on coursework. Beau plans to take the following courses this academic year: HPM548: Responsible Conduct of Research, EPI201: Introduction to Epidemiology: Methods I, COMPSCI181: Machine Learning, COMPSCI282R: Deep Bayesian Models, STAT221: Monte Carlo Methods for Statistical Learning, STAT234: Sequential Decision Making, CS287: Machine Learning for Natural Language Processing, COMPSCI205: Computing Foundations for Computational Science, and BST234: Introduction to Data Structures and Algorithms. In addition to his coursework, he has been working on a project that compares Bayesian neural networks to Gaussian processes, specifically with respect to their length scale and stationarity. This was a class project from the Fall semester that he will continue to work on in the January winter session. Beau is interested in extending this project into his BD computing rotation with Dr. Finale Doshi-Velez. Beau is also continuing a project with Cynthia Rudin at Duke University that he began before enrolling at Harvard University. The resulting paper, on which Beau is the third author, is currently titled “The age of secrecy and unfairness in recidivism prediction” and is being submitted for review this month. The group may write a second paper using recidivism data from Kentucky (IRB completed), but this project is in the early stages.
Academic advisor: Giovanni Parmigiani
Isabella is a first-year student in the Biostatistics PhD program. Her current research interest is developing scalable Bayesian methods for machine learning, with applications to Big Data in health and medical contexts. Isabella plans to take the following courses this academic year: BST230: Probability I, BST232: Methods I, BST267: Introduction to Social and Biological Networks, EPI201: Introduction to Epidemiology, EPI519: Evolutionary Epidemiology of Infectious Disease, HPM548: Responsible Conduct of Research, BST234: Introduction to Data Structures and Algorithms, BST231: Statistical Inference, and COMPSCI205: Computing Foundations for Computational Science. Isabella has also begun her BD biostatistics rotation with Dr. Giovanni Parmigiani, in which she is developing a Bayesian multi-study factor analysis method. Existing work is capable of identifying latent factors that are either common to all studies under analysis or specific to exactly one study, so her project is to develop a method that can also identify latent factors shared by a subset of studies.
Academic advisor: Jeff Miller
Larry is a first-year student in the Biostatistics PhD program. His current research interest focuses on multi-resolution inference in small- and large-sample trials, with applications in emergency medicine and drug discovery. The majority of his semester has been spent focusing on coursework. Larry plans to take the following courses this academic year: BST230: Probability I, BST232: Methods I, BST231: Statistical Inference I, BST234: Data Structures and Algorithms, BST273: Introduction to Programming, BST267: Introduction to Social & Biological Networks, BST238: Principles and Advanced Topics in Clinical Trials, RDS280: Decision Analysis for Health and Medicine, ID222: The History of Public Health in the US from the Colonial Era to the Present, and HPM548: Responsible Conduct of Research. During the Fall 2018 semester, Larry completed his BD biostatistics rotation with Dr. Xiao-Li Meng. Their project focused on personalized data science through the lens of multi-resolution inference. Larry first conducted a literature review on N-of-1 trials and was connected to a working group of biostatisticians, clinicians, and computer scientists interested in making inferences for individuals. Together they developed an initial research question based on the observation that in real life, neither the patient nor the physician is usually the sole decision-maker. Rather, patient-physician decisions are typically made together, based on a combination of individual-specific data (small data) and population-level data (big data). The question is how to integrate personal data with population data, a problem that could be well suited to modeling in the multi-resolution framework. They are developing a visual tool to help physicians and other decision-makers visualize heterogeneity of treatment effect as a function of resolution level. In a dynamic setting, they may be able to “shift” the resolution based on the availability and feasibility of combining multiple N-of-1 trials with population-level data.
Larry also conducted an exploratory data analysis and is now working with data from the eICU Collaborative Research Database, a multi-center database for critical care research. They are also actively working with Prof. Fishman to obtain an appropriate dataset on patients with rare monogenic diseases to test their hypotheses. Larry is now planning his BD computing rotation with Dr. Tianxi Cai in electronic health records, with applications in natural language processing for different languages such as Mandarin Chinese and/or French. This work may use data from biobanks such as the Million Veteran Program or Partners HealthCare. In addition to the coursework and rotation work, he is planning a manuscript about “Resolution via filtration and decomposition for categorical data”. This would be a statistical methods paper that shows how inference varies at different resolution levels when the outcome of interest is categorical. Larry is currently working on the methods development and is in the early stages of drafting the paper, which will continue through the academic year. He has also taken advantage of the working groups and other professional development activities in the department. Larry has attended the Biostatistics PhD Student Seminar Series (including an F31 application primer), the B3D Seminar Series (Biostatistics-Biomedical Informatics-Big Data), and the Cancer Training Grant Seminar Series. He also presented a poster, “Surviving the Weekend Effect: A Competing Risks Analysis of Electronic Medical Records in the UK,” at the 2018 Program in Quantitative Genomics (PQG) Conference, Biobanks: Study Design and Data Analysis, hosted by the department on Nov 1-2, 2018.
Academic advisor: Finale Doshi-Velez
Issac is a second-year student in the Computer Science PhD program. His research interest is developing better explanations for machine learning systems in order to make them useful for healthcare applications. In the last year Issac has done some coursework as well as research. He plans to take the following courses this academic year: HISTSCI146: (How) Does Medicine Work, and COMPSCI223: Probabilistic Analysis and Algorithms. Issac’s work with Finale Doshi-Velez has been on interpretable machine learning and machine learning for healthcare. In the last year, he worked on projects to optimize machine learning models to have interpretable explanations using human feedback, explored the robustness of policy summarization methods to variation in modeling assumptions, and ran a series of human subjects experiments to quantify the effect of various kinds of complexity on interpretability. This year, Issac will work on a project exploring how explanations of machine learning systems can help end-users personalize the predictions they receive from these systems. This is a crucial aspect of using machine learning for clinical decision support because patients often have preferences and constraints not contained in their electronic medical record. By producing more effective explanations of machine learning systems, he hopes to make machine learning accessible not only to healthcare domain experts but also to patients. In December 2018 Issac attended the Neural Information Processing Systems (NeurIPS) 2018 conference in Montreal, Canada to present this work, “Human-in-the-Loop Interpretability Prior”. In addition to the conference proceedings, he attended a number of workshops, including Machine Learning for Healthcare and Critiquing and Correcting Trends in Machine Learning.
Issac is also working on publications related to this work with Dr. Doshi-Velez. The first, “An Evaluation of the Human-Interpretability of Explanation,” with E Chen, J He, M Narayanan, B Kim, S Gershman, and F Doshi-Velez, was accepted as a refereed workshop publication, and a full version will soon be submitted to a journal. Issac was involved in all aspects. The second paper, “Evaluating Reinforcement Learning Algorithms in Observational Health Settings,” with O Gottesman, FD Johansson, J Meier, J Dent, D Lee, S Srinivasan, L Zhang, Y Ding, D Wihl, X Peng, J Yao, C Mosch, LH Lehman, M Komorowski, A Faisal, LA Celi, D Sontag, and F Doshi-Velez, is under review at Nature Medicine. Issac’s role on this paper was preliminary data analysis.
Academic advisor: Jukka-Pekka Onnela
Harrison is a second-year student in the Biostatistics PhD program. His research interests are in methods for using Big Data sources like databases, smartphones and wearables to understand social and health outcomes. He is also interested in prediction of categorical outcomes and in developing meaningful tools to communicate personalized health predictions. Harrison continues to devote time to his coursework, and he plans to take the following courses this academic year: EPI207: Advanced Epidemiologic Methods, BST256: Theory and Methods for Causality I, BST235: Advanced Regression and Statistical Learning, BST245: Multivariate and Longitudinal Data Analysis, BST241: Inference II, and BST261: Data Science II. In addition to his coursework, Harrison has completed all of his rotations. In his Big Data biostatistics rotation, he worked with Dr. Sebastien Haneuse to learn about and apply “semi-competing risk” (SCR) survival analysis methods, which Dr. Haneuse developed for use with large medical record data. They collaborated with Dr. Changyu Shen at Beth Israel Deaconess’ Smith Center for Outcomes Research in Cardiology and examined the outcomes of heart failure patients in a clinical study who had implantable defibrillator devices inserted into their chests. Together they developed a tool for predicting categorical outcomes through time and are writing a manuscript they plan to submit later this year. His role in this manuscript is analyzing the data and drafting the text. For his Big Data computing rotation, he worked with Dr. Kimberly Glass to investigate approaches for correcting systematic discrepancies that occur in data collected from 304 wearable health devices of subjects in an ongoing study of Inflammatory Bowel Disease (IBD). The ultimate aim of the study is to see how patients’ health experiences and outcomes are connected to their lifestyle, including their sleep patterns.
Together they worked on algorithms to correct differences in sleep data across device manufacturers and through time, and are currently developing a new algorithm for detecting and correcting these differences. In his final rotation, Big Data health science, he worked with Dr. Jukka-Pekka Onnela to analyze smartphone data from a brain/spine surgical cohort from Dr. Tim Smith of Brigham and Women’s Hospital, showing mobility and communication patterns before and after surgery. Together they built longitudinal models for numerous facets of mobility, presented the results at a department symposium, and are planning a manuscript on power calculations for studies involving smartphone data collection. Harrison’s role in this manuscript will be developing an interactive tool for researchers. Harrison also attended the American Heart Association Scientific Sessions 2018, held in Chicago, Illinois from November 10-12, 2018, and presented “A Joint Shock/Death Risk Prediction Model for Personalized Decision Making in ICD-eligible Patients: Evidence from the SCD-HeFT Trial.”
Academic advisor: John Quackenbush
Matthew is a second-year student in the Biostatistics PhD program. His research interests are in developing new methods for addressing problems in a Big Data context, with a particular interest in the computational concerns of creating efficient algorithmic implementations of these methods. Matthew continues to focus on his coursework, and he plans to take the following courses this academic year: BST235: Advanced Regression and Statistical Learning, BST262: Computing for Big Data, BST256: Theory and Methods for Causality I, EPI207: Advanced Epidemiologic Methods, CS205: Computing Foundations for Computational Science, BST261: Data Science II, and BST245: Analysis of Multivariate and Longitudinal Data. In addition to his coursework, Matthew has completed two of his three rotations. In his Big Data biostatistics rotation, he worked with Dr. Rafael Irizarry. The focus of their project was to identify protein binding sites (specifically regarding BRG1 and BAF155) along the human genome using ChIP-seq data on four cell lines. Each cell line consisted of a treatment and control line. The goal was to use hypothesis testing, comparing treatment and control counts of bindings, to distinguish true binding sites from noise. Originally, a binomial test was implemented, but he has worked on implementing a likelihood ratio test and is currently working on an implementation that allows for overdispersion in the data. Major challenges in carrying out the test include bias in the data and small sample sizes for each individual position along the genome.
Matthew is performing the analysis in R. For his Big Data computing rotation, he worked with Dr. Christoph Lange. Matthew prepared for this rotation by learning Python during January 2018 and, during his rotation, worked on implementing a test for genetic homogeneity in populations as a Python module. The analysis was based on that described in “Identification of genetic outliers due to sub-structure and cryptic relationships” by Daniel Schlauch, Heide Fier, and Christoph Lange. The rotation involved many decisions about how to store and analyze genetic data in a computationally efficient manner. The final result was a functioning Python module given to Professor Lange. For his final rotation, Big Data health science, he plans to work with Dr. Kimberly Glass during the Spring 2019 semester. The main goal of this rotation will be to study and establish the existence, or lack thereof, of associations between variables in a mobile-health setting. Namely, there are raw mobile-health data from various devices (Fitbit, Garmin, etc.) that have yet to be extensively analyzed but that provide information likely to be useful for studying or treating inflammatory bowel disease (IBD) in a clinical setting. The final contribution should include visualizations, a description of any associations (or lack thereof) discovered, and, if necessary, reformatted data better suited to clinical use in treating IBD.