HSPH Secures BD2K Training Grant

From contributor Sheila Gaynor

We are very pleased to announce that our department was recently awarded an NIH Big Data to Knowledge (BD2K) T32 Training Grant that will support six PhD students in Biostatistics and Computer Science as they take courses and conduct research focused on the use of Big Data to address important issues in health and biomedical research. The BD2K initiative is a long-term program at the National Institutes of Health to support advances in quantitative science and data science and to ensure effective use of Big Data in biomedical research. When BD2K announced its T32 grant program, our department was particularly well prepared to respond. For this Training Grant, John Quackenbush serves as the PI; Francesca Dominici, Rafa Irizarry, Xihong Lin, and David Parkes, Area Dean of Computer Science in the Harvard John A. Paulson School of Engineering and Applied Sciences, serve as Associate Directors. We are excited about this new partnership with CS.

The past few years have seen a substantial investment in our department to ensure that our students not only receive the most rigorous training in statistical methodology possible, but also have the skills needed to take advantage of the rapidly changing sources of data we now have available. In addition to the growing number of funded research projects that involve methods development and analysis of massive data, our department has developed a number of educational initiatives that build on our excellence in quantitative science. These include the launch of the Computational Biology and Quantitative Genetics MS program (developed jointly with Epidemiology), our Health Data Science MS program (just approved), a number of courses taught jointly with statistics and computer science, and a growing emphasis on computational training across our degree programs. The BD2K training grant application was developed to build on this investment, as well as the investments being made across Harvard in modern analytical and quantitative training.

“One of the distinguishing aspects of our department is the breadth of training grants we have been able to secure over the years,” said John Quackenbush, the Director of the BD2K Training Grant. “Our outstanding record of world-leading research in quantitative health science and our demonstrated excellence in training made it easy to develop a program that would address the emerging area of quantitative data science in health and biomedical research. While I am the PI of this award, it really recognizes the entirety of our department.”

This week we begin a two-part interview series with the investigators of this new training grant, starting with David Parkes, the George F. Colony Professor of Computer Science. David Parkes and the Department of Computer Science provide expertise in a key component of working with big data: effective, high-performance computational approaches.


You are an Area Dean of Computer Science and Associate Director of the Big Data training grant. How are biostatistics and CS related and how does this training grant build a stronger connection between them?

Computer science will continue to play a critical role in advancing our ability to work easily with data and make scientific advances, especially with the profusion of very large and heterogeneous data sets. Computer science is an extremely broad field and areas of intersection include at least those of visualization, human-computer interaction, data-management systems, algorithms, machine learning and artificial intelligence. I hope that through this training grant there will be renewed attention to technical problems that are motivated by medical science and health care.

What are some interesting problems in health sciences that you or other faculty in the computer science department are working on? 

Finale Doshi-Velez is working on models and algorithms for inference on the kinds of sequential data that arise when working with electronic health records, looking, for example, to model disease evolution. This is part of a broad research agenda to develop individualized treatment policies to improve healthcare. Krzysztof Gajos and Barbara Grosz are working on the design of collaborative, multi-agent system technologies to improve coordination and teamwork in the delivery of healthcare by distributed, diverse teams of caregivers.

What aspects of computer science training do you think are most important to bring to biostatistics students via the Big Data training grant?

An exposure to the toolkit of modern machine learning, including unsupervised, supervised and reinforcement learning. A working understanding of modern data systems including distributed databases, key value stores, and scientific data management. An appreciation of the current state of artificial intelligence and its research frontier. Human-computer interaction, especially the importance of careful design in enabling useful data-driven systems. Differential privacy. Internet of things and networked devices, broadly. Visualization tools.

What types of new careers and experiences do you envision this training grant preparing students for?

Careers that will impact domestic and global health through the new knowledge and the scientific and policy breakthroughs that can come from statistically valid, computationally enabled, data-driven discovery.