The Statistical and Quantitative Training in Big Data Health Science Program will train quantitative Big Data scientists who have a firm grounding in statistical theory and applications as well as a grasp of the computational methods necessary to implement them in a Big Data setting. With the goal of training tomorrow’s leaders in this area, we propose the following major activities: (1) integrated and interdisciplinary formal coursework; (2) three lab rotations involving Big Data (BD): BD biostatistics, BD computing, and BD health science research (trainees interested in genomics may take their second rotation in computational biology and their third as a wet lab rotation); (3) dissertation research; (4) participation in interdisciplinary research projects; (5) training in leadership and communication skills; and (6) organization of and participation in retreats, seminars, tutorials, and conferences. Trainees will be expected to finish their lab rotations in the first two years and their coursework in the first three years. Our goal is to support each trainee in the first and second years of graduate study and to assist in the transition to research with a faculty advisor who can provide financial support for the trainee’s dissertation project.

The training grant director is John Quackenbush. Francesca Dominici, Rafael Irizarry, and Xihong Lin from the Department of Biostatistics at the Harvard T.H. Chan School of Public Health and David Parkes from the Department of Computer Science, School of Engineering and Applied Sciences at Harvard University will serve as the Associate Directors of this training grant. This group was selected for its demonstrated expertise in the use of Big Data in a wide range of applications, including health and biomedical research and computer science.

Stipend and tuition support for this training program is provided through a National Institutes of Health grant (T32 LM012411).