Zack McCaw Searches for Signals in Big Data’s Noise

Zachary McCaw, one of this year’s Biostatistics graduates who was recently profiled by the School, says his time at Harvard Chan School made him a better problem solver and impressed on him the importance of seeing research as an iterative process. “I learned that a new method will seldom work as expected the first time through, and you will almost always benefit from multiple rounds of experimentation and revision,” he said.

For his dissertation, McCaw applied his perseverance to genome-wide association studies, using UK Biobank data to determine whether inverse normal transformation could help researchers more reliably identify locations in the genome associated with lung function. In a second project, he proposed a way to fill in gaps in data that are difficult to measure by borrowing information from more easily measured surrogate data – in this case, gene expression in blood as a stand-in for gene expression in the brain. The goal of these projects, which were supported by an F31 Individual Research Fellowship from the National Heart, Lung, and Blood Institute, was to aid research on genetic variants that could ultimately lead to better diagnostic procedures and targeted treatments.

McCaw will work over the summer at the Broad Institute on the problem of fine-mapping, or determining which of the many genetic variants in a region of the genome is truly responsible for affecting a health outcome. This fall he will join Google as a data scientist, working on drawing causal inferences from longitudinal data. Eventually, though, he sees himself returning to academia in a role that involves both biomedical research and teaching. Ultimately, he hopes to continue working on interesting statistical problems, “and when I am able to propose effective new solutions, to share them with the broader community.”