Department History: A long-term view of biostatistics discovery by Stuart G. Baker

It was the summer of 1983 at Harvard Biostatistics. Marvin Zelen was Department Chair. As graduate students, Nick Lange and I had designed a Harvard Biostatistics t-shirt which depicted a Cauchy distribution with the phrase “BEYØND ALL E(Xp)ECTATIØNS: βIOSTATISTICS.” Also, that summer, the Department of Biostatistics organized the student-versus-faculty softball game. My friend Karen was visiting me and was the student-team pitcher, by virtue of her having played on her college team. Marvin Zelen was the pitcher for the faculty.

A few years previously, Marvin Zelen had proposed the randomized consent design. In this design, investigators randomize patients to group 1 with only treatment A or to group 2 with a choice between treatment A and treatment B. This design appeals to both clinicians and patients. The statistical challenge was the mixture of participants receiving A and B in group 2, which dilutes the effect and complicates the interpretation. I was interested in modeling and thinking about hypothetical scenarios. My key idea was classifying patients as a type who would choose A if offered and a type who would choose B if offered. Much later these types were called principal strata.

With this formulation, I obtained a maximum likelihood estimate of the difference in outcome between A and B among patients who would choose B if offered. This was an early version of LATE (discussed below). I wrote up this idea as a draft manuscript. Although I thought it was a nice result, it did not affect the power of the trial. As I did not receive any encouragement from my mentors, I did not try to publish it.
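
For readers who want to see why the two patient types make this difference estimable, here is a rough sketch using a simple moment argument. The notation is assumed for illustration and is not taken from the 1983 manuscript, which used maximum likelihood: π is the proportion of patients who would choose B if offered, μ_1 and μ_2 are the mean outcomes in groups 1 and 2, and μ_{T|c} is the mean outcome under treatment T among patients of type c (a = would choose A, b = would choose B).

```latex
% Illustrative sketch only; symbols are assumptions, not the author's notation.
% Randomization implies the same mix of types (pi) in both groups.
\begin{align*}
\mu_1 &= (1-\pi)\,\mu_{A\mid a} + \pi\,\mu_{A\mid b}
  && \text{(group 1: everyone receives A)} \\
\mu_2 &= (1-\pi)\,\mu_{A\mid a} + \pi\,\mu_{B\mid b}
  && \text{(group 2: each type receives its choice)} \\
\mu_2 - \mu_1 &= \pi\,\bigl(\mu_{B\mid b} - \mu_{A\mid b}\bigr) \\
\mu_{B\mid b} - \mu_{A\mid b} &= \frac{\mu_2 - \mu_1}{\pi}
\end{align*}
```

The would-choose-A patients receive A in both groups, so their contribution cancels, and π can be estimated from the fraction choosing B in group 2. Dividing by π undoes the dilution caused by the mixture in group 2.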

Fast forward to 1994.  I extended the approach to a more interesting case in which group 1 also chooses between A and B but with different probabilities than in group 2.  I did not have a randomized trial example, but Karen (now my wife) suggested an application from her field of obstetric anesthesiology that involved changes in availability of epidural analgesia in a before-and-after study. That same year, unbeknownst to me, economists Guido Imbens and Joshua Angrist published a paper on the local average treatment effect (LATE). This was the same basic formulation as my 1994 paper with Karen.  The key to solving this more interesting case was assuming no irrational choices.

The paper by Imbens and Angrist involved a randomized trial, continuous outcomes, and economics.  Therefore, it caught much more interest than my 1994 paper with Karen, which involved a set of before-and-after studies, a binary outcome, and biostatistics.  Once a paper becomes the “go-to” citation, there is no incentive for others to find connections with contemporaneous or earlier work.  In 2021, the LATE formulation helped Imbens and Angrist win the Nobel Prize in Economics.

In a just-published paper in Chance magazine, “Multiple discoveries in causal inference: LATE for the party,” Karen and I discuss many parallel histories involving the discovery of LATE. Although Karen and I did not get a trip to Stockholm, we had the satisfaction of knowing that the idea we had nurtured for years had a substantial impact on a wide range of fields.

I still have my 1983 paper with the early version of LATE. For those interested, a copy of the paper is available in an online supplement to Baker SG, Kramer BS, Lindeman KS. Latent class instrumental variables: a clinical and biostatistical perspective. Stat Med. 2016. I am proud that a graduate student’s creative idea and instinct that the results were important (even though I did not publish the idea) were on the mark.

If there is any lesson to be learned from this that I could impart to others, it is to trust yourself. You should share your work with others and get feedback.  However, even if others are not enthusiastic, if you think you are on the right track and no one has found a serious flaw, then keep at it.