Causal Inference from Observational Data

Try explaining to your extended family that you are considered an expert in causal inference. That’s why, when people ask, I just say that my job is to learn what works for the prevention and treatment of diseases.

  • “Oh, so you are a medical doctor?”
  • Yes, but more to the point, I am an epidemiologist. The idea is to save lives in batches rather than one at a time.
  • “Epidemiologist? I see, so you study Covid?”
  • Yes, I do, among other things. Epidemiologists study all sorts of health issues: cancer, cardiovascular disease, birth defects, suicide… In fact, most epidemiologists don’t study epidemics of infectious diseases.
  • “Most epidemiologists don’t study epidemics?”

Here we go again. This paper of ours may help explain some components of my job.

Even when talking with professional researchers, we causal inferencers are up against some serious misunderstandings. On a few occasions, I have been invited to deliver lectures on “casual interference.” Less amusingly, a crucial problem is that many researchers are reluctant to acknowledge that their job involves asking causal questions, as if they were ashamed of using the c-word:

Making valid causal inferences is challenging because it requires high-quality data and adequate statistical methods. My colleague Jamie Robins and I wrote a book that describes these methods and the conditions under which they can be used; evaluating whether those conditions are met requires subject-matter expertise. That is, when trying to make causal inferences from observational data, it isn’t enough to be a brilliant data analyst; you also need to be a subject-matter expert. We explain here:

Subject-matter knowledge is needed not only to answer causal questions, but also to ask them. A current debate is about which causal questions can and cannot be asked. Some of us argue that some causal questions that are often taken for granted (like “what is the effect of obesity on mortality?”) aren’t good scientific questions. We once explained this problem by telling the story of a king who wanted the best for his subjects. For a more comprehensive discussion of this topic (and many references), take a look at

If you read the above papers, you will notice a recurrent idea: causal inference from observational data can be viewed as an attempt to emulate a (hypothetical) randomized trial: the target trial. (For more on the history of this idea, see this). We wrote some non-technical papers that review the concept of the target trial and explain how it can be used to avoid some common biases in observational analyses:

Immortal time bias is one of the biases that can be eliminated by explicitly emulating a target trial. The first paper listed below teaches how to prevent immortal time bias with a very, very simple example; the second explains the issues in more detail:
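For readers who like to see the bias in action, here is a small simulation sketch (my own illustration, not taken from either paper; the hazards, follow-up length, and variable names are invented). It generates survival data with no treatment effect at all, then shows how counting an initiator’s pre-treatment person-time as treated from time zero makes treatment look protective:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
hazard = 0.10       # everyone has the same death hazard: the true effect is null
followup = 10.0     # administrative end of follow-up (years)

death = rng.exponential(1 / hazard, n)            # time to death
start = rng.exponential(2.0, n)                   # when a patient would start treatment
initiates = start < np.minimum(death, followup)   # only the still-alive can initiate

t_end = np.minimum(death, followup)               # observed follow-up time
died = death < followup

# Naive "ever-treated" analysis: an initiator's pre-treatment (immortal)
# person-time is wrongly counted as treated, from time zero onward.
naive_ratio = ((died[initiates].sum() / t_end[initiates].sum())
               / (died[~initiates].sum() / t_end[~initiates].sum()))

# Correct analysis: person-time before initiation counts as untreated.
treated_deaths = (died & initiates).sum()          # initiators die after starting
treated_time = (t_end - start)[initiates].sum()
untreated_deaths = (died & ~initiates).sum()
untreated_time = np.where(initiates, start, t_end).sum()
correct_ratio = (treated_deaths / treated_time) / (untreated_deaths / untreated_time)

print(f"naive rate ratio:   {naive_ratio:.2f}")    # well below 1: spurious benefit
print(f"correct rate ratio: {correct_ratio:.2f}")  # close to 1: the null, recovered
```

The immortal time is the interval between time zero and treatment initiation, during which initiators cannot, by definition, have died; assigning that interval to the treated group manufactures a survival advantage out of nothing.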

Another bias that is eliminated by emulating a target trial arises when data analysts set time zero of follow-up months or years after the treatment strategy was initiated, that is, when they study “prevalent users”. This bias, which we have documented in several settings (like here), is the most important bias you may have never heard of. For example, it played a key role in the debacle surrounding postmenopausal hormone therapy and heart disease. If you have ever heard me give a talk, chances are that I presented these findings:

  • Hernán MA, Alonso A, Logan R, Grodstein F, Michels KB, Willett WC, Manson JE, Robins JM. Observational studies analyzed like randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease (with discussion). Epidemiology 2008; 19(6):766-779. PMCID: PMC3731075

This article shows how incorrect specification of time zero can lead to misleading conclusions. Also, the story of this paper is somewhat unique in the scientific literature: Because the language of the paper was the result of complex negotiations among the co-authors over several years, nobody was completely happy with the final version (one of the co-authors even dropped his name from the paper after the journal accepted it). Therefore, some of the co-authors took the unprecedented step of writing separate commentaries about their own article, with some of the commentaries being quite critical! Jamie Robins and I viewed this as an opportunity to explain the contributions of the paper in a more uncompromising way:

A couple of years later, I revisited a related issue here:

(Along similar lines, my colleague Mats Stensrud and I later argued that testing for proportional hazards is pointless and should be abandoned:

But I digress. Back to time zero.)
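To make the time-zero point concrete, here is a toy simulation (again my own illustration, not taken from any of the papers above; all numbers are invented). Suppose treatment doubles the death hazard during the first two years after initiation and has no effect afterwards, roughly the pattern at issue in the hormone therapy story. An analysis of prevalent users, whose time zero falls years after initiation, misses the early harm entirely; setting time zero at initiation recovers it:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
base = 0.05            # baseline death hazard (per year), invented for illustration
early, hr = 2.0, 2.0   # hazard is doubled during the first 2 years of treatment

# Untreated comparison group: constant hazard throughout.
t_untrt = rng.exponential(1 / base, n)

# Treated group: piecewise-exponential survival times via inverse transform.
u = rng.random(n)
p_early = 1 - np.exp(-base * hr * early)            # P(death in the early window)
t_trt = np.where(
    u < p_early,
    -np.log(1 - u) / (base * hr),                   # death in the first 2 years
    early - np.log((1 - u) / (1 - p_early)) / base, # death later, at baseline hazard
)

# Time zero at initiation ("new users"): 5-year risk ratio shows the early harm.
new_user_rr = (t_trt < 5).mean() / (t_untrt < 5).mean()

# Time zero 2+ years after initiation ("prevalent users"): condition on surviving
# the early window, then compare the next 5 years. The harm has vanished.
prev_trt = t_trt[t_trt > early]
prev_untrt = t_untrt[t_untrt > early]
prevalent_rr = (prev_trt < early + 5).mean() / (prev_untrt < early + 5).mean()

print(f"new-user risk ratio:    {new_user_rr:.2f}")   # above 1: harm visible
print(f"prevalent-user ratio:   {prevalent_rr:.2f}")  # about 1: harm hidden
```

The prevalent-user design conditions on having survived the risky early period of treatment, so the very patients who would have revealed the harm are excluded before follow-up begins.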

Other colleagues remained unconvinced about the virtues of setting time zero of follow-up at the start of the treatment strategy, which is what one would do when explicitly emulating a target trial. In my response to them, I explained why setting time zero at the start should be the default approach, and argued that deviations from this approach need to be carefully justified on a case-by-case basis:

The target trial concept is also central to the Cochrane Collaboration’s ROBINS-I tool to assess risk of bias in observational studies. Speaking of risk of bias, if you are interested in our research to classify biases according to their structure, click here.