Outbreaks, and who infected whom?

In the summer of 2011, an outbreak of foodborne E. coli swept across northern Europe. The strain causing disease was called O104:H4 and produced Shiga toxin as well as enteroaggregative factors, which went some way to explaining its high virulence and association with renal complications. More than 50 people lost their lives in this outbreak and thousands were sickened. We investigated it through a collaboration with the Broad Institute of Harvard and MIT, and Dr Francois-Xavier Weill of the Institut Pasteur in Paris, France. Dr Weill and colleagues had noted an outbreak of extremely similar disease associated with a school in Bordeaux, which subsequently transpired to be linked to the much larger outbreak centered in Germany. Sequencing the genomes of bacteria from the two outbreaks, at a resolution sufficient to confidently detect a single difference in a genome of more than 5 million base pairs, we found the small Bordeaux outbreak to be much more diverse than the large one.

Phylogeny generated from genomes of E. coli isolates from the large and small outbreaks. Isolates from the larger outbreak are nested within the smaller one. Note the multiple isolates from a single individual, in which just one SNP was identified in ~5 million base pairs. Figure downloaded from Grad et al 2012, at PubMed Central.

The reasons for this remain unknown, but notably we found that isolates from the same individual are extremely closely related. The low diversity in the larger outbreak might therefore represent passage through a worker at the sprout farm linked to the outbreak, at which asymptomatic infections were indeed documented. Of further relevance to public health, when we compared outbreak strains with sporadic cases of O104:H4 disease from France, we found them to be extremely closely related, although it was clear that the sporadic cases did not derive from the outbreak. Notably, the sporadic cases had a full complement of virulence loci, including Shiga and other toxins, underlining the potential for another outbreak. This work was led by Yonatan Grad, now a faculty member in Immunology and Infectious Diseases.

These results might lead us to think that if bacteria from two individuals are similar or nearly identical, we can infer a transmission link. But this is mistaken. The figure below illustrates the problem, representing the genetic similarity between strains using color: more similar colors indicate more similar genomes. If diversity accumulates within the host, or is transmitted, then sequencing one isolate may produce data that are not representative of the whole population.

Schematic illustrating the accumulation of diversity. A tight bottleneck of just one infectious particle initiates the infection. As the population grows it diversifies, shown as different colors. The color of the particle that is then transmitted may not be representative. If we sequence one isolate of a diverse population, it too may not be representative.

Work by Colin Worby, in collaboration with Marc Lipsitch, has simulated outbreaks and transmission chains and found that in many cases it is hard or impossible to reconstruct transmission, especially when we sequence only one isolate from each patient. This is easy to understand intuitively: to be able to say anything useful, some variation has to arise in the patient and be transmitted (if it does not, all cases will be identical and we will have no information with which to construct a transmission chain). The amount of variation transmitted depends on the bottleneck size. If the bottleneck is small, then with time there is a probability that the onward infection is not representative of the infection that seeded it. If it is large, diversity will accumulate in more than one host and, over time, we will end up selecting and sequencing an isolate that is atypical of the population.
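The intuition can be explored with a toy simulation. The sketch below is not the model used by Worby and colleagues; it is a minimal illustration that assumes a simple linear chain of hosts, a fixed within-host population size, and an infinite-sites mutation model in which each genome is just the set of mutations it carries. All function names and parameter values (`grow`, `simulate_chain`, `mu`, `pop_size`, and so on) are hypothetical choices made for the example.

```python
import itertools
import random


def grow(founders, generations, mu, rng, new_id, pop_size=100):
    """Grow a within-host population from the transmitted founder genomes.

    Each genome is a frozenset of mutation IDs. Every generation, pop_size
    offspring are drawn from the current population with replacement, and
    each offspring gains one new, unique mutation with probability mu.
    """
    pop = list(founders)
    for _ in range(generations):
        offspring = []
        for _ in range(pop_size):
            genome = rng.choice(pop)
            if rng.random() < mu:
                genome = genome | {next(new_id)}
            offspring.append(genome)
        pop = offspring
    return pop


def simulate_chain(n_hosts, bottleneck, generations=20, mu=0.05, seed=1):
    """Simulate host 1 -> host 2 -> ... and sequence ONE isolate per host.

    Assumes generations >= 1 and bottleneck <= pop_size, so that the
    inoculum can always be drawn from a full within-host population.
    """
    rng = random.Random(seed)
    new_id = itertools.count()
    founders = [frozenset()]  # the index case starts from a single genome
    isolates = []
    for _ in range(n_hosts):
        pop = grow(founders, generations, mu, rng, new_id)
        isolates.append(rng.choice(pop))        # the isolate we sequence
        founders = rng.sample(pop, bottleneck)  # the transmitted inoculum
    return isolates


def snp_distance(a, b):
    """Number of mutations carried by one genome but not the other."""
    return len(a ^ b)


if __name__ == "__main__":
    for bottleneck in (1, 10):
        isolates = simulate_chain(n_hosts=5, bottleneck=bottleneck)
        print(f"bottleneck = {bottleneck}")
        for a in isolates:
            print("  ", [snp_distance(a, b) for b in isolates])
```

Each printed row gives the SNP distances from one host's sequenced isolate to every other host's isolate, in transmission order. Because the sequenced isolate and the transmitted inoculum are drawn independently from a diverse population, distances need not increase steadily along the chain, which is exactly what makes the chain hard to recover from single isolates.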

Results of a simulation showing the difficulty in reconstructing a transmission chain from single isolates. In all cases the true transmission route is from the red index case counter-clockwise. Blue lines represent inferred transmission, with better supported links shown with heavier weights. Different bottleneck sizes and sampling densities are considered, and in most cases reconstruction performs poorly. For more details see Worby et al 2014.

Extending this, we have computed the expected distribution of pairwise distances between genomes in an outbreak. We are actively working to improve the estimation of transmission networks and to consider more complicated scenarios.
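For readers unfamiliar with the quantity involved, the following sketch shows how an observed (empirical) distribution of pairwise distances could be tabulated from aligned sequences. It does not reproduce the expected distribution derived in our work; it only illustrates the kind of data such a distribution would be compared against. The isolate names and the eight-base sequences are invented for the example, and the sequences are assumed to be already aligned and of equal length.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_distance_distribution(sequences):
    """Count how many pairs of sequences lie at each SNP distance."""
    return Counter(hamming(a, b) for a, b in combinations(sequences, 2))


# Hypothetical toy alignment: one short sequence per sampled patient.
isolates = {
    "patient_A": "ACGTACGT",
    "patient_B": "ACGTACGA",
    "patient_C": "ACGAACGA",
    "patient_D": "ACGTACGT",
}

distribution = pairwise_distance_distribution(list(isolates.values()))
for distance in sorted(distribution):
    print(f"{distance} SNPs: {distribution[distance]} pair(s)")
```

In this toy alignment, patients A and D carry identical sequences, yet that alone says nothing about whether one infected the other.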

Other examples of genomic methods in outbreak analysis include the study of the cholera outbreak that devastated Haiti following a magnitude 7 earthquake in 2010. Prior to this, cholera was unknown on the island of Hispaniola. Concerns were raised that the disease had been inadvertently introduced by Nepalese UN peacekeepers, who had set up camp close to the initial cases. With Dr Cheryl Tarr of the Enteric Diseases Epidemiology Branch of CDC, we used genomic methods, coupled with coalescent analyses, to estimate the rate at which the outbreak was growing and the date at which all sequenced outbreak strains had shared a common ancestor. This date was found to be consistent with the hypothesis of Nepalese origin. This work, published in mBio, was awarded the 2014 Nakano citation.

The lab continues to investigate ongoing outbreaks, and Dr Hanage collaborates with the Broad Institute, where he is an associate member, in the study of carbapenem-resistant Enterobacteriaceae. These bacteria are increasingly common and a growing cause of morbidity and mortality in healthcare settings. In the worst cases, they are refractory to all antibiotic treatment, and as such present a marked threat to public health.