The relative ease with which we can determine genome sequences is having a large impact on how we think of bacterial populations. Indeed, while bacteria once excited little interest from population geneticists (with one great exception), they are now at the forefront of the emerging field of population genomics.
Our contribution to this includes multiple species and systems, and it is a unifying theme across research projects in the Hanage lab. This page summarizes current thinking and projects.
The study of a previously published dataset of over 600 pneumococcal genomes (also described here, core genome phylogeny shown above) illustrates the insights available from such rich data. It is known that bacteria vary in gene content, and many speak of the ‘core’ and ‘accessory’ genome (I think this can be misleading, and explain why here, but this is a minority view). What is not clear is how gene content scales with increasing divergence in the core. We found that the relationship is apparently linear, or close to it, but that the points are not evenly distributed, as shown below.
What is harder to see from this plot is the density of points. The great majority lie in the single dense region in the middle of the plot. In other words, the clusters shown in the phylogeny above are roughly equidistant in their core genome, but also in terms of how much their accessory genomes overlap. The small region of green points at top right indicates an anomalous divergent group of strains (SC12), lacking the major pneumococcal surface antigen and associated with conjunctivitis. In my view, this may be a separate species.
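The comparison behind such a plot, core genome divergence against accessory genome overlap for every pair of isolates, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method used in the study: it assumes core divergence is the proportion of differing sites in an aligned core sequence and that accessory difference is a Jaccard distance between gene sets. The function names, isolate labels, sequences and gene names are all hypothetical toy values.

```python
from itertools import combinations


def core_divergence(seq_a, seq_b):
    """Proportion of aligned core sites that differ (sites with a gap or N are skipped)."""
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def accessory_distance(genes_a, genes_b):
    """Jaccard distance between two sets of accessory genes (assumed measure)."""
    union = genes_a | genes_b
    if not union:
        return 0.0
    return 1 - len(genes_a & genes_b) / len(union)


def pairwise_distances(core, accessory):
    """Return (isolate_x, isolate_y, core divergence, accessory distance) for every pair."""
    rows = []
    for x, y in combinations(sorted(core), 2):
        rows.append(
            (
                x,
                y,
                core_divergence(core[x], core[y]),
                accessory_distance(accessory[x], accessory[y]),
            )
        )
    return rows


# Hypothetical toy data: aligned core sequences and accessory gene sets.
core = {
    "iso1": "ACGTACGTAC",
    "iso2": "ACGTACGTAT",
    "iso3": "ACGAACGTTT",
}
accessory = {
    "iso1": {"geneA", "geneB", "geneC"},
    "iso2": {"geneA", "geneB", "geneD"},
    "iso3": {"geneA", "geneE"},
}

rows = pairwise_distances(core, accessory)
for x, y, core_d, acc_d in rows:
    print(f"{x} vs {y}: core divergence {core_d:.2f}, accessory distance {acc_d:.2f}")
```

Plotting the third column against the fourth for all pairs gives a scatter of the kind discussed above. With hundreds of genomes the number of pairs grows quadratically, which is why the density of points matters when reading such a plot.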
This is merely the most recent such dataset for the pneumococcus. I have also enjoyed a long collaboration with the Wellcome Trust Sanger Institute, considering the evolution of multiple pneumococcal clones. Starting with PMEN-1, we have gone on to consider PMEN-2 and PMEN-14. These are all important intercontinentally distributed clones, recognized and defined by the Pneumococcal Molecular Epidemiology Network.
An important complication in working with these bacteria, and many others, is the potential for horizontal gene transfer to confuse analysis. In collaboration with Professor Jukka Corander of the University of Helsinki, we have developed methods and software for the analysis of genetic and, later, genome data. These include Bayesian RecombinAtion Tracker (BRAT – available here), BRATNextGen for genome analyses (available here), Bayesian Analysis of Population Structure (BAPS – available here), and BANANAS for the inference of population trees from molecular markers (available here).
We are interested in the ways in which recombination rates might reflect ecological opportunity. Previously, in collaboration with Prof Sam Sheppard of the University of Swansea, we studied two lineages of Campylobacter jejuni that colonize similar hosts in the agricultural setting. Genomic data from these two lineages show very little recombination between them, despite what appears to be ample opportunity and the fact that recombination is possible under laboratory conditions. This leads us to suggest that there is some cryptic niche separation between the two lineages.
We are actively interested in other applications of population genomics. With Dr Lipsitch we are investigating the genomic epidemiology of Bordetella pertussis, the cause of whooping cough, which has been increasing in incidence in the US despite widespread vaccination. We are also working with Dr Gili Regev-Yochay of the Gertner Institute to investigate the molecular epidemiology of community-acquired MRSA. With Professor Corander we continue to develop new statistical approaches to the analysis of large genome samples.