Population genomics

The relative ease with which we can now determine genome sequences is having a large impact on how we think about bacterial populations. Indeed, while bacteria historically excited little interest from population geneticists (with one great exception), they are now at the forefront of the emerging field of population genomics.

Our contributions span multiple species and systems, and population genomics is a unifying theme across research projects in the Hanage lab. This page summarizes our current thinking and projects.

Core genome phylogeny of over 600 pneumococci sampled from Massachusetts children. The colors indicate the Sequence Clusters (SCs) as identified by BAPS, and within them shading distinguishes variants with different serotypes. Figure from Croucher et al 2013.

The study of a previously published dataset of over 600 pneumococcal genomes (also described here, core genome phylogeny shown above) illustrates the insights available from such rich data. It is known that bacteria vary in gene content, and many speak of the ‘core’ and ‘accessory’ genome (I think this can be misleading, and explain why here, but this is a minority view). What is not clear is how divergence in gene content scales with increasing divergence in the core. We found that the relationship is linear, or close to it, and that the pairwise comparisons are not evenly distributed along it, as shown below.

Data are the comparison of genomes in the phylogeny above. Figure from Croucher et al 2014.

What is harder to see from this plot is the density of points. The great majority lie in a single dense region in the middle of the plot. In other words, the clusters shown in the phylogeny above are roughly equidistant not only in their core genomes, but also in how much their accessory genomes overlap. The small region of green points at top right indicates an anomalous divergent group of strains (SC12), lacking the major pneumococcal surface antigen and associated with conjunctivitis. In my view, this may be a separate species.
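The kind of pairwise comparison behind such a plot can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the analysis used in Croucher et al 2014: core divergence is taken here as the proportion of differing sites in an aligned core sequence, accessory divergence as the Jaccard distance between gene-content sets, and the three isolates, their sequences and their gene names are invented toy data.

```python
from itertools import combinations


def core_divergence(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned core sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def accessory_distance(g1: set, g2: set) -> float:
    """Jaccard distance between two gene-content sets (assumed measure)."""
    union = g1 | g2
    if not union:
        return 0.0
    return 1.0 - len(g1 & g2) / len(union)


def pairwise_points(core: dict, genes: dict) -> list:
    """One (core divergence, accessory distance) point per pair of isolates."""
    return [
        (core_divergence(core[i], core[j]), accessory_distance(genes[i], genes[j]))
        for i, j in combinations(sorted(core), 2)
    ]


def least_squares(points: list) -> tuple:
    """Slope and intercept of an ordinary least-squares line through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all core divergences are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical toy data: three isolates, a 10-site core alignment, and gene sets.
core = {"A": "ACGTACGTAC", "B": "ACGTACGTAT", "C": "ACGAACGTTT"}
genes = {
    "A": {"g1", "g2", "g3"},
    "B": {"g1", "g2", "g4"},
    "C": {"g1", "g5", "g6"},
}

points = pairwise_points(core, genes)
slope, intercept = least_squares(points)
print(points)
print(slope, intercept)
```

With real data, each point would be one pair of genomes, and the density of points (rather than the fitted line alone) is what reveals that most clusters sit at similar distances from one another.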

This is merely the most recent such dataset for the pneumococcus. I have also enjoyed a long collaboration with the Wellcome Trust Sanger Institute, considering the evolution of multiple pneumococcal clones. Starting with PMEN-1, we have gone on to consider PMEN-2 and PMEN-14. These are all important intercontinentally distributed clones, recognized and defined by the Pneumococcal Molecular Epidemiology Network.

An important consideration when working with these bacteria, and many others, is the potential for horizontal gene transfer to confound analysis. In collaboration with Professor Jukka Corander of the University of Helsinki, we have developed methods for the analysis of genetic and, later, genomic data, implemented in software such as Bayesian RecombinAtion Tracker (BRAT – available here), BRATNextGen for genome analyses (available here), Bayesian Analysis of Population Structure (BAPS – available here), and BANANAS for the inference of population trees from molecular markers (available here).

We are interested in the ways in which recombination rates might reflect ecological opportunity. Previously, in collaboration with Prof Sam Sheppard of the University of Swansea, we studied two lineages of Campylobacter jejuni that colonize similar hosts in the agricultural setting. Genomic data from these two lineages show very little recombination between them, despite what appears to be ample opportunity and the fact that recombination is possible under laboratory conditions. This leads us to suggest that there is some cryptic niche separation between the two lineages.

The genomes of the light blue and red lineages show hardly any evidence of recombination with each other, as is clear from the network shown in C, even though they are both generalists (B) and there is good evidence they recombine with specialist lineages (again, see C). Figure from Sheppard et al 2014.

We are actively interested in other applications of population genomics. With Dr Lipsitch we are investigating the genomic epidemiology of Bordetella pertussis, the cause of whooping cough, which has been increasing in incidence in the US despite widespread vaccination. We are also working with Dr Gili Regev-Yochay of the Gertner Institute to investigate the molecular epidemiology of community-acquired MRSA. With Professor Corander we continue to develop new statistical approaches to the analysis of large genome samples.