
Application note: CONTEXT, a Phylogenomic Dataset Browser

In prep. (v3 – 14 Jun 2017)

Summary. CONTEXT (COmparative Nucleotides and Trees Exploration Tool) is a phylogenomic dataset browser consisting of a Java API and an executable binary jarfile with a graphical user interface (GUI) for the high-throughput analysis of phylogenomic datasets to detect convergent molecular evolution.

Motivation. Comparative genomics studies have become increasingly common, but these analyses are sensitive to the quality and heterogeneity of input datasets (multiple sequence alignments and phylogenies). Currently, few tools exist to readily compute descriptive statistics for, or to visualise, large numbers of input datasets. CONTEXT facilitates these analyses in a lightweight application that allows any user to rapidly visualise, inspect, score, and sort input datasets to identify outliers that may need additional processing or filtering.

Results. The application has been successfully implemented on a variety of infrastructures. Common input data formats, including FASTA, Phylip/PAML, Nexus, and Newick, are read and parsed automatically.
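
The score-and-sort idea described above can be illustrated with a small sketch. This is not CONTEXT's Java API: it is a standalone Python example, and the names `parse_fasta`, `describe` and `rank_by_gaps`, the choice of gap fraction as the score, and the toy `geneA`/`geneB` alignments are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlignmentStats:
    name: str
    n_sequences: int
    n_sites: int
    gap_fraction: float


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {sequence name: sequence}."""
    chunks = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            # Fall back to a positional name if the header is empty.
            current = fields[0] if fields else f"seq{len(chunks) + 1}"
            chunks[current] = []
        elif current is not None:
            chunks[current].append(line)
    return {name: "".join(parts) for name, parts in chunks.items()}


def describe(name, sequences, gap_chars="-?"):
    """Compute simple descriptive statistics for one alignment."""
    if not sequences:
        raise ValueError(f"{name}: no sequences found")
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError(f"{name}: sequences differ in length, not an alignment")
    n_sites = lengths.pop()
    total = n_sites * len(sequences)
    gaps = sum(seq.count(ch) for seq in sequences.values() for ch in gap_chars)
    return AlignmentStats(name, len(sequences), n_sites, gaps / total if total else 0.0)


def rank_by_gaps(datasets):
    """Score every dataset and sort so the gappiest alignments come first."""
    stats = [describe(name, parse_fasta(text)) for name, text in datasets.items()]
    return sorted(stats, key=lambda s: s.gap_fraction, reverse=True)


# Hypothetical toy datasets; real inputs would be read from files.
datasets = {
    "geneA": ">bat\nACGTACGT\n>dolphin\nACGTACGA\n",
    "geneB": ">bat\nAC--AC-T\n>dolphin\n----ACGA\n",
}
ranked = rank_by_gaps(datasets)
for stats in ranked:
    print(stats)
```

Gap fraction is only one possible score; alignment length, number of taxa, or tree-based summaries could be ranked in the same way.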


Manuscripts in progress (all rights reserved – you may not copy or distribute these files; content and conclusions subject to change; strictly embargoed until publication in a peer-reviewed journal/book):


  • v3 (14/07/2017): .pdf
  • v2 (03/04/2017): .pdf
  • v1 (24/02/2015): .doc
  • View this project on GitHub

Detection of molecular convergence – literature review

In prep. (v2 – 21 April 2015)


Convergent evolution is a process by which neutral evolutionary processes and adaptive natural selection in response to niche specialisation lead to similar forms arising in unrelated taxa. Phenotypic convergence has been appreciated for well over a century (recognised as a confounding factor in morphological cladistics). Recently, several studies have demonstrated that convergent-type signals exist in some molecular datasets. Extending these studies to genome-scale data presents substantial challenges and opportunities. This chapter reviews the definition of convergence (compared to parallelism), and the biological interpretation of apparently convergent molecular data. Recent methodological developments and applications are examined and future problems outlined. These include suitable null and alternative models, and the role of multiple test phylogenies in convergence detection by the congruence / phylogeny support method.
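
As a rough illustration of what comparing phylogeny support across multiple test phylogenies can look like, here is a minimal Python sketch. It is not the method reviewed in the chapter. It assumes per-site log-likelihoods have already been computed by some external program under a null (species) tree and under each alternative tree. The function name `support_differences`, the sign convention (null minus alternative), the tree labels and the numbers are all hypothetical.

```python
def support_differences(lnl_null, lnl_alternatives):
    """Mean per-site difference in log-likelihood, null tree minus each alternative.

    lnl_null: list of per-site log-likelihoods under the null (species) tree.
    lnl_alternatives: dict mapping a tree label to its per-site log-likelihoods.
    Under this (assumed) sign convention a negative mean indicates that, on
    average, sites are better supported by the alternative tree.
    """
    if not lnl_null:
        raise ValueError("no sites supplied")
    results = {}
    for tree_name, lnl_alt in lnl_alternatives.items():
        if len(lnl_alt) != len(lnl_null):
            raise ValueError(f"{tree_name}: site count does not match the null tree")
        diffs = [h0 - ha for h0, ha in zip(lnl_null, lnl_alt)]
        results[tree_name] = sum(diffs) / len(diffs)
    return results


# Hypothetical per-site log-likelihoods for a three-site alignment.
null_tree = [-2.0, -3.5, -1.0]
test_trees = {
    "convergent_grouping": [-2.5, -3.0, -0.5],
    "control_grouping": [-2.0, -4.5, -1.5],
}
support = support_differences(null_tree, test_trees)
for name, value in support.items():
    print(name, round(value, 3))
```

Scoring several alternative trees side by side, including control topologies that are not expected to show convergence, is one way to judge whether a given difference is unusual.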


Manuscripts in progress (all rights reserved – you may not copy or distribute these files; content and conclusions subject to change; strictly embargoed until publication in a peer-reviewed journal/book):


  • v1 (10/04/2015): .doc
  • v2 (21/04/2015): .doc

Application note: the Genomic Convergence Detection Pipeline

In prep. (v0 – 24 February 2015)

Summary. The Genome Convergence Pipeline consists of a Java API and an executable binary jarfile with a graphical user interface (GUI) for the high-throughput analysis of phylogenomic datasets to detect convergent molecular evolution.

Motivation. Although convergent phenotypes are readily observed in nature, evidence that evolution can produce convergent signals in genetic sequences has only recently emerged. The Genome Convergence Pipeline facilitates these analyses.

Results. The application has been successfully implemented on a variety of infrastructures.


Manuscripts in progress (all rights reserved – you may not copy or distribute these files; content and conclusions subject to change; strictly embargoed until publication in a peer-reviewed journal/book):


  • v0 (24/2/2015): .doc
  • View this project on GitHub

Interpreting ‘tree space’ in the context of very large empirical datasets

Seminar presented at the Maths Department, University of Portsmouth, 19th November 2014

Evolutionary biologists represent actual or hypothesised evolutionary relations between living organisms using phylogenies, directed bifurcating graphs (trees) that describe evolutionary processes in terms of speciation or splitting events (nodes) and elapsed evolutionary time or distance (edges). Molecular evolution itself is largely dominated by mutations in DNA sequences, a stochastic process. Traditionally, probabilistic models of molecular evolution and phylogenies are fitted to DNA sequence data by maximum likelihood on the assumption that a single simple phylogeny will serve to approximate the evolution of a majority of DNA positions in the dataset. However, modern studies now routinely sample several orders of magnitude more DNA positions, and this assumption no longer holds. Unfortunately, our conception of ‘tree space’ – a notional multidimensional surface containing all possible phylogenies – is extremely imprecise, and techniques to model the fitting of phylogenies to very large datasets are similarly limited. I will show the background to this field and present some of the challenges arising from the present limited analytical framework.
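
To give a sense of how large ‘tree space’ is, the number of distinct labelled bifurcating topologies grows as a double factorial of the number of taxa: (2n − 5)!! unrooted trees and (2n − 3)!! rooted trees for n taxa. The short Python sketch below simply evaluates these standard formulae. The function names are illustrative, and the counts cover topologies only, ignoring branch lengths.

```python
def double_factorial(k):
    """Return k * (k - 2) * (k - 4) * ... down to 1 or 2; 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def count_unrooted_trees(n_taxa):
    """Number of unrooted, labelled, bifurcating topologies: (2n - 5)!!"""
    if n_taxa < 3:
        raise ValueError("need at least 3 taxa for an unrooted tree")
    return double_factorial(2 * n_taxa - 5)


def count_rooted_trees(n_taxa):
    """Number of rooted, labelled, bifurcating topologies: (2n - 3)!!"""
    if n_taxa < 2:
        raise ValueError("need at least 2 taxa for a rooted tree")
    return double_factorial(2 * n_taxa - 3)


for n in (4, 10, 20):
    print(n, count_unrooted_trees(n), count_rooted_trees(n))
```

Even ten taxa give more than two million unrooted topologies, which is why exhaustive evaluation of tree space is impractical for datasets of any realistic size.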

Slides [SlideShare]: cc-by-nc-nd

Our Nature paper! Genome-wide molecular convergence in echolocating mammals

Exciting news from the lab this week… we’ve published in one of the leading journals, Nature!!!

Much of my work in the Rossiter BatLab for the last couple of years has centred around the search for genomic signatures of molecular convergence. This means looking for similar genetic changes in otherwise unrelated organisms. We’d normally expect unrelated organisms to differ considerably in their genetic sequences, because over time random mutations occur in their genomes; the more time has passed since two species diverged, the more changes we expect. However, we know that similar structures may evolve in unrelated species due to shared selection pressures (think of the streamlined body shapes of sharks, ichthyosaurs and dolphins, for example). Can these pressures produce identical changes right down at the level of genetic sequences? We hoped to detect identical genetic changes in unrelated species (in this case, echolocation – ‘sonar hearing’ – in some species of bats and whales) caused by similar selection pressures operating on the evolution of the genes required for those traits.

It’s been a long slog – we’ve had to write a complicated computer program to look at millions of letters of DNA – but this week it all bears fruit. We found that a <em>staggering</em> number of genes in the genomes of echolocating bats and whales (a bottlenose dolphin, if you must) showed evidence of these similar genetic changes, known technically as ‘genetic convergence’.

Obviously we started jumping up and down when we found this, and because we imagined other scientists would too, we wrote up our findings and sent them to the journal <em>Nature</em>, one of the top journals in the world of science… and crossed our fingers.

Well, today we can finally reveal that we were able to get through the peer-review process (where anonymous experts scrutinise your working – a bit like an MOT for your experiments), and the paper is out today!

But what do we actually say? Well:
<blockquote>Evolution is typically thought to proceed through divergence of genes, proteins and ultimately phenotypes. However, similar traits might also evolve convergently in unrelated taxa owing to similar selection pressures. Adaptive phenotypic convergence is widespread in nature, and recent results from several genes have suggested that this phenomenon is powerful enough to also drive recurrent evolution at the sequence level. Where homoplasious substitutions do occur these have long been considered the result of neutral processes. However, recent studies have demonstrated that adaptive convergent sequence evolution can be detected in vertebrates using statistical methods that model parallel evolution, although the extent to which sequence convergence between genera occurs across genomes is unknown. Here we analyse genomic sequence data in mammals that have independently evolved echolocation and show that convergence is not a rare process restricted to several loci but is instead widespread, continuously distributed and commonly driven by natural selection acting on a small number of sites per locus. Systematic analyses of convergent sequence evolution in 805,053 amino acids within 2,326 orthologous coding gene sequences compared across 22 mammals (including four newly sequenced bat genomes) revealed signatures consistent with convergence in nearly 200 loci. Strong and significant support for convergence among bats and the bottlenose dolphin was seen in numerous genes linked to hearing or deafness, consistent with an involvement in echolocation. Unexpectedly, we also found convergence in many genes linked to vision: the convergent signal of many sensory genes was robustly correlated with the strength of natural selection. This first attempt to detect genome-wide convergent sequence evolution across divergent taxa reveals the phenomenon to be much more pervasive than previously recognized.</blockquote>
Congrats to Steve, Georgia and Joe! After a few deserved beers we’ll have our work cut out to pick through all these genes and work out exactly what all of them do (guessing the genes’ biological functions, especially in non-model (read: not us or things we eat) organisms like bats and dolphins, is notoriously tricky). So we’ll probably stick our heads out of the lab again in <em>another</em> two years…

The full citation is: Parker, J., Tsagkogeorga, G., Cotton, J.A.C., Liu, R., Stupka, E., Provero, P. &amp; Rossiter, S.J. (2013) Genome-wide signatures of convergent evolution in echolocating mammals. <em>Nature</em> (epub ahead of print), 4th September 2013. doi:10.1038/nature12511. This work was funded by Biotechnology and Biological Sciences Research Council (UK) grant BB/H017178/1.


The mode and tempo of hepatitis C virus evolution within and among hosts.

BMC Evol Biol. 2011 May 19;11(1):131. [Epub ahead of print]

Gray RR*, Parker J*, Lemey P, Salemi M, Katzourakis A, Pybus OG.

*These authors contributed equally to this article.


Background. Hepatitis C virus (HCV) is a rapidly-evolving RNA virus that establishes chronic infections in humans. Despite the virus’ public health importance and a wealth of sequence data, basic aspects of HCV molecular evolution remain poorly understood. Here we investigate three sets of whole HCV genomes in order to directly compare the evolution of whole HCV genomes at different biological levels: within- and among-hosts. We use a powerful Bayesian inference framework that incorporates both among-lineage rate heterogeneity and phylogenetic uncertainty into estimates of evolutionary parameters.

Results. Most of the HCV genome evolves at ~0.001 substitutions/site/year, a rate typical of RNA viruses. The antigenically-important E1/E2 genome region evolves particularly quickly, with correspondingly high rates of positive selection, as inferred using two related measures. Crucially, in this region an exceptionally higher rate was observed for within-host evolution compared to among-host evolution. Conversely, higher rates of evolution were seen among-hosts for functionally relevant parts of the NS5A gene. There was also evidence for slightly higher evolutionary rate for HCV subtype 1a compared to subtype 1b.

Conclusions. Using new statistical methods and comparable whole genome datasets we have quantified, for the first time, the variation in HCV evolutionary dynamics at different scales of organisation. This confirms that differences in molecular evolution between biological scales are not restricted to HIV and may represent a common feature of chronic RNA viral infection. We conclude that the elevated rate observed in the E1/E2 region during within-host evolution more likely results from the reversion of host-specific adaptations (resulting in slower long-term among-host evolution) than from the preferential transmission of slowly-evolving lineages.

Molecular epidemiology and phylogeny reveals complex spatial dynamics of endemic canine parvovirus.

J Virol. 2011 May 18. [Epub ahead of print]

Clegg SR, Coyne KP, Parker J, Dawson S, Godsall SA, Pinchbeck G, Cripps PJ, Gaskell RM, Radford AD.

Canine parvovirus 2 (CPV-2) is a severe enteric pathogen of dogs, causing high mortality in unvaccinated dogs. After emerging, CPV-2 spread rapidly worldwide. However, there is now some evidence to suggest that international transmission appears to be more restricted. In order to investigate the transmission and evolution of CPV-2 both nationally and in relation to the global situation, we have used a long-range PCR to amplify and sequence the full VP2 gene of 150 canine parvoviruses obtained from a large cross-sectional sample of dogs presenting with severe diarrhoea to veterinarians in the UK, over a two-year period. Amongst these 150 strains, 50 different DNA sequence types were identified, and apart from one case, all appeared unique to the UK. Phylogenetic analysis provided clear evidence for spatial clustering at the international level, and for the first time also at the national level, with the geographical range of some sequence types appearing to be highly restricted within the UK. Evolution of the VP2 gene in this dataset was associated with a lack of positive selection. In addition, the majority of predicted amino acid sequences were identical to those found elsewhere in the world, suggesting CPV VP2 has evolved a highly fit conformation. Based on typing systems using key amino acid mutations, 43% of viruses were CPV 2a, 57% CPV 2b, with no type 2 or 2c found. However, phylogenetic analysis suggested complex antigenic evolution of this virus, with both type 2a and 2b viruses appearing polyphyletic. As such, typing based on specific amino acid mutations may not reflect the true epidemiology of this virus. The geographical restriction we observed both within the UK, and between the UK and other countries, together with the lack of CPV-2c in this population, strongly suggests the spread of CPV within its population may be heterogeneously subject to limiting factors. This cross-sectional study of national and global CPV phylogeographic segregation reveals a substantially more complex epidemic structure than previously described.