
Application note: ‘Befi-BaTS’ version 0.10.1 – Error rate and statistical power of distance-based measures of phylogeny-trait association.

In prep.

SUMMARY

Building on work presented previously (Parker et al., 2008), we study a number of more complex measures of phylogeny-trait association (implemented in the program Befi-BaTS / BaTS v0.10.1) which take into account the branch lengths of a phylogenetic tree in addition to the topological relationships between taxa. Extensive simulation is performed to measure the Type II error rate (and hence the statistical power) of these statistics, including those introduced in Parker et al. (2008), as well as the relationship between power and tree shape. The statistics are applied to an empirical hepatitis C virus data set presented by Sobesky et al. (2007); their original conclusion that compartmentalization exists between viruses sampled from tumorous and non-tumorous cirrhotic nodules and the plasma is upheld. The association index (AI), migration (PS), phylogenetic diversity (PD) and unique fraction (UF) statistics offer the best combination of Type I error rate and statistical power for investigating phylogeny-trait association in RNA virus data, while the maximum monophyletic clade size (MC) and nearest taxon (NT) statistics suffer from reduced power in some regions of tree space.
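For readers new to these statistics, the parsimony score (PS) from Parker et al. (2008) is the minimum number of trait-state changes (migrations) needed to explain the tip traits on a tree. The sketch below computes it with Fitch's algorithm and estimates a one-sided p-value by randomising tip traits. It is a minimal illustration, assuming a single rooted binary tree encoded as nested tuples; this representation and the function names are hypothetical and are not the BaTS input format or API. BaTS itself also integrates over a posterior sample of trees, which this sketch omits.

```python
import random
from typing import Dict, FrozenSet, Tuple, Union

# Hypothetical encoding: a tree is a tip name (str) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def fitch(tree: Tree, traits: Dict[str, str]) -> Tuple[FrozenSet[str], int]:
    """Return (candidate ancestral state set, number of changes) for a subtree."""
    if isinstance(tree, str):
        return frozenset([traits[tree]]), 0
    left_set, left_cost = fitch(tree[0], traits)
    right_set, right_cost = fitch(tree[1], traits)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: take the union and count one state change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree: Tree, traits: Dict[str, str]) -> int:
    return fitch(tree, traits)[1]


def permutation_p_value(tree: Tree, traits: Dict[str, str],
                        n_reps: int = 1000, seed: int = 1) -> Tuple[int, float]:
    """Observed PS and the fraction of trait shufflings with PS <= observed.

    A low PS means strong phylogeny-trait association, so small values are extreme.
    """
    rng = random.Random(seed)
    observed = parsimony_score(tree, traits)
    tips = list(traits)
    values = list(traits.values())
    hits = 0
    for _ in range(n_reps):
        rng.shuffle(values)  # permute traits across tips, keeping state frequencies
        if parsimony_score(tree, dict(zip(tips, values))) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_reps + 1)


# Usage with hypothetical tip names and compartment labels.
tree = ((("a", "b"), "c"), (("d", "e"), "f"))
traits = {"a": "tumour", "b": "tumour", "c": "tumour",
          "d": "plasma", "e": "plasma", "f": "plasma"}
ps, p = permutation_p_value(tree, traits)
print(ps)  # 1
```

Because traits are shuffled rather than simulated, the null distribution keeps the observed state frequencies fixed, which is the usual design choice for tip-association tests.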

Keywords: BaTS, hepatitis C virus, Markov chain Monte Carlo, phylogeny-trait association, phylogenetic uncertainty, simulation.

Manuscripts in progress (all rights reserved – you may not copy or distribute these files; content and conclusions subject to change; strictly embargoed until publication in a peer-reviewed journal/book):

  • v1: .doc
  • v2 (01/01/2014): .docx
  • v3 (16/06/2017): .pdf
  • View this project on GitHub

Application note: CONTEXT, a Phylogenomic Dataset Browser

In prep. (v3 – 14 Jun 2017)

Summary. CONTEXT (COmparative Nucleotides and Trees Exploration Tool) is a phylogenomic dataset browser consisting of a Java API and an executable JAR file with a graphical user interface (GUI) for the high-throughput analysis of phylogenomic datasets to detect convergent molecular evolution.

Motivation. Comparative genomics studies have become increasingly common, but these analyses are sensitive to the quality and heterogeneity of input datasets (multiple sequence alignments and phylogenies). Currently, few tools exist to readily compute descriptive statistics for, or to visualise, large numbers of input datasets. CONTEXT facilitates these analyses in a lightweight application which allows any user to rapidly visualise, inspect, score and sort input datasets to identify outliers that may need additional processing or filtering.

Results. The application has been successfully implemented on a variety of infrastructures. Common input data formats, including FASTA, Phylip/PAML, Nexus and Newick, are read and parsed automatically.
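The kind of descriptive scoring and sorting described above can be sketched in a few lines. This is a hypothetical illustration, not the CONTEXT Java API: it parses a FASTA alignment, reports the number of taxa, the alignment length and the fraction of gap characters (assumed here to be '-' and '?'), and ranks several datasets so the gappiest are inspected first.

```python
from typing import Dict, List, Tuple


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {header: sequence}; wrapped sequence lines are joined."""
    seqs: Dict[str, str] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            seqs[name] = ""
        elif name is not None:
            seqs[name] += line.upper()
    return seqs


def describe(seqs: Dict[str, str]) -> Dict[str, float]:
    """Basic alignment statistics; raises if the input is empty or ragged."""
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("empty input or unequal sequence lengths; not an alignment")
    length = lengths.pop()
    total = length * len(seqs)
    # Assumption: only '-' and '?' count as gaps (N/X are valid residues in some alphabets).
    gaps = sum(s.count(c) for s in seqs.values() for c in "-?")
    return {"taxa": len(seqs), "length": length,
            "gap_fraction": gaps / total if total else 0.0}


def rank_by_gaps(datasets: Dict[str, str]) -> List[Tuple[str, float]]:
    """Sort datasets by gap fraction, highest (most suspicious) first."""
    scores = [(name, describe(read_fasta(text))["gap_fraction"])
              for name, text in datasets.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Usage with two tiny hypothetical loci.
datasets = {
    "gene1": ">bat\nACGT\n>dolphin\nACG-\n",
    "gene2": ">bat\nAC--\n>dolphin\nA---\n",
}
print(rank_by_gaps(datasets))  # [('gene2', 0.625), ('gene1', 0.125)]
```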


Manuscripts in progress (all rights reserved – you may not copy or distribute these files; content and conclusions subject to change; strictly embargoed until publication in a peer-reviewed journal/book):

  • v3 (14/07/2017): .pdf
  • v2 (03/04/2017): .pdf
  • v1 (24/02/2015): .doc
  • View this project on GitHub

Detection of molecular convergence – literature review

In prep. (v2 – 21 April 2015)

Abstract

Convergent evolution is the process by which similar forms arise in unrelated taxa, whether through neutral evolutionary processes or through adaptive natural selection in response to niche specialisation. Phenotypic convergence has been appreciated for well over a century and is recognised as a confounding factor in morphological cladistics. Recently, several studies have demonstrated that convergent-type signals exist in some molecular datasets. Extending these studies to genome-scale data presents substantial challenges and opportunities. This chapter reviews the definition of convergence (compared to parallelism) and the biological interpretation of apparently convergent molecular data. Recent methodological developments and applications are examined and future problems outlined. These include the choice of suitable null and alternative models, and the role of multiple test phylogenies in convergence detection by the congruence / phylogeny support method.
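At the sequence level, the distinction between convergence and parallelism is often drawn site by site. Under a commonly used definition, a parallel substitution is the same change (same ancestral and same derived state) on two independent lineages, whereas a convergent substitution reaches the same derived state from different ancestral states. The sketch below classifies aligned sites on two branches accordingly; it assumes ancestral sequences have already been reconstructed, and the function name and inputs are hypothetical.

```python
from typing import Dict


def classify_sites(anc1: str, desc1: str, anc2: str, desc2: str) -> Dict[str, int]:
    """Count parallel and convergent changes between two independent branches.

    Each branch runs from an ancestral sequence (anc) to a descendant (desc);
    all four sequences must be aligned to the same length.
    """
    if len({len(anc1), len(desc1), len(anc2), len(desc2)}) != 1:
        raise ValueError("sequences must be aligned to equal length")
    counts = {"parallel": 0, "convergent": 0}
    for a1, d1, a2, d2 in zip(anc1, desc1, anc2, desc2):
        # Only sites that change on both branches to the same derived state qualify.
        if a1 == d1 or a2 == d2 or d1 != d2:
            continue
        if a1 == a2:
            counts["parallel"] += 1    # same starting state, same end state
        else:
            counts["convergent"] += 1  # different starting states, same end state
    return counts


# Usage with hypothetical amino-acid sequences for two lineages.
print(classify_sites("AKL", "SIL", "ATL", "SIM"))  # {'parallel': 1, 'convergent': 1}
```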


Manuscripts in progress (all rights reserved – you may not copy or distribute these files; content and conclusions subject to change; strictly embargoed until publication in a peer-reviewed journal/book):

  • v1 (10/04/2015): .doc
  • v2 (21/04/2015): .doc

Application note: the Genomic Convergence Detection Pipeline

In prep. (v0 – 24 February 2015)

Summary. The Genome Convergence Pipeline consists of a Java API and an executable JAR file with a graphical user interface (GUI) for the high-throughput analysis of phylogenomic datasets to detect convergent molecular evolution.

Motivation. Although convergent phenotypes are readily observed in nature, evidence that evolution can produce convergent signals in genetic sequences has only recently emerged. The Genome Convergence Pipeline facilitates the analyses needed to detect such signals across whole phylogenomic datasets.

Results. The application has been successfully implemented on a variety of infrastructures.
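One approach to scoring convergent signal compares site-wise log-likelihood support (SSLS) between the accepted species tree and an alternative topology that groups the taxa suspected of convergence. The sketch below is a minimal illustration, not the pipeline's own code: it assumes per-site log-likelihoods have already been computed under each tree by an external phylogenetics program (the input lists are hypothetical), and computes per-site ΔSSLS = lnL(species tree) - lnL(alternative) together with a simple per-locus summary. Negative values mark sites that fit the convergent topology better.

```python
from statistics import mean
from typing import Dict, List, Sequence


def delta_ssls(species_tree_lnl: Sequence[float],
               alternative_lnl: Sequence[float]) -> List[float]:
    """Per-site log-likelihood difference: species tree minus alternative tree."""
    if len(species_tree_lnl) != len(alternative_lnl):
        raise ValueError("both inputs must cover the same alignment sites")
    return [h0 - h1 for h0, h1 in zip(species_tree_lnl, alternative_lnl)]


def summarise_locus(species_tree_lnl: Sequence[float],
                    alternative_lnl: Sequence[float]) -> Dict[str, float]:
    """Mean delta-SSLS and the number of sites favouring the alternative topology."""
    deltas = delta_ssls(species_tree_lnl, alternative_lnl)
    return {"mean_delta": mean(deltas),
            "sites_favouring_alternative": sum(1 for d in deltas if d < 0)}


# Usage with hypothetical per-site log-likelihoods for a three-site locus.
h0 = [-10.0, -12.5, -8.0]  # under the species tree
h1 = [-10.5, -11.0, -8.0]  # under the alternative (convergent) tree
print(delta_ssls(h0, h1))  # [0.5, -1.5, 0.0]
```

Reporting both the mean and the count of favouring sites helps distinguish loci where a handful of sites carry a strong signal from loci with a weak, diffuse one.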


Manuscripts in progress (all rights reserved – you may not copy or distribute these files; content and conclusions subject to change; strictly embargoed until publication in a peer-reviewed journal/book):

  • v0 (24/02/2015): .doc
  • View this project on GitHub

Interpreting ‘tree space’ in the context of very large empirical datasets

Seminar presented at the Maths Department, University of Portsmouth, 19th November 2014

Evolutionary biologists represent actual or hypothesised evolutionary relationships between living organisms using phylogenies: directed, bifurcating graphs (trees) that describe evolutionary processes in terms of speciation or splitting events (nodes) and elapsed evolutionary time or distance (edges). Molecular evolution itself is dominated by mutations in DNA sequences, a stochastic process. Traditionally, probabilistic models of molecular evolution and phylogenies are fitted to DNA sequence data by maximum likelihood, on the assumption that a single, simple phylogeny will adequately approximate the evolution of the majority of DNA positions in the dataset. However, modern studies now routinely sample several orders of magnitude more DNA positions, and this assumption no longer holds. Unfortunately, our conception of ‘tree space’ (a notional multidimensional surface containing all possible phylogenies) is extremely imprecise, and techniques for fitting phylogenetic models to very large datasets are similarly limited. I will outline the background to this field and present some of the challenges arising from the present limited analytical framework.
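To give a sense of the scale of tree space: the number of distinct bifurcating topologies for n labelled taxa is (2n - 3)!! for rooted trees and (2n - 5)!! for unrooted trees, and each topology additionally carries a continuous set of branch lengths. The short sketch below (function names are illustrative) computes these topology counts exactly using Python's arbitrary-precision integers.

```python
def double_factorial(k: int) -> int:
    """k!! = k * (k - 2) * (k - 4) * ... down to 1 or 2; equals 1 for k <= 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def n_rooted_trees(n_taxa: int) -> int:
    """Number of rooted, bifurcating, labelled topologies: (2n - 3)!!."""
    if n_taxa < 2:
        raise ValueError("need at least two taxa")
    return double_factorial(2 * n_taxa - 3)


def n_unrooted_trees(n_taxa: int) -> int:
    """Number of unrooted, bifurcating, labelled topologies: (2n - 5)!!."""
    if n_taxa < 3:
        raise ValueError("need at least three taxa")
    return double_factorial(2 * n_taxa - 5)


print(n_rooted_trees(10))    # 34459425
print(n_unrooted_trees(10))  # 2027025
```

Even ten taxa give tens of millions of rooted topologies, which is why exhaustive search is impossible for realistic datasets and why heuristic exploration of tree space matters.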

Slides [SlideShare]: cc-by-nc-nd


Phylogenomic convergence detection: lessons and perspectives

Talk presented at the 18th Evolutionary Biology Meeting At Marseille (programme), 16th-19th September 2014.

Slides (PowerPoint; note this is a draft, not the final talk, pending authorisation): EBMdraft


Our Nature paper! Genome-wide molecular convergence in echolocating mammals

Exciting news from the lab this week… we’ve published in one of the leading journals, Nature!!!

Much of my work in the Rossiter BatLab for the last couple of years has centred on the search for genomic signatures of molecular convergence. This means looking for similar genetic changes in otherwise unrelated organisms. We’d normally expect unrelated organisms to differ considerably in their genetic sequences, because over time random mutations occur in their genomes; the more time has passed since two species diverged, the more changes we expect. However, we know that similar structures may evolve in unrelated species due to shared selection pressures (think of the streamlined body shapes of sharks, ichthyosaurs and dolphins, for example). Can these pressures produce identical changes right down at the level of genetic sequences? We hoped to detect identical genetic changes in unrelated species, caused by similar selection pressures acting on the genes required for a shared trait: in this case echolocation (‘sonar hearing’), found in some species of bats and whales.

It’s been a long slog – we’ve had to write a complicated computer program to look at millions of letters of DNA – but this week it all bears fruit. We found that a <em>staggering</em> number of genes in the genomes of echolocating bats and whales (a bottlenose dolphin, if you must) showed evidence of these similar genetic changes, known technically as ‘genetic convergence’.

Obviously we started jumping up and down when we found this, and because we imagined other scientists would too, we wrote up our findings and sent them to the journal <em>Nature</em>, one of the top journals in the world of science… and crossed our fingers.

Well, today we can finally reveal that we were able to get through the peer-review process (where anonymous experts scrutinise your working – a bit like an MOT for your experiments), and the paper is out today!

But what do we actually say? Well:
<blockquote>Evolution is typically thought to proceed through divergence of genes, proteins and ultimately phenotypes. However, similar traits might also evolve convergently in unrelated taxa owing to similar selection pressures. Adaptive phenotypic convergence is widespread in nature, and recent results from several genes have suggested that this phenomenon is powerful enough to also drive recurrent evolution at the sequence level. Where homoplasious substitutions do occur these have long been considered the result of neutral processes. However, recent studies have demonstrated that adaptive convergent sequence evolution can be detected in vertebrates using statistical methods that model parallel evolution, although the extent to which sequence convergence between genera occurs across genomes is unknown. Here we analyse genomic sequence data in mammals that have independently evolved echolocation and show that convergence is not a rare process restricted to several loci but is instead widespread, continuously distributed and commonly driven by natural selection acting on a small number of sites per locus. Systematic analyses of convergent sequence evolution in 805,053 amino acids within 2,326 orthologous coding gene sequences compared across 22 mammals (including four newly sequenced bat genomes) revealed signatures consistent with convergence in nearly 200 loci. Strong and significant support for convergence among bats and the bottlenose dolphin was seen in numerous genes linked to hearing or deafness, consistent with an involvement in echolocation. Unexpectedly, we also found convergence in many genes linked to vision: the convergent signal of many sensory genes was robustly correlated with the strength of natural selection. This first attempt to detect genome-wide convergent sequence evolution across divergent taxa reveals the phenomenon to be much more pervasive than previously recognized.</blockquote>
Congrats to Steve, Georgia and Joe! After a few well-deserved beers we’ll have our work cut out picking through all these genes and working out exactly what they all do: guessing genes’ biological functions is notoriously tricky, especially in non-model organisms (read: not us or things we eat) like bats and dolphins. So we’ll probably stick our heads out of the lab again in <em>another</em> two years…

The full citation is: Parker, J., Tsagkogeorga, G., Cotton, J.A.C., Liu, R., Stupka, E., Provero, P. &amp; Rossiter, S.J. (2013) Genome-wide signatures of convergent evolution in echolocating mammals. <em>Nature</em> (epub ahead of print), 4th September 2013. doi:10.1038/nature12511. This work was funded by Biotechnology and Biological Sciences Research Council (UK) grant BB/H017178/1.
