Tag Archives: BaTS

BaTS (and Befi-BaTS), SHiAT, and Genome Convergence Pipeline have moved!

Important – please take note!
Headline:

  • All my phylogenetics software is now on GitHub, not websites or Google Code
  • Please use the new FAQ pages and issue/bug tracker forms, rather than emailing me directly in the first instance

Until now, I’ve been hosting the open-sourced parts of my phylogenetics software on code.google.com. These include the BaTS (and Befi-BaTS) tools for phylogeny-trait association correlations; the alignment profilers SHiAT (and the Geneious Entropy plugin); and the Genome Convergence API for the Genome Convergence Pipeline and Phylogenomics Dataset Browser. However, Google has announced that it is ending support for Google Code, and from August all projects will be read-only.

I’ve therefore migrated all my projects to GitHub. This will eventually include FAQs, forums and issue/bug tracking for the most popular software, BaTS and the Genome Convergence API.

The projects can now be found at:

 

I am also changing how I respond to questions and bug reports. In the past I dealt with questions as they came in, with the odd explanatory post and a manual or readme with each release. Predictably, this meant I spent a lot of time dealing with duplicate questions, or missing bugs and feature requests. I am now in the process of compiling a list of FAQs for each project, as well as uploading the manuals in markdown format so that I can update them with each release. Please bear with me as I go through this process. In the meantime, if you have an issue with a piece of software or think you have found a bug, please:

  1. Make sure you have the most recent version of the software. In most cases this will be available as an executable .jar file on the project GitHub page.
  2. Check the ‘Issues’ tab on the project GitHub page. Your issue may be a duplicate, or already fixed by a new release. If your bug isn’t listed, please open a new issue giving as much detail as possible.
  3. Check the manual and FAQs to see if anyone else has had the same problem – I may well have answered their question already.
  4. If you still need an answer please email me on joe+bioinformaticshelp@kitserve.org.uk

Thanks so much for your support and involvement,

Joe

Molecular epidemiology and phylogeny reveals complex spatial dynamics of endemic canine parvovirus.

J Virol. 2011 May 18. [Epub ahead of print]

Clegg SR, Coyne KP, Parker J, Dawson S, Godsall SA, Pinchbeck G, Cripps PJ, Gaskell RM, Radford AD.

Canine parvovirus 2 (CPV-2) is a severe enteric pathogen of dogs, causing high mortality in unvaccinated dogs. After emerging, CPV-2 spread rapidly worldwide. However, there is now some evidence to suggest that international transmission appears to be more restricted. In order to investigate the transmission and evolution of CPV-2 both nationally and in relation to the global situation, we have used a long range PCR to amplify and sequence the full VP2 gene of 150 canine parvoviruses obtained from a large cross-sectional sample of dogs presenting with severe diarrhoea to veterinarians in the UK, over a two year period. Amongst these 150 strains, 50 different DNA sequence types were identified, and apart from one case, all appeared unique to the UK. Phylogenetic analysis provided clear evidence for spatial clustering at the international level, and for the first time also at the national level, with the geographical range of some sequence types appearing to be highly restricted within the UK. Evolution of the VP2 gene in this dataset was associated with a lack of positive selection. In addition, the majority of predicted amino acid sequences were identical to those found elsewhere in the world, suggesting CPV VP2 has evolved a highly fit conformation. Based on typing systems using key amino acid mutations, 43% of viruses were CPV 2a, 57% CPV 2b, with no type 2 or 2c found. However phylogenetic analysis suggested complex antigenic evolution of this virus, with both type 2a and 2b viruses appearing polyphyletic. As such, typing based on specific amino acid mutations may not reflect the true epidemiology of this virus. The geographical restriction we observed both within the UK, and between the UK and other countries, together with the lack of CPV-2c in this population, strongly suggest the spread of CPV within its population may be heterogeneously subject to limiting factors. 
This cross-sectional study of national and global CPV phylogeographic segregation reveals a substantially more complex epidemic structure than previously described.

Befi-BaTS v0.1.1 beta release

A long-overdue update to the beta version of Befi-BaTS.

Software: Befi-BaTS

Author: Joe Parker

Version: 0.1.1 beta (download here)

Release notes: Befi-BaTS v0.1 beta drops support for hard polytomies (tree nodes with more than two daughters), and now throws a HardPolytomyException to the error stack when these are parsed. This is because of potential bugs when calculating measures that combine topology and distance (NTI/NRI) on polytomies. These bugs will be fixed in a future release. The current version, 0.1.1, improves #NEXUS input file parsing.
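To illustrate what a hard polytomy looks like to a parser, here is a minimal Python sketch. It is not the Befi-BaTS code; the nested-tuple tree representation and the names `HardPolytomyError` and `check_bifurcating` are assumptions made purely for the example.

```python
class HardPolytomyError(ValueError):
    """Hypothetical Python stand-in for the HardPolytomyException named above."""


def check_bifurcating(node, path="root"):
    """Raise HardPolytomyError if any internal node has more than two daughters.

    Assumed representation: a tip is a string label, an internal node is a
    tuple of its daughter nodes.
    """
    if isinstance(node, str):
        return  # tips have no daughters to check
    if len(node) > 2:
        raise HardPolytomyError(f"node at {path} has {len(node)} daughters")
    for i, child in enumerate(node):
        check_bifurcating(child, f"{path}/{i}")


# A fully bifurcating tree passes silently.
check_bifurcating((("a", "b"), ("c", "d")))

# A node with three daughters is rejected.
try:
    check_bifurcating((("a", "b", "c"), "d"))
except HardPolytomyError as err:
    print("rejected:", err)
```

If your trees contain polytomies, one option (an assumption on my part, not a documented workflow) is to resolve them into bifurcations in your tree-building software before running the analysis.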

Befi-BaTS: Befi-BaTS uses two established statistics (the Association Index, AI (Wang et al., 2001), and Fitch parsimony score, PS) as well as a third statistic (maximum exclusive single-state clade size, MC) introduced by us in the BaTS citation, where the merits of each of these are discussed. Befi-BaTS 0.1.1 includes additional statistics that take account of branch length as well as tree topology. What sets Befi-BaTS apart from previous methods, however, is that we incorporate uncertainty arising from phylogenetic error into the analysis through a Bayesian framework. While many other methods obtain a null distribution for significance testing through tip character randomization, they rely on a single tree upon which phylogeny-trait association is measured for any observed or expected set of tip characters.
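As a rough illustration of the MC statistic, the sketch below finds, for each trait state, the largest clade whose tips all carry that state. It is a reading of the statistic's name rather than the Befi-BaTS implementation; the nested-tuple tree format and the function name `max_clade_sizes` are assumptions for the example.

```python
def max_clade_sizes(tree, traits):
    """Return {state: size of the largest clade whose tips all carry that state}.

    tree:   nested tuples with string tip labels (assumed representation).
    traits: dict mapping each tip label to its trait state.
    """
    best = {state: 0 for state in set(traits.values())}

    def visit(node):
        # Returns (state, n_tips) when every tip below this node shares one
        # state, otherwise (None, n_tips).
        if isinstance(node, str):
            state, n = traits[node], 1
        else:
            results = [visit(child) for child in node]
            states = {s for s, _ in results}
            n = sum(k for _, k in results)
            state = states.pop() if len(states) == 1 else None
        if state is not None:
            best[state] = max(best[state], n)
        return state, n

    visit(tree)
    return best


tree = ((("a", "b"), "c"), ("d", "e"))
traits = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "X"}
print(max_clade_sizes(tree, traits))  # X -> 3, Y -> 1
```

Larger MC values for a state indicate that its tips cluster together more strongly on the tree.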

The within- and among-host evolution of chronically-infecting human RNA viruses

A research thesis submitted for the degree of Doctor of Philosophy at the University of Oxford.

J Parker

Funded by: Natural Environment Research Council (UK) with support from Linacre College, Oxford.

Abstract: This thesis examines the evolutionary biology of the RNA viruses, a diverse group of pathogens that cause significant diseases. The focus of this work is the relationship between the processes driving the evolution of virus populations within individual hosts and at the epidemic level.

First, Chapter One reviews the basic biology of RNA viruses, the current state of knowledge in relevant topics of evolutionary virology, and the principles that underlie the most commonly used methods in this thesis.

In Chapter Two, I develop and test a novel framework to estimate the significance of phylogeny-trait association in viral phylogenies. The method incorporates phylogenetic uncertainty through the use of posterior sets of trees (PST) produced in Bayesian MCMC analyses.

In Chapter Three, I conduct a comprehensive analysis of the substitution rate of hepatitis C virus (HCV) in within- and between-host data sets using a relaxed molecular clock. I find that within-host substitution rates are more rapid than previously appreciated, that heterotachy is rife in within-host data sets, and that selection is likely to be a primary driver.

In Chapter Four I apply the techniques developed in Chapter Two to successfully detect compartmentalization between peripheral blood and cervical tissues in a large data set of human immunodeficiency virus (HIV) patients. I propose that compartmentalization in the cervix is maintained by selection.

In Chapter Five I extend the framework developed in Chapter Two and explore the Type II error of the statistics used.

In Chapter Six I review the findings of this thesis and conclude with a general discussion of the relationship between within- and among-host evolution in viruses, and some of the limitations of current techniques.

Correlating Viral Phenotypes With Phylogeny: Accounting for Phylogenetic Uncertainty

Infect Genet Evol. 2008 May;8(3):239-46. Epub 2007 Aug 21.
Parker J, Rambaut A, Pybus OG.

Many recent studies have sought to quantify the degree to which viral phenotypic characters (such as epidemiological risk group, geographic location, cell tropism, drug resistance state, etc.) are correlated with shared ancestry, as represented by a viral phylogenetic tree. Here, we present a new Bayesian Markov-Chain Monte Carlo approach to the investigation of such phylogeny-trait correlations. This method accounts for uncertainty arising from phylogenetic error and provides a statistical significance test of the null hypothesis that traits are associated randomly with phylogeny tips. We perform extensive simulations to explore and compare the behaviour of three statistics of phylogeny-trait correlation. Finally, we re-analyse two existing published data sets as case studies. Our framework aims to provide an improvement over existing methods for this problem.
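The general idea can be sketched in Python as follows. This is a deliberately simplified illustration, not the published procedure: it uses the Fitch parsimony score as the statistic, averages it over a set of trees standing in for a posterior sample, and builds the null distribution by shuffling trait states among tips. The tree format, function names, use of the mean, and the p-value definition are all assumptions for the example.

```python
import random
from statistics import mean


def fitch_score(tree, traits):
    """Fitch parsimony score: minimum number of state changes on a rooted,
    bifurcating tree given as nested 2-tuples with string tip labels."""
    changes = 0

    def visit(node):
        nonlocal changes
        if isinstance(node, str):
            return {traits[node]}
        left, right = (visit(child) for child in node)
        common = left & right
        if common:
            return common
        changes += 1  # daughters share no state, so one change is needed
        return left | right

    visit(tree)
    return changes


def association_test(trees, traits, n_shuffles=1000, seed=0):
    """Return (observed mean score, p-value) for the null hypothesis that
    traits are associated randomly with tips. Low scores mean strong
    association, so p is the share of shuffles scoring <= observed."""
    rng = random.Random(seed)
    observed = mean(fitch_score(t, traits) for t in trees)
    tips = list(traits)
    states = [traits[tip] for tip in tips]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(states)
        shuffled = dict(zip(tips, states))
        if mean(fitch_score(t, shuffled) for t in trees) <= observed:
            hits += 1
    return observed, hits / n_shuffles


# Two alternative topologies stand in for a posterior sample of trees.
trees = [
    ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h"))),
    (((("a", "c"), "b"), "d"), ((("e", "g"), "f"), "h")),
]
traits = {t: "X" for t in "abcd"} | {t: "Y" for t in "efgh"}
observed, p = association_test(trees, traits)
print(observed, p)
```

Scoring every shuffle against the whole tree set, rather than a single tree, is what lets topological uncertainty feed into the null distribution in this sketch.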