Phylogenetics

Jump to:


Befi-BaTS, v0.1.1 (alpha) | Using BaTS with BEAST output | Download

Befi-BaTS provides a method for measuring the degree to which phenotypic traits seen in a phylogeny are correlated with shared ancestry. In other words, where a set of character states is observed at the tips of a phylogenetic tree, is any given taxon more likely to share a character state with a sister taxon than we would expect due to chance?

This problem has been posed in a variety of contexts over the last three decades, particularly molecular epidemiology and phylogeography. A number of approaches have been developed over the years, of which the method of Slatkin & Maddison (1989) is perhaps the best known.

Befi-BaTS uses two established statistics (the Association Index, AI (Wang et al., 2001), and the Fitch parsimony score, PS) as well as a third statistic (maximum exclusive single-state clade size, MC) that we introduced in the BaTS citation, where the merits of each are discussed. Befi-BaTS 0.1.1 adds further statistics that take branch lengths into account as well as tree topology. What sets Befi-BaTS apart from previous methods, however, is that we incorporate uncertainty arising from phylogenetic error into the analysis through a Bayesian framework. While many other methods obtain a null distribution for significance testing through tip character randomization, they rely on a single tree upon which phylogeny-trait association is measured for any observed or expected set of tip characters.
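As a rough illustration of the single-tree approach described above, here is a minimal Python sketch of the Fitch parsimony score with a tip-randomization null. It is not the Befi-BaTS code: the nested-tuple tree representation and the names `parsimony_score` and `null_scores` are assumptions made for this example only.

```python
import random


def fitch(tree, states):
    """Fitch pass over a binary tree given as nested 2-tuples of tip names.

    Returns (possible state set at this node, number of state changes below it).
    """
    if not isinstance(tree, tuple):
        return {states[tree]}, 0
    left_set, left_changes = fitch(tree[0], states)
    right_set, right_changes = fitch(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_changes + right_changes
    # No state in common: one change is needed at this node.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, states):
    """Minimum number of state changes needed to explain the tip states."""
    return fitch(tree, states)[1]


def null_scores(tree, states, replicates, rng):
    """Parsimony scores after shuffling the tip states across the same tree."""
    tips = sorted(states)
    values = [states[tip] for tip in tips]
    scores = []
    for _ in range(replicates):
        rng.shuffle(values)
        scores.append(parsimony_score(tree, dict(zip(tips, values))))
    return scores


# Hypothetical toy data: eight tips, two states, perfectly clustered.
tree = ((("a", "b"), ("c", "d")), (("e", "f"), ("g", "h")))
states = {tip: "X" for tip in "abcd"}
states.update({tip: "Y" for tip in "efgh"})

observed = parsimony_score(tree, states)
null = null_scores(tree, states, replicates=200, rng=random.Random(1))
# A low PS means strong association, so count null scores at least as low.
p_value = sum(score <= observed for score in null) / len(null)
```

Everything here is measured on one fixed tree, which is exactly the limitation the Bayesian approach is meant to address.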

To improve on this basic approach we use posterior sets of trees (PSTs), obtained through earlier Bayesian MCMC analysis of the data, that integrate over all likely phylogenies and incorporate the phylogenetic uncertainty arising from the data. Although a Bayesian MCMC analysis is therefore a precondition to using Befi-BaTS, we do not feel that this is likely to deter potential users since these analyses are increasingly common.
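One plausible way to use a posterior set of trees is sketched below in Python: compute a statistic on every tree in the set, summarise it, and compare it with the same summary under shuffled tip states. This is an illustration under assumptions, not a description of how Befi-BaTS is implemented. The MC statistic is taken to be the size of the largest clade whose tips all carry a given state, trees are toy nested tuples, and the function names are hypothetical.

```python
import random
from statistics import mean


def _scan(tree, states, target):
    """Return (tip count, all tips in target state?, largest pure clade size)."""
    if not isinstance(tree, tuple):
        pure = states[tree] == target
        return 1, pure, (1 if pure else 0)
    n_tips, pure, best = 0, True, 0
    for child in tree:
        child_n, child_pure, child_best = _scan(child, states, target)
        n_tips += child_n
        pure = pure and child_pure
        best = max(best, child_best)
    if pure:
        best = n_tips  # the whole clade carries the target state
    return n_tips, pure, best


def mc(tree, states, target):
    """Largest clade in which every tip has state `target`."""
    return _scan(tree, states, target)[2]


def mean_mc(trees, states, target):
    """Average MC over a set of trees (standing in for a posterior sample)."""
    return mean(mc(tree, states, target) for tree in trees)


def null_mean_mc(trees, states, target, replicates, rng):
    """Mean MC over the same trees after shuffling the tip states."""
    tips = sorted(states)
    values = [states[tip] for tip in tips]
    results = []
    for _ in range(replicates):
        rng.shuffle(values)
        results.append(mean_mc(trees, dict(zip(tips, values)), target))
    return results


# Hypothetical toy "posterior": three topologies for five tips.
trees = [
    ((("a", "b"), "c"), ("d", "e")),
    (("a", ("b", "c")), ("d", "e")),
    ((("a", "b"), "d"), ("c", "e")),
]
states = {"a": "X", "b": "X", "c": "X", "d": "Y", "e": "Y"}

observed = mean_mc(trees, states, "X")
null = null_mean_mc(trees, states, "X", replicates=200, rng=random.Random(1))
# A large MC means strong association, so count null values at least as large.
p_value = sum(value >= observed for value in null) / len(null)
```

Because the statistic is averaged over several topologies, a clade that appears in only some of the trees contributes proportionally less than one supported by all of them.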

Important note when using BaTS with BEAST output:

The #Nexus format used by BEAST to write treefiles includes a number of branch and node label tags that are not necessarily parsed correctly in BaTS. This is because there is no IEEE standard for phylogenies and I chose to write BaTS to parse original #Nexus files primarily. If you are using BaTS with BEAST output (especially BEAST 1.3 and newer) I recommend you remove these branch/node tags as follows:

  1. We are going to find/replace a grep (search) pattern in the treefiles. It’s not feasible to do this manually, as there will be thousands of occurrences in each treefile. Instead we’ll use a text editor’s find/replace function.
    I’m not going to write a tutorial on grep / regular expression use in bioinformatics here, since most people learn this early in graduate studies or before, but if you’re utterly unfamiliar with the concept try this tutorial or go to your library and look up Perl For Dummies or, well, any beginners’ scripting textbook.
  2. You will need a text editor such as TextWrangler (Mac), Notepad++ (Windows) or a command-line editor like vim or emacs (Mac / *nix).
  3. Open the treefile in your editor.
  4. Select only the ‘trees’ block.
  5. Perform a find / replace using the following search string, leaving the replacement empty so that every match is deleted:
    (\[\&){1}([\=\.\-\,\_\%\{\}A-Za-z0-9])+\]
  6. Because BaTS uses the token ‘[&R] ‘ to split the treefile lines, you need to re-insert this token. Do so with another search pattern, finding:
    (=\ )

    and replacing with:

    (=\ [&R]\ )
  7. Overall we have (for example):
    Starting line (wrong):
    tree STATE_0[&lnP=0.0] = [&R] ((31:[&rate=0.0010]0.19716131857378863,(((((6:[&rate=0.0010]0.006170....       099781397861);
    
    Intermediate (also wrong):
    tree STATE_0 = ((31:0.19716131857378863,(((((6:0.00617064613201129,32:0.00617064613201129):0....       099781397861);
    
    Final (correct):
    tree STATE_0 = [&R] ((31:0.19716131857378863,(((((6:0.00617064613201129,32:0.00617064613201129):0....       099781397861);
  8. Save (under a new name) and close the file.
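For anyone who would rather script steps 5 to 7 than use an editor, here is a minimal Python sketch. The search pattern is the one from step 5. The function names and the sample line are hypothetical, and the sketch assumes one tree per line, each starting with `tree `. It also tidies the whitespace around `=`, which the editor recipe does not do.

```python
import re

# Step 5 pattern, copied from the text. It matches BEAST metacomments such as
# [&lnP=0.0] and [&rate=0.0010], and also the [&R] token itself.
METACOMMENT = re.compile(r"(\[\&){1}([\=\.\-\,\_\%\{\}A-Za-z0-9])+\]")


def clean_tree_line(line):
    """Strip BEAST tags from one 'tree ...' line, then restore '[&R] '."""
    stripped = METACOMMENT.sub("", line)
    # Step 6: put the '[&R] ' token back after the first '='.
    # Assumption: any run of spaces after '=' is collapsed to a single space.
    return re.sub(r"=\s*", "= [&R] ", stripped, count=1)


def clean_trees_block(lines):
    """Apply clean_tree_line to tree lines only; leave other lines untouched."""
    cleaned = []
    for line in lines:
        if line.lstrip().lower().startswith("tree "):
            cleaned.append(clean_tree_line(line))
        else:
            cleaned.append(line)
    return cleaned


# Hypothetical, shortened example line in the BEAST style shown above.
example = ("tree STATE_0[&lnP=0.0] = [&R] (31:[&rate=0.0010]0.197,"
           "(6:[&rate=0.0010]0.006,32:[&rate=0.0010]0.006):[&rate=0.0010]0.191);")
cleaned_example = clean_tree_line(example)
```

As with the editor route, write the result to a new file rather than overwriting the original BEAST output.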

The modified file should now work with BaTS. If you are still having trouble try:

  • Checking the BaTS build is working at all on your system using the included example files, with reference to the manual supplied;
  • Checking a truncated treefile of your data (5-10 trees maximum) with only a few replicates for the null (expected) distribution. The current version of BaTS is essentially a research tool rather than an enterprise distribution, and can seem very slow compared to other phylogenetics packages.
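A truncated treefile for this kind of check can be produced with a few lines of Python. This is only a sketch: it assumes each tree sits on its own line beginning with `tree `, and the function name and file names are hypothetical.

```python
def truncate_treefile(lines, max_trees=5):
    """Keep every non-tree line but only the first `max_trees` tree lines."""
    kept = []
    n_trees = 0
    for line in lines:
        if line.lstrip().lower().startswith("tree "):
            n_trees += 1
            if n_trees > max_trees:
                continue  # drop trees beyond the limit
        kept.append(line)
    return kept


def truncate_file(in_path, out_path, max_trees=5):
    """Read a treefile, truncate it, and write the result to a new file."""
    with open(in_path) as handle:
        lines = handle.read().splitlines()
    with open(out_path, "w") as handle:
        handle.write("\n".join(truncate_treefile(lines, max_trees)) + "\n")
```

For example, `truncate_file("mydata.trees", "mydata_short.trees", max_trees=5)` would leave the header, the first five trees and the closing `End;` line in place.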

Downloads:
Befi-BaTS on GitHub | Documentation | Executable .jarfile (requires Java 1.5+) | Application bundle (documentation, example files and .jarfile)


SHiAT

SHiAT calculates the sitewise alignment entropies of a set of input sequences in FASTA format. It also calculates overall sequence entropy, and other diversity statistics.
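To give a sense of what a sitewise entropy is, here is a minimal Python sketch. It is not SHiAT's code, and SHiAT's exact definition (log base, treatment of gaps) is not stated here. The sketch assumes Shannon entropy in bits, treats every character including gaps as a state, and uses hypothetical function names.

```python
import math
from collections import Counter


def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    sequences = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line.upper()
    return sequences


def sitewise_entropy(sequences):
    """Shannon entropy (bits) of each alignment column."""
    if len({len(seq) for seq in sequences}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    entropies = []
    for column in zip(*sequences):
        total = len(column)
        counts = Counter(column)
        h = -sum((n / total) * math.log2(n / total) for n in counts.values())
        entropies.append(h + 0.0)  # adding 0.0 turns -0.0 into 0.0
    return entropies


# Hypothetical four-sequence alignment.
alignment = parse_fasta(">s1\nACGT\n>s2\nACGA\n>s3\nACTA\n>s4\nAGTA\n")
per_site = sitewise_entropy(list(alignment.values()))
```

A fully conserved column scores 0, and a column split evenly between two states scores 1 bit.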

Downloads:
Documentation | Executable .jarfile (requires Java 1.5+) | Mac OSX application


HADPACK

HADPACK is a package of open-source tools and handler scripts designed for the computational analysis of HIV antigen sites using phylogenetic and clinical data. A public release of this software is upcoming.

Downloads:
HADPACK on GitHub


QMUL Genome Convergence Pipeline

A collection of Java libraries and executables for conducting phylogenomic analyses, mainly for convergence detection. Includes wrappers to a variety of tools including PAML, RAxML, and PhyloBayes. Developed in the Rossiter Lab at Queen Mary, University of London.

Downloads:
HADPACK on GitHub


CONTEXT – Phylogenomic Dataset Browser

What does it do?
A QC (quality control) tool for phylogenomics data, basically – it simultaneously displays large numbers (~thousands) of multiple sequence alignments and/or phylogenies, along with summary statistics.
Why does it do that?
Phylogenomic analyses rely on good quality input data. Visualising and quantitatively sorting/filtering input data is essential. There isn’t a simple standalone tool for this at the moment.
What doesn’t it do?
You can’t manually edit, align, or infer phylogenies with CONTEXT. There are plenty of other tools to do this. See RAxML, Muscle, Bali-Phy, Se-Al or GUIs like HYPHY or Geneious, for a start.

Downloads:
HADPACK on GitHub


Real-time Phylogenomics

A sandbox and development repo for projects in real-time phylogenomics application development, especially for single-molecule real-time (Oxford Nanopore) sequence analysis platforms. A public release of this software is upcoming.

Downloads:
Real-time Phylogenomics on GitHub


  • Jeremy Hayward

    Here’s a workflow for GNU/Linux which achieves the text wrangling needed to get BEAST output into BaTS. It’s really ugly and kludgy, but it works so I don’t want to prettify it (and probably break it).

    Cut all the header information from your BEAST output first. This is everything from the start of the file to the first tree STATE_xxx line. You’re going to replace this with a BaTS states block anyway, so just get rid of it.
    Next, cut all the metacomments, then put back the [&R] delimiter:
    perl -pi.bak -e 's/[\n\r]+//g' $Path_To_Your_file
    sed -i 's/\[[^]]*\]//g'  $Path_To_Your_file
    perl -pi.bak -e 's/;/;\n/g' $Path_To_Your_file
    perl -pi.bak -e 's/= /= [&R]/gms' $Path_To_Your_file

    The first line removes linebreaks, since sed can’t handle these, and perl handles them a little weirdly. The second line nukes the metacomments. The third line reinserts linebreaks after semicolons, where they belong. The last line sticks the [&R] delimiter back into the right spots.

    Next, prepend the #NEXUS file identifier and your STATES block to the file. You should be good to go.

    Be careful with CAPITALS:   STATES must be capitalised to mark the states block.

    –Jeremy

  • Jeremy Hayward

    Hi again–
      Could you post the source code somewhere? That is, the original .java files, not just the .classes. Also, just because I’m a card-carrying Free Software fanatic: I think Befi-BaTS is LGPL, right? The documentation says LGPL V1.3. Do you mean V3? The first LGPL was 2, and 3 has been current since 2007, I think.
    Cheers!
