Phylogenetics | Lonely Joe Parker

Jump to:

CONTEXT – Phylogenomic Dataset Browser
Real-time Phylogenomics – Tools for realtime MinION analyses
QMUL GCP – Genomic convergence pipeline

Befi-BaTS v0.1.1 alpha – Bayesian Analysis of Tip-phylogeny Significance

SHiAT v1.0 – Shannon Heterogeneity in Alignments Tool
HADPACK – HIV Antigen Datamining Package

Befi-BaTS, v0.1.1 (alpha) | Using BaTS with BEAST output | Download

Befi-BaTS provides a method for testing the degree to which phenotypic traits seen in a phylogeny are associated with ancestry. In other words, where a set of character states is seen on the tips of a phylogenetic tree, is any given taxon more likely to share a character state with a sister taxon than we would expect by chance?

This problem has been posed in a variety of contexts over the last three decades, particularly in molecular epidemiology and phylogeography. A number of approaches have been developed over the years, of which the method of Slatkin & Maddison (1989) is perhaps the best known.

Befi-BaTS uses two established statistics (the Association Index, AI (Wang et al., 2001), and the Fitch parsimony score, PS) as well as a third statistic (maximum exclusive single-state clade size, MC) introduced by us in the BaTS citation, where the merits of each are discussed. Befi-BaTS 0.1.1 includes additional statistics that take branch lengths into account as well as tree topology. What sets Befi-BaTS apart from previous methods, however, is that we incorporate uncertainty arising from phylogenetic error into the analysis through a Bayesian framework. While many other methods obtain a null distribution for significance testing through tip character randomization, they rely on a single tree upon which phylogeny-trait association is measured for any observed or expected set of tip characters.

To improve on this basic approach we use posterior sets of trees (PSTs), obtained through earlier Bayesian MCMC analysis of the data, that integrate over all likely phylogenies and incorporate the phylogenetic uncertainty arising from the data. Although a Bayesian MCMC analysis is therefore a precondition to using Befi-BaTS, we do not feel that this is likely to deter potential users since these analyses are increasingly common.
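
As a rough illustration of the idea (this is not the BaTS source code), the sketch below computes the Fitch parsimony score for a set of tip states on each tree in a small set, then builds a null distribution by shuffling the states among the tips and re-scoring every tree. The nested-tuple tree representation, the toy trees and tip names, and the function names are all assumptions made for the example. How the per-tree scores are summarised into a significance test is left out.

```python
import random


def fitch_score(tree, states):
    """Fitch parsimony score: minimum number of state changes on a binary tree.

    tree   -- nested 2-tuples of tip names, e.g. (("A", "B"), "C")
    states -- dict mapping each tip name to its character state
    """
    def visit(node):
        if not isinstance(node, tuple):          # a tip: its state set, no cost
            return {states[node]}, 0
        left_set, left_cost = visit(node[0])
        right_set, right_cost = visit(node[1])
        shared = left_set & right_set
        if shared:                               # children agree: no change needed
            return shared, left_cost + right_cost
        return left_set | right_set, left_cost + right_cost + 1

    return visit(tree)[1]


def shuffled_states(states, rng):
    """Randomly reassign the observed states among the tips."""
    tips = list(states)
    values = [states[tip] for tip in tips]
    rng.shuffle(values)
    return dict(zip(tips, values))


def observed_and_null(trees, states, replicates, rng):
    """Score the observed states and each randomized replicate on every tree."""
    observed = [fitch_score(tree, states) for tree in trees]
    null = []
    for _ in range(replicates):
        randomized = shuffled_states(states, rng)
        null.append([fitch_score(tree, randomized) for tree in trees])
    return observed, null


# Two hypothetical trees standing in for a (much larger) posterior set of trees.
tree_a = ((("A", "B"), ("C", "D")), (("E", "F"), ("G", "H")))
tree_b = ((("A", "B"), ("C", "E")), (("D", "F"), ("G", "H")))
states = {"A": "x", "B": "x", "C": "x", "D": "x",
          "E": "y", "F": "y", "G": "y", "H": "y"}

observed, null = observed_and_null([tree_a, tree_b], states, 100, random.Random(1))
print("observed PS per tree:", observed)
```

A low observed score relative to the randomized scores suggests that the trait is more clustered on the tree than chance alone would produce; scoring every tree in the set, rather than one, is what lets the phylogenetic uncertainty carry through to the result.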

Important note when using BaTS with BEAST output:

The #Nexus format used by BEAST to write treefiles includes a number of branch and node label tags that are not necessarily parsed correctly in BaTS. This is because there is no IEEE standard for phylogenies and I chose to write BaTS to parse original #Nexus files primarily. If you are using BaTS with BEAST output (especially BEAST 1.3 and newer) I recommend you remove these branch/node tags as follows:

We are going to find/replace a grep (search) pattern in the treefiles. It’s not feasible to do this manually, as there will be thousands of occurrences in each treefile. Instead we’ll use a text editor’s find/replace function.
I’m not going to write a tutorial on grep / regular expression use in bioinformatics here, since most people learn this early in graduate studies or before, but if you’re utterly unfamiliar with this concept try this tutorial or go to your library and look up Perl For Dummies or, well, any beginners’ scripting textbook.
You will need a text editor such as TextWrangler (Mac), Notepad++ (Windows) or a command-line editor like vim or emacs (Mac / *nix).
Open the treefile in your editor.
Select only the ‘trees’ block.
Perform a find / replace using the following search string, replacing each match with nothing (leave the replacement field empty):
```
(\[\&){1}([\=\.\-\,\_\%\{\}A-Za-z0-9])+\]
```
Because BaTS uses the token ‘[&R] ’ to split the treefile lines, you need to re-insert this token (the first search removes it along with the other tags). Do so with another search pattern, finding:
```
(=\ )
```
and replacing with:
```
(=\ [&R]\ )
```

Overall we have (for example):

Starting line (wrong):
tree STATE_0[&lnP=0.0] = [&R] ((31:[&rate=0.0010]0.19716131857378863,(((((6:[&rate=0.0010]0.006170....       099781397861);

Intermediate (also wrong):
tree STATE_0 = ((31:0.19716131857378863,(((((6:0.00617064613201129,32:0.00617064613201129):0....       099781397861);

Final (correct):
tree STATE_0 = [&R] ((31:0.19716131857378863,(((((6:0.00617064613201129,32:0.00617064613201129):0....       099781397861);

Save (under a new name) and close the file.
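
If you would rather script the two find / replace steps than run them in an editor, something like the following Python sketch should do the same job. It is a minimal sketch under a few assumptions: the tag pattern is a simplified rewrite of the search string above (same character set), only lines beginning with "tree " are touched, the ‘[&R] ’ token is re-inserted after the first "=" on each such line, and the function names and the shortened example line are mine rather than anything from BaTS or BEAST. Tags containing characters outside that set (quotes or spaces, for instance) would not be matched, just as with the editor version.

```python
import re

# "[&" followed by one or more tag characters and a closing "]".
# Note that this also matches the "[&R]" token itself.
TAG = re.compile(r"\[&[=.\-,_%{}A-Za-z0-9]+\]")


def clean_tree_line(line):
    """Strip BEAST-style tags from one tree line and restore the [&R] token."""
    if not line.lstrip().lower().startswith("tree "):
        return line                      # leave non-tree lines untouched
    stripped = TAG.sub("", line)         # step 1: remove every [&...] tag
    # Step 2: put "[&R] " back after the first "=", tidying stray spaces.
    return re.sub(r"=\s*", "= [&R] ", stripped, count=1)


def clean_treefile(text):
    """Apply clean_tree_line to every line of a treefile's contents."""
    return "\n".join(clean_tree_line(line) for line in text.split("\n"))


# A shortened, made-up tree line in the same style as the example above.
example = ("tree STATE_0[&lnP=0.0] = [&R] (31:[&rate=0.0010]0.19,"
           "(6:[&rate=0.0010]0.006,32:[&rate=0.0010]0.006):[&rate=0.0010]0.09);")
cleaned = clean_tree_line(example)
print(cleaned)
```

To use it on a real file, read the treefile into a string, pass it through clean_treefile, and write the result out under a new name so the original BEAST output is kept.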

The modified file should now work with BaTS. If you are still having trouble try:

Checking the BaTS build is working at all on your system using the included example files, with reference to the manual supplied;
Checking a truncated treefile of your data (5-10 trees maximum) with only a few replicates for the null (expected) distribution. The current version of BaTS is essentially a research tool rather than an enterprise distribution, and can seem very slow compared to other phylogenetics packages.

Downloads:
Befi-BaTS on GitHub | Documentation | Executable .jarfile (requires Java 1.5+) | Application bundle (documentation, example files and .jarfile)

SHiAT

SHiAT calculates the sitewise alignment entropies of a set of input sequences in FASTA format. It also calculates overall sequence entropy, and other diversity statistics.
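
For readers unfamiliar with the statistic, the sketch below shows one common way to compute a sitewise Shannon entropy from aligned FASTA sequences. It is not taken from SHiAT: the choice of base-2 logarithms, the treatment of gap characters as just another symbol, and the function names are all assumptions made for the example.

```python
import math
from collections import Counter


def parse_fasta(text):
    """Parse FASTA text into a dict of name -> sequence (upper-cased)."""
    sequences = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            sequences[name] = ""
        elif name is not None:
            sequences[name] += line.upper()
    return sequences


def column_entropy(column):
    """Shannon entropy (in bits) of the symbols in one alignment column."""
    total = len(column)
    counts = Counter(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def sitewise_entropies(sequences):
    """Entropy of every column; all sequences must have the same length."""
    rows = list(sequences.values())
    if len({len(row) for row in rows}) > 1:
        raise ValueError("sequences are not aligned (lengths differ)")
    return [column_entropy(column) for column in zip(*rows)]


# A tiny made-up alignment: the first site is invariant, the second is not.
toy = ">seq1\nAC\n>seq2\nAG\n"
entropies = sitewise_entropies(parse_fasta(toy))
print(entropies)
```

An invariant column scores zero, and a column split evenly between two symbols scores one bit; higher values indicate more heterogeneous sites.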

Downloads:
Documentation | Executable .jarfile (requires Java 1.5+) | Mac OSX application

HADPACK

HADPACK is a package of open-source tools and handler scripts designed for the computational analysis of HIV antigen sites using phylogenetic and clinical data. A public release of this software is upcoming.

Downloads:
HADPACK on GitHub

QMUL Genome Convergence Pipeline

A collection of Java libraries and executables for conducting phylogenomic analyses, mainly for convergence detection. Includes wrappers to a variety of tools including PAML, RAxML, and PhyloBayes. Developed in the Rossiter Lab at Queen Mary, University of London.

Downloads:
QMUL Genome Convergence Pipeline on GitHub

CONTEXT – Phylogenomic Dataset Browser

What does it do?
A QC (quality control) tool for phylogenomics data, basically – it simultaneously displays large numbers (~thousands) of multiple sequence alignments and/or phylogenies, along with summary statistics.
Why does it do that?
Phylogenomic analyses rely on good quality input data. Visualising and quantitatively sorting/filtering input data is essential. There isn’t a simple standalone tool for this at the moment.
What doesn’t it do?
You can’t manually edit, align, or infer phylogenies with CONTEXT. There are plenty of other tools to do this. See RAxML, MUSCLE, BAli-Phy, Se-Al or GUIs like HyPhy or Geneious, for a start.

Downloads:
CONTEXT on GitHub

Real-time Phylogenomics

A sandbox and development repo for projects in real-time phylogenomics application development, especially for single-molecule, real-time sequence analysis platforms (Oxford Nanopore). A public release of this software is upcoming.

Downloads:
Real-time Phylogenomics on GitHub
