Tag Archives: coding

BaTS (and Befi-BaTS), SHiAT, and Genome Convergence Pipeline have moved!

Important – please take note!
Headline:

  • All my phylogenetics software is now on GitHub, not websites or Google Code
  • Please use the new FAQ pages and issue/bug tracker forms, rather than emailing me directly in the first instance

Until now, I’ve been hosting the open-sourced parts of my phylogenetics software on code.google.com. These include the BaTS (and Befi-BaTS) tools for phylogeny-trait association correlations; the alignment profilers SHiAT (and the Geneious Entropy plugin); and the Genome Convergence API for the Genome Convergence Pipeline and Phylogenomics Dataset Browser. However, Google announced that they are ending support for Google Code, and from August all projects will be read-only.

I’ve therefore migrated all my projects to GitHub. This will eventually include FAQs, forums and issue/bug tracking for the most popular software, BaTS and Genome Convergence API.

The projects can now be found at:

 

I am also changing how I respond to questions and bug requests. In the past I dealt with questions as they came in, with the odd explanatory post and a manual or readme with each release. Predictably, this meant I spent a lot of time dealing with duplicates or missing bugs or feature requests. I am now in the process of compiling a list of FAQs for each project, as well as uploading the manuals in markdown format so that I can update them with each release. Please bear with me as I go through this process. In the meantime, if you have an issue with a piece of software or think you have found a bug, please:

  1. Make sure you have the most recent version of the software. In most cases this will be available as an executable .jar file on the project GitHub page.
  2. Check the ‘Issues’ tab on the project GitHub page. Your issue may be a duplicate, or already fixed by a new release. If your bug isn’t listed, please open a new issue giving as much detail as possible.
  3. Check the manual and FAQs to see if anyone else has had the same problem – I may well have answered their question already.
  4. If you still need an answer please email me on joe+bioinformaticshelp@kitserve.org.uk

Thanks so much for your support and involvement,

Joe

Embedding Artist profiles, playlists, and content from Spotify in HTML

Quick post this – turns out Spotify have added a really cool new function to their desktop application: You can now right-click any resource in Spotify (could be an artist, a playlist, a profile or a track or album) and get a link to the HTML code you need to embed it into another webpage. The link looks like this:

[Image: screenshot, not reproduced here]

The HTML is then copied to your clipboard, ready to drop into an artist webpage. Pretty cool eh? Let’s give it a spin:

<iframe src="https://embed.spotify.com/?uri=spotify%3Aartist%3A4qsWY8X6Yq3TTVe4gn6cnL" height="300" width="300" frameborder="0"></iframe>
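
If you'd rather generate that snippet than copy it by hand, here is a minimal sketch in Java. It assumes only what the copied HTML above shows: the Spotify URI is percent-encoded and dropped into the `uri` parameter. The class `SpotifyEmbed` and the method `iframeFor` are hypothetical names of mine, not anything Spotify provides.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SpotifyEmbed {

    /** Builds an embed iframe for a Spotify URI (hypothetical helper mirroring the copied snippet). */
    static String iframeFor(String spotifyUri, int width, int height) {
        // Percent-encode the URI so each ':' becomes %3A, as in the copied HTML.
        String encoded = URLEncoder.encode(spotifyUri, StandardCharsets.UTF_8);
        return "<iframe src=\"https://embed.spotify.com/?uri=" + encoded + "\""
                + " height=\"" + height + "\" width=\"" + width + "\""
                + " frameborder=\"0\"></iframe>";
    }

    public static void main(String[] args) {
        // Same artist URI as in the snippet above.
        System.out.println(iframeFor("spotify:artist:4qsWY8X6Yq3TTVe4gn6cnL", 300, 300));
    }
}
```

Only the URI and the dimensions vary, so the same helper should cover playlists, albums and tracks, assuming they use the same URL pattern.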


Parsing numbers from multiple formats in Java

We were having a chat over coffee today and a question arose about merging data from multiple databases. At first sight this seems pretty easy, especially if you’re working with relational databases that have unique IDs (like, uh, a Latin binomial name – Homo sapiens) to hang from… right?

But, oh no… not at all. One important reason is that seemingly similar data fields can be extremely tricky to merge. They may have been stated with differing precision (0.01, 0.0101, or 0.01010199999?), be encoded in different data types (text, float, blob, hex etc.) or character set encodings (UTF-8 or Korean?) and even after all that, refer to subtly different quantities (mass vs weight perhaps). Who knew database ninjas actually earnt all that pay?

So it was surprising, but understandable, to learn that a major private big-data user (unnamed here) stores pretty much everything as text strings. Of course this solves one set of problems nicely (everyone knows how to parse/handle text, surely?) but creates another. That’s because it is trivially easy to code the same real-valued number in multiple different text strings – some of which may break sort algorithms, or even memory constraints. Consider the number ‘0.01’: as written there’s very little ambiguity for you and me. But what about:

“0.01”,
“00.01”,
“ 0.01” (note the space),
or even “0.01000000000”?
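
To make the problem concrete, here is a minimal sketch of one way to collapse those variants into a single text form before comparing or sorting them. It uses `java.math.BigDecimal` from the standard library; the class `CanonicalNumberText` and the method `canonical` are hypothetical names for illustration, and this is not what the unnamed big-data user does (I don't know what they do).

```java
import java.math.BigDecimal;

public class CanonicalNumberText {

    /** Returns one canonical text form for a decimal string (hypothetical helper). */
    static String canonical(String raw) {
        // BigDecimal's String constructor rejects surrounding whitespace, so trim first.
        BigDecimal value = new BigDecimal(raw.trim());
        // stripTrailingZeros() drops padding such as "0.0100";
        // toPlainString() avoids exponent notation in the result.
        return value.stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        String[] variants = {"0.01", "00.01", " 0.01", "0.01000000000"};
        for (String v : variants) {
            // Each variant prints as 0.01
            System.out.println("\"" + v + "\" -> " + canonical(v));
        }
    }
}
```

Sorting the raw strings would scatter these four across the list; sorting the canonical forms (or the BigDecimal values themselves) keeps them together.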

After a quick straw poll, we also realised that, although we all use string-to-float conversions routinely, none of us was certain how our most-used programming languages (Java for me; Perl, Python etc. for others) actually performed the conversion in their native string-to-float methods. We knew how we thought they worked, and how we hoped they would, but it’s always worth checking. Time to write some quick code – here it is, on GitHub.

And in code:

package uk.ac.qmul.sbcs.evolution.sandbox;

/**
 * Class to test the behaviour of the Float.parseFloat() method on text data.
 * <p>
 * In particular odd strings which should be equal, e.g.
 * <ul>
 *     <li>"0.01"</li>
 *     <li>"00.01"</li>
 *     <li>" 0.01" (note space)</li>
 *     <li>"0.0100"</li>
 * </ul>
 * <p>
 * NB uses assertions to test - run JVM with '-ea' argument. The first three tests should pass in the orthodox manner. The fourth should throw assertion errors to pass.
 *
 * @author joeparker
 */
public class TextToFloatParsingTest {

    /**
     * Default no-arg constructor
     */
    public TextToFloatParsingTest() {
        /* Set up the floats as strings */
        String[] floatsToConvert = {"0.01", "00.01", " 0.01", "0.0100"};
        Float[] floatObjects = new Float[4];
        float[] floatPrimitives = new float[4];

        /* Parse each string; the result is autoboxed into a Float object, then unboxed into a float primitive */
        for (int i = 0; i < 4; i++) {
            floatObjects[i] = Float.parseFloat(floatsToConvert[i]);
            floatPrimitives[i] = floatObjects[i];
        }

        /* Are they all equal? They should be: test this. Should PASS */
        /* Iterate through the triangle */
        System.out.println("Testing conversions: test 1/4 (should pass)...");
        for (int i = 0; i < 4; i++) {
            for (int j = 1; j < 4; j++) {
                assert (floatPrimitives[i] == floatPrimitives[j]);
                // Float == float unboxes the object, so this compares values, not references
                assert (floatObjects[i] == floatPrimitives[j]);
            }
        }
        System.out.println("Test 1/4 passed OK");

        /* Test the numerical equivalent. Should PASS */
        System.out.println("Testing conversions: test 2/4 (should pass)...");
        for (int i = 0; i < 4; i++) {
            assert (floatPrimitives[i] == 0.01f);
        }
        System.out.println("Test 2/4 passed OK");

        /* Test the numerical equivalent inequality. Should PASS */
        System.out.println("Testing conversions: test 3/4 (should pass)...");
        for (int i = 0; i < 4; i++) {
            assert (floatPrimitives[i] != 0.02f);
        }
        System.out.println("Test 3/4 passed OK");

        /* Test the inversion */
        /* These assertions should FAIL */
        System.out.println("Testing conversions: test 4/4 (should fail with java.lang.AssertionError)...");
        boolean test_4_pass_flag = false;
        try {
            for (int i = 0; i < 4; i++) {
                for (int j = 1; j < 4; j++) {
                    assert (floatPrimitives[i] != floatPrimitives[j]);
                    assert (floatObjects[i] != floatPrimitives[j]);
                    test_4_pass_flag = true; // If AssertionErrors are thrown as we expect they will be, this is never reached.
                }
            }
        } finally {
            // test_4_pass_flag should never be set true (inside the loops above) if AssertionErrors have been thrown correctly.
            if (test_4_pass_flag) {
                System.err.println("Test 4/4 passed! This constitutes a logical FAILURE");
            } else {
                System.out.println("Test 4/4 passed OK (expected assertion errors occured as planned.");
            }
        }
    }

    public static void main(String[] args) {
        // All the tests run inside the constructor
        new TextToFloatParsingTest();
    }

}


If you run this with assertions enabled (‘/usr/bin/java -ea uk.ac.qmul.sbcs.evolution.sandbox.TextToFloatParsingTest’) you should get something like:

Testing conversions: test 1/4 (should pass)...
Test 1/4 passed OK
Testing conversions: test 2/4 (should pass)...
Test 2/4 passed OK
Testing conversions: test 3/4 (should pass)...
Test 3/4 passed OK
Testing conversions: test 4/4 (should fail with java.lang.AssertionError)...
Exception in thread "main" java.lang.AssertionError
    at uk.ac.qmul.sbcs.evolution.sandbox.TextToFloatParsingTest.<init>(TextToFloatParsingTest.java:60)
    at uk.ac.qmul.sbcs.evolution.sandbox.TextToFloatParsingTest.main(TextToFloatParsingTest.java:76)
Test 4/4 passed OK (expected assertion errors occured as planned.
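
One related caveat that the test above doesn't exercise: all four strings parse to the same float, but that float is not the same number you get by parsing the same text as a double. The sketch below illustrates this; `FloatVersusDouble` and `sameAsDouble` are hypothetical names of mine and not part of the code on GitHub.

```java
public class FloatVersusDouble {

    /** True if the text parsed as a float equals the same text parsed as a double. */
    static boolean sameAsDouble(String text) {
        float asFloat = Float.parseFloat(text);
        double asDouble = Double.parseDouble(text);
        // The float is widened to double for the comparison, and its rounding error comes with it.
        return asFloat == asDouble;
    }

    public static void main(String[] args) {
        System.out.println(sameAsDouble("0.01")); // false: 0.01 is not exactly representable in binary
        System.out.println(sameAsDouble("0.5"));  // true: 0.5 is exactly representable
        System.out.println(Float.parseFloat(" 0.01") == 0.01f); // true: surrounding whitespace is ignored
    }
}
```

So when merging text-encoded numbers from several sources, it is worth parsing everything to the same type before comparing.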

Migrating to OS X Mavericks

The time has come, my friends. I am upgrading from 10.6.8 (‘Snow Leopard’) to 10.9 (‘Mavericks’) on my venerable and mistreated MacBook Pros (one is 2010 with a SATA drive, the other 2011 with an SSD). Common opinion holds that the 2010 machine might find it a stretch so I’m starting with the 2011/SSD model first. Also, hey, it’s a work machine, so if I truly bork it, Apple Care should (should) cover me…

Availability

At least Apple make the upgrade easy enough to get: for the last year or so, Software Update has been practically begging me to install the App Store. Apple offer OSX 10.9 for free through this platform (yes! FREE!!) so it’s a couple of clicks to download and start the installer…

Preamble

Obviously I’ve backed up everything several times: to Time Machine, on an external HDD; to Dropbox; Drobo; and even the odd USB stick lying around as well as my 2010 MBP and various other machines I have access to. As well as all this, I’ve actually tried to empty the boot disk a bit to make space – unusually RTFM for me – and managed to get the disk to about 65% available space. I’ve also written down every password and username I have, obviously on bombay mix-flavoured rice-paper so I can eat them after when everything (hopefully) works.

Installation

Click the installer. Agree to a few T&Cs (okay, several, but this is Apple we’re talking about). Hit ‘Restart’. Pray…

Results

… And we’re done! That was surprisingly painless. The whole process took less than two hours on my office connection, from download to first login. There was a momentary heart attack when the first reboot appeared to have failed and I had to nudge it along, but so far (couple of days) everything seems to be running along nicely.

Now, I had worried (not unreasonably, given previous updates) that my computer might slow down massively, or blow up altogether. So far this doesn’t seem to have happened. The biggest downsides are the ones I’d previously read about and expected, e.g. PowerPC applications like TreeEdit and Se-Al aren’t supported any more. Apparently the main workaround for this is a 10.6.8 Server install inside Parallels, but I’ll look into this more in a future post when I get a chance.

I was a bit surprised to find that both Homebrew and, even more oddly, my SQL installation needed to be reinstalled, but a host of other binaries didn’t. Presumably there’s a reason for this but I can’t find it. Luckily those two at least install pretty painlessly, but it did make me grateful nothing else broke (yet).

So what are the good sides? The general UI is shiny, not that this matters much in a bioinformatics context, and smart widgets like Notifications are pretty, but to be honest, there aren’t any really compelling reasons to switch. I’ve not used this machine as a laptop much so far, so I can’t comment on the power usage (e.g. stuff like App Nap) yet, although it seems to be improved… a bit.. and I haven’t had time to run any BEAST benchmarks to see how the JVM implementation compares. But there is one massive benefit: this is an OS Apple are still supporting! This matters because stuff like security and firmware updates really do matter, a lot – and release cycles are getting ever shorter, especially as Macs get targeted more. In short: I couldn’t afford to stay behind any longer!

Update [5 Oct 2014]: Given the Shellshock bash exploit affects both 10.6 and 10.9, but Apple aren’t – as yet – releasing a patch for 10.6, while they rushed a 1.0 patch for 10.9 in less than a week, the security aspect of this upgrade is even more clearly important…

Update [23 Oct 2014]: Nope, I won’t be upgrading to Yosemite for a while, either!

Befi-BaTS v0.1.1 alpha release

Long-overdue update for beta version of Befi-BaTS.

Software: Befi-BaTS

Author: Joe Parker

Version: 0.1.1 beta (download here)

Release notes: Befi-BaTS v0.1 beta drops support for hard polytomies (tree nodes with > 2 daughters), now throwing a HardPolytomyException to the error stack when these are parsed. This is because of potential bugs when dealing with topology + distance measures (NTI/NRI) of polytomies. These bugs will be fixed in a future release. The current version 0.1.1 improves #NEXUS input file parsing.
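
To illustrate what ‘hard polytomy’ means in practice, here is a minimal sketch of such a check. This is not the Befi-BaTS source: the `Node` class, the `requireBifurcating` method and this version of `HardPolytomyException` are hypothetical stand-ins, assuming only the definition given above (a node with more than two daughters).

```java
import java.util.ArrayList;
import java.util.List;

public class PolytomyCheck {

    /** Hypothetical stand-in for the exception named in the release notes. */
    static class HardPolytomyException extends Exception {
        HardPolytomyException(String message) {
            super(message);
        }
    }

    /** Minimal rooted-tree node for illustration; not the Befi-BaTS data structure. */
    static class Node {
        final String label;
        final List<Node> daughters = new ArrayList<>();

        Node(String label, Node... children) {
            this.label = label;
            for (Node child : children) {
                daughters.add(child);
            }
        }
    }

    /** Walks the tree from the given node and throws if any node has more than two daughters. */
    static void requireBifurcating(Node node) throws HardPolytomyException {
        if (node.daughters.size() > 2) {
            throw new HardPolytomyException(
                    "Node '" + node.label + "' has " + node.daughters.size() + " daughters");
        }
        for (Node daughter : node.daughters) {
            requireBifurcating(daughter);
        }
    }

    public static void main(String[] args) {
        Node binary = new Node("root", new Node("A"),
                new Node("inner", new Node("B"), new Node("C")));
        Node polytomy = new Node("root", new Node("A"), new Node("B"), new Node("C"));
        try {
            requireBifurcating(binary);
            System.out.println("binary tree accepted");
            requireBifurcating(polytomy); // throws: the root has three daughters
        } catch (HardPolytomyException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```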

Befi-BaTS: Befi-BaTS uses two established statistics (the Association Index, AI (Wang et al., 2001), and Fitch parsimony score, PS) as well as a third statistic (maximum exclusive single-state clade size, MC) introduced by us in the BaTS citation, where the merits of each of these are discussed. Befi-BaTS 0.1.1 includes additional statistics that include branch length as well as tree topology. What sets Befi-BaTS aside from previous methods, however, is that we incorporate uncertainty arising from phylogenetic error into the analysis through a Bayesian framework. While many other methods obtain a null distribution for significance testing through tip character randomization, they rely on a single tree upon which phylogeny-trait association is measured for any observed or expected set of tip characters.
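
For readers unfamiliar with the parsimony score, the sketch below shows the standard Fitch algorithm on a single, strictly binary tree with one trait state per tip. It is an illustration only, not the Befi-BaTS implementation: the `Node` class and `score` method are hypothetical, and it ignores both branch lengths and the Bayesian treatment of tree uncertainty described above.

```java
import java.util.HashSet;
import java.util.Set;

public class FitchParsimony {

    /** Minimal binary tree node: a tip carries a trait state, an internal node has two children. */
    static class Node {
        final String state; // null for internal nodes
        final Node left;
        final Node right;

        Node(String state) {
            this.state = state;
            this.left = null;
            this.right = null;
        }

        Node(Node left, Node right) {
            this.state = null;
            this.left = left;
            this.right = right;
        }

        boolean isTip() {
            return left == null;
        }
    }

    private int changes = 0;

    /** Post-order pass: returns the candidate state set for a node, counting state changes on the way. */
    private Set<String> stateSet(Node node) {
        if (node.isTip()) {
            Set<String> tipStates = new HashSet<>();
            tipStates.add(node.state);
            return tipStates;
        }
        Set<String> leftStates = stateSet(node.left);
        Set<String> rightStates = stateSet(node.right);
        Set<String> shared = new HashSet<>(leftStates);
        shared.retainAll(rightStates);
        if (!shared.isEmpty()) {
            return shared; // children agree on at least one state: no change needed here
        }
        changes++; // children share no state: one change is required at this node
        leftStates.addAll(rightStates);
        return leftStates;
    }

    /** Minimum number of trait-state changes needed to explain the tip states on this tree. */
    static int score(Node root) {
        FitchParsimony pass = new FitchParsimony();
        pass.stateSet(root);
        return pass.changes;
    }

    public static void main(String[] args) {
        // Traits clustered by clade: ((A,A),(B,B))
        Node clustered = new Node(new Node(new Node("A"), new Node("A")),
                new Node(new Node("B"), new Node("B")));
        // Traits interleaved across clades: ((A,B),(A,B))
        Node interleaved = new Node(new Node(new Node("A"), new Node("B")),
                new Node(new Node("A"), new Node("B")));
        System.out.println("clustered PS = " + score(clustered));     // 1
        System.out.println("interleaved PS = " + score(interleaved)); // 2
    }
}
```

A lower score means fewer changes are needed, i.e. the trait tracks the phylogeny more closely.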

About this new site

As you can see, I’m running a new site now. Some content will be loaded pretty soon (old blogs, lyrics, music, software etc) but the design I want to do is pretty complicated, especially the client-side stuff, so I want to wireframe it first somewhere else offline. Might get it done in a couple of months, so look out for a big redesign after Christmas!

This is partly an ego trip, but mainly because I’m pissed off with Facebook / Myspace and all their popularity contests and formulaic box-filling shit, and want to get back to the good old Netscape days of the 1990s, when a personal website really was just that, and you could put whatever you liked up, however you wanted to.

I’ve installed plugins for Twitter and Soundcloud, and will get Flickr in there too soon. Also, for commenting, I’ve activated the really-pretty-cool Disqus engine, so that those of you on social networking sites that want to comment, can.

In the meantime I’m going to slowly let the Facebook and Myspace accounts die. Just as a little piece of me dies every time I log in to those damn things.