All posts by Joe

Camberwell Green consultation closing – reject these plans!

(Reposted from SouthwarkCyclists.org.uk…)
Southwark Council propose expensive and disruptive alterations to Camberwell Green junction and area. The plans are terrible. Southwark Cyclists have thoroughly examined their proposal with expert advice and we have decided to formally reject these plans. 
 
Please respond to the consultation TODAY, rejecting the proposal, link here: https://consultations.southwark.gov.uk/environment-leisure/camberwell-town-centre-public-realm-improvements 
 
The consultation closes tomorrow (Thurs 13th August). It only takes a minute or two to do. Please forward this email to all your friends and cycling or walking colleagues, encouraging them to reject the proposal too. Because this is a formal council consultation, not some online petition, your voice counts – the council have to rethink if enough people object.
 
The council have to go back and rethink these terrible plans, which do nothing for cycling, and could do much more for walking. They are purely cosmetic alterations which do nothing to improve the terrible safety record of this junction, which includes one death already this year.
 
This is our reasoning – you can read the document in full at this link:
  • Southwark Cyclists reject these proposals.

  • Southwark Council’s proposals for Camberwell Green alterations do nothing to address serious and worsening safety issues for cyclists.

  • These largely cosmetic alterations also miss several opportunities to substantially improve crossing times and safety for pedestrians.

  • These proposals assume increasing traffic flow when TfL’s own figures show car ownership, use, and miles travelled are all decreasing in this part of London. The Council’s own figures and transport policy forecast a rise in cycle numbers.

  • We urge Southwark Council and Transport for London to reconsider their approach to this key junction and the Camberwell Green area. Space for cycling at the junction and/or a cycle bypass are feasible alternatives to their proposals.

Sensible Alternatives

A number of sensible alternatives exist which could be built in a similar amount of time and with similar disruption, and would greatly improve the safety and convenience of the area for cyclists and pedestrians alike:

Camberwell Green traffic.
Six lanes of traffic at Camberwell Green – on Denmark Hill looking south.

The public environment. Firstly, even the council recognise that the Green itself is a retail and services destination. Thousands of local people use the Green every day for shopping, the library, pharmacy, clinic and courts. But the public environment on Camberwell Church Street and Denmark Hill (outside Butterfly Walk) is intimidating for pedestrians and cyclists alike, with trucks and coaches hurtling down six lanes of traffic at 30mph+, as this picture shows. It’s hard to cross, and it’s no wonder so many people prefer to drive short distances to these shops. The proposal does nothing to improve the environment – although the pavement itself will get some expensive new stone, nothing will be done to make it easier or more inviting to get to the Green by bike or on foot, or to cross the road once you’re there. Southwark Cyclists propose the town centre, one of the historic local greens of South London, be more imaginatively redesigned as a retail and services destination where pedestrians and cyclists come first. This can be done while maintaining traffic capacity, but calming it.

A plan of the junction with space for cycling
Space for Cycling at the crossroads. Although both Southwark Council and Transport for London are committed to decreasing cycle and pedestrian deaths and injuries on the roads, and increasing numbers of cycling and walking trips in Southwark by 2020, the junction design Southwark Council propose is straight out of the 1980s – ‘stuff traffic through as quickly as possible, and hope nothing goes wrong’. In fact it’s barely different from the current layout here (one death already this year…) except that, incredibly, there are fewer cycle facilities than now. In conjunction with experts from another London borough and the London Cycling Campaign, Southwark Cyclists propose an alternative design for the green where pedestrians, cyclists and motor traffic all move separately. This layout retains motor vehicle capacity, is safe for cyclists, and is far more convenient for larger numbers of pedestrians. To read more, see our consultation response.

A map of Camberwell Green with potential bypass routes.

A bus hub – in Orpheus Street, not the high street. One of the council’s motivations for doing anything at all is the large numbers of pedestrians waiting for buses on the pavement outside the shopping centre. Often these spill out into the road; it’s unsafe and hard to get past with a pushchair or wheelchair. Instead of the council’s plans (which actually add hardly any space at all on the pavement in question, and do nothing to calm the traffic at all) we suggest several bus stops could be moved 10-20m down, into Orpheus Street, creating a local bus hub. With proper lighting and other features, this could also be a much safer place to wait, and would create additional retail or services frontages in Orpheus Street, which at the moment is a barren alley with just the art shop. Why move the bus stops at all? Well, moving them from the high street (Denmark Hill) would make the crossing simpler for pedestrians, make driving easier for cars (no buses cutting in and out suddenly), make cycling safer (by freeing space for a bike lane) and make the environment far more pleasant for everyone.

Basically, there are several options, far better than the Council’s plans. Reject these plans today (Thurs 13th August closing date) and send them back to the drawing board!

Omics in extreme Environments (Lightweight bioinformatics)

Presentation on lightweight bioinformatics (Raspi / cloud computing) for real-time field-based analyses.

Presented at iEOS2015, St. Andrews, 3-6th July 2015.

Slides [SlideShare]: cc-by-nc-nd

[slideshare id=50251856&doc=joeparkerlightweightbioinformatics-150707112254-lva1-app6892]

BaTS (and Befi-BaTS), SHiAT, and Genome Convergence Pipeline have moved!

Important – please take note!
Headline:

  • All my phylogenetics software is now on GitHub, not websites or Google Code
  • Please use the new FAQ pages and issue/bug tracker forms, rather than emailing me directly in the first instance

Until now, I’ve been hosting the open-sourced parts of my phylogenetics software on code.google.com. These include the BaTS (and Befi-BaTS) tools for phylogeny-trait association correlations; the alignment profilers SHiAT (and Geneious Entropy plugin), and the Genome Convergence API for the Genome Convergence Pipeline and Phylogenomics Dataset Browser. However, Google announced that they are ending support for Google Code, and from August all projects will be read-only.

I’ve therefore migrated all my projects to GitHub. This will eventually include FAQs, forums and issue/bug tracking for the most popular software, BaTS and Genome Convergence API.

The projects can now be found at:

 

I am also changing how I respond to questions and bug requests. In the past I dealt with questions as they came in, with the odd explanatory post and a manual or readme with each release. Predictably, this meant I spent a lot of time dealing with duplicates or missing bugs or feature requests. I am now in the process of compiling a list of FAQs for each project, as well as uploading the manuals in markdown format so that I can update them with each release. Please bear with me as I go through this process. In the meantime, if you have an issue with a piece of software or think you have found a bug, please:

  1. Make sure you have the most recent version of the software. In most cases this will be available as an executable .jar file on the project GitHub page.
  2. Check the ‘Issues’ tab on the project GitHub page. Your issue may be a duplicate, or already fixed by a new release. If your bug isn’t listed, please open a new issue giving as much detail as possible.
  3. Check the manual and FAQs to see if anyone else has had the same problem – I may well have answered their question already.
  4. If you still need an answer please email me on joe+bioinformaticshelp@kitserve.org.uk

Thanks so much for your support and involvement,

Joe

Embedding Artist profiles, playlists, and content from Spotify in HTML

Quick post this – turns out Spotify have added a really cool new function to their desktop application: You can now right-click any resource in Spotify (could be an artist, a playlist, a profile or a track or album) and get a link to the HTML code you need to embed it into another webpage. The link looks like this:


The HTML is then copied to your clipboard, ready to drop into an artist webpage. Pretty cool eh? Let’s give it a spin:

<iframe src="https://embed.spotify.com/?uri=spotify%3Aartist%3A4qsWY8X6Yq3TTVe4gn6cnL" height="300" width="300" frameborder="0"></iframe>
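Looking at that snippet, the `src` attribute appears to be nothing more than the resource's Spotify URI, percent-encoded and tacked onto the embed endpoint. Here is a minimal Java sketch of building the same markup yourself, assuming that pattern holds for other resource types too. The class and method names are made up for illustration, and the endpoint and 300×300 size are simply copied from the snippet above.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Hypothetical helper (not part of any Spotify library): builds iframe markup for a Spotify URI. */
final class SpotifyEmbedSketch {

    /** Assumed endpoint, copied from the snippet generated by the desktop client. */
    static final String EMBED_BASE = "https://embed.spotify.com/?uri=";

    /** Percent-encodes the URI (':' becomes '%3A') and wraps it in an iframe tag. */
    static String embedHtml(String spotifyUri, int width, int height) {
        try {
            String encoded = URLEncoder.encode(spotifyUri, "UTF-8");
            return "<iframe src=\"" + EMBED_BASE + encoded + "\""
                    + " height=\"" + height + "\""
                    + " width=\"" + width + "\""
                    + " frameborder=\"0\"></iframe>";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every JVM, so this should never happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(embedHtml("spotify:artist:4qsWY8X6Yq3TTVe4gn6cnL", 300, 300));
    }
}
```

For the artist URI used here, this reproduces the iframe line above character for character.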


Parsing numbers from multiple formats in Java

We were having a chat over coffee today and a question arose about merging data from multiple databases. At first sight this seems pretty easy, especially if you’re working with relational databases that have unique IDs (like, uh, a Latin binomial name – Homo sapiens) to hang from… right?

But, oh no.. not at all. One important reason is that seemingly similar data fields can be extremely tricky to merge. They may have been stated with differing precision (0.01, 0.0101, or 0.01010199999?), be encoded in different data types (text, float, blob, hex etc) or character set encodings (UTF-8 or Korean?) and even after all that, refer to subtly different quantities (mass vs weight perhaps). Who knew database ninjas actually earnt all that pay.
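To make the precision point concrete, here is a minimal Java sketch of one way to compare values recorded at different precisions: round each to an agreed number of decimal places before testing equality. The names are hypothetical, and the half-up rounding and choice of scale are assumptions for illustration; a real merge would have to decide these field by field.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

/** Illustrative only: compares decimal strings after rounding to a common scale. */
final class PrecisionMergeSketch {

    /** True if both values are equal once rounded (half-up) to the given number of decimal places. */
    static boolean sameAtScale(String a, String b, int decimalPlaces) {
        BigDecimal x = new BigDecimal(a).setScale(decimalPlaces, RoundingMode.HALF_UP);
        BigDecimal y = new BigDecimal(b).setScale(decimalPlaces, RoundingMode.HALF_UP);
        return x.compareTo(y) == 0;
    }

    public static void main(String[] args) {
        // At two decimal places all three example values collapse to 0.01
        System.out.println(sameAtScale("0.01", "0.0101", 2));        // true
        System.out.println(sameAtScale("0.01", "0.01010199999", 2)); // true
        // At four decimal places 0.0100 and 0.0101 are distinguishable again
        System.out.println(sameAtScale("0.01", "0.0101", 4));        // false
    }
}
```

Whether two such records really describe the same measurement is, of course, a question about the data rather than the code.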

So it was surprising, but understandable, to learn that a major private big-data user (unnamed here) stores pretty much everything as text strings. Of course this solves one set of problems nicely (everyone knows how to parse/handle text, surely?) but creates another. That’s because it is trivially easy to code the same real-valued number in multiple different text strings – some of which may break sort algorithms, or even memory constraints. Consider the number ‘0.01’: as written there’s very little ambiguity for you and me. But what about:

“0.01”,
“00.01”,
“ 0.01” (note the space),
or even “0.01000000000”?
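One way round the problem, if you are stuck with text storage, is to rewrite every value into a single canonical spelling before sorting or comparing. The sketch below is a rough illustration using `java.math.BigDecimal`, not how any particular database does it; the class and method names are my own.

```java
import java.math.BigDecimal;
import java.util.LinkedHashSet;
import java.util.Set;

/** Illustrative only: normalises different text spellings of a decimal number. */
final class CanonicalNumberText {

    /** Trims whitespace, parses, drops trailing zeros and returns plain (non-scientific) notation. */
    static String canonical(String raw) {
        // BigDecimal's String constructor rejects surrounding whitespace, so trim first
        return new BigDecimal(raw.trim()).stripTrailingZeros().toPlainString();
    }

    public static void main(String[] args) {
        String[] spellings = {"0.01", "00.01", " 0.01", "0.01000000000"};
        Set<String> distinct = new LinkedHashSet<>();
        for (String s : spellings) {
            distinct.add(canonical(s));
        }
        // As raw text the four strings are all different; after normalisation one spelling remains
        System.out.println(distinct); // [0.01]
    }
}
```

`toPlainString()` is used rather than `toString()` because stripping trailing zeros from a value like 100 would otherwise print in scientific notation.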

After a quick straw poll, we also realised that none of us was completely sure how our most-used programming languages (Java for me, Perl, Python etc for others) performed the appropriate conversion in their native string-to-float methods. We knew how we thought they worked, and how we hoped they would, but it’s always worth checking. Time to write some quick code – here it is, on GitHub.

And in code:

package uk.ac.qmul.sbcs.evolution.sandbox;

/**
 * Class to test the Float.parseFloat() method performance on text data.
 * In particular odd strings which should be equal, e.g.
 * <ul>
 *     <li>"0.01"</li>
 *     <li>"00.01"</li>
 *     <li>" 0.01" (note space)</li>
 *     <li>"0.0100"</li>
 * </ul>
 * NB uses assertions to test - run JVM with '-ea' argument. The first three tests
 * should pass in the orthodox manner. The fourth should throw assertion errors to pass.
 *
 * @author joeparker
 */
public class TextToFloatParsingTest {

    /**
     * Default no-arg constructor
     */
    public TextToFloatParsingTest(){
        /* Set up the floats as strings */
        String[] floatsToConvert = {"0.01","00.01"," 0.01","0.0100"};
        Float[] floatObjects = new Float[4];
        float[] floatPrimitives = new float[4];

        /* Convert the strings, first to Float objects and then unbox to float primitives */
        for(int i=0;i<4;i++){
            floatObjects[i] = Float.parseFloat(floatsToConvert[i]);
            floatPrimitives[i] = floatObjects[i];
        }

        /* Are they all equal? They should be: test this. Should PASS */
        /* Compare every value against elements 1..3 */
        System.out.println("Testing conversions: test 1/4 (should pass)...");
        for(int i=0;i<4;i++){
            for(int j=1;j<4;j++){
                assert(floatPrimitives[i] == floatPrimitives[j]);
                assert(floatObjects[i] == floatPrimitives[j]); // Float is unboxed for the comparison
            }
        }
        System.out.println("Test 1/4 passed OK");

        /* Test the numerical equivalent. Should PASS */
        System.out.println("Testing conversions: test 2/4 (should pass)...");
        for(int i=0;i<4;i++){
            assert(floatPrimitives[i] == 0.01f);
        }
        System.out.println("Test 2/4 passed OK");

        /* Test the numerical equivalent inequality. Should PASS */
        System.out.println("Testing conversions: test 3/4 (should pass)...");
        for(int i=0;i<4;i++){
            assert(floatPrimitives[i] != 0.02f);
        }
        System.out.println("Test 3/4 passed OK");

        /* Test the inversion */
        /* These assertions should FAIL */
        System.out.println("Testing conversions: test 4/4 (should fail with java.lang.AssertionError)...");
        boolean test_4_pass_flag = false;
        try{
            for(int i=0;i<4;i++){
                for(int j=1;j<4;j++){
                    assert(floatPrimitives[i] != floatPrimitives[j]);
                    assert(floatObjects[i] != floatPrimitives[j]);
                    test_4_pass_flag = true; // If AssertionErrors are thrown as we expect they will be, this is never reached.
                }
            }
        }finally{
            // test_4_pass_flag should never be set true (in the loop above) if AssertionErrors have been thrown correctly.
            // The AssertionError itself is not caught, so it still propagates after this block runs.
            if(test_4_pass_flag){
                System.err.println("Test 4/4 passed! This constitutes a logical FAILURE");
            }else{
                System.out.println("Test 4/4 passed OK (expected assertion errors occured as planned.");
            }
        }
    }

    public static void main(String[] args) {
        new TextToFloatParsingTest();
    }

}


If you run this with assertions enabled (‘/usr/bin/java -ea uk.ac.qmul.sbcs.evolution.sandbox.TextToFloatParsingTest’) you should get something like:

Testing conversions: test 1/4 (should pass)...
Test 1/4 passed OK
Testing conversions: test 2/4 (should pass)...
Test 2/4 passed OK
Testing conversions: test 3/4 (should pass)...
Test 3/4 passed OK
Testing conversions: test 4/4 (should fail with java.lang.AssertionError)...
Exception in thread "main" java.lang.AssertionError
    at uk.ac.qmul.sbcs.evolution.sandbox.TextToFloatParsingTest.<init>(TextToFloatParsingTest.java:60)
    at uk.ac.qmul.sbcs.evolution.sandbox.TextToFloatParsingTest.main(TextToFloatParsingTest.java:76)
Test 4/4 passed OK (expected assertion errors occured as planned.
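One caveat with this approach: if you forget the ‘-ea’ flag, the assert statements are skipped entirely, so tests 1 to 3 report success without checking anything. A common Java idiom for detecting this at start-up is sketched below. The class and method names are mine, for illustration only, and are not part of the code on GitHub.

```java
/** Illustrative only: reports whether the JVM is running this class with assertions enabled. */
final class AssertionsEnabledCheck {

    /** Returns true only when assert statements are actually being evaluated. */
    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment is a side effect inside the assert, so it only runs under -ea
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        if (assertionsEnabled()) {
            System.out.println("Assertions are enabled; assert-based tests are meaningful.");
        } else {
            System.err.println("Assertions are disabled; re-run with the -ea flag.");
        }
    }
}
```

Calling something like this before the four tests would turn a silent false pass into an obvious warning.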

Application note: ‘Befi-BaTS’ version 0.10.1 – Error rate and statistical power of distance-based measures of phylogeny-trait association.

In prep.

SUMMARY

Building on work presented previously (Parker et al., 2008), we study a number of more complex measures of phylogeny-trait association (implemented in the program Befi-BaTS / BaTS v0.10.1) which take into account the branch lengths of a phylogenetic tree in addition to the topological relationship between taxa. Extensive simulation is performed to measure the Type II error rate (and hence statistical power) of these statistics including those introduced in Parker et al. (2008), as well as the relationship between power and tree shape. The technique is applied to an empirical hepatitis C virus data set presented by Sobesky et al. (2007); their original conclusion that compartmentalization exists between viruses sampled from tumorous and non-tumorous cirrhotic nodules and the plasma is upheld. The association index (AI), migration (PS), phylodynamic diversity (PD) and unique fraction (UF) statistics offer the best combination of Type I error and statistical power to investigate phylogeny-trait association in RNA virus data, while the maximum monophyletic clade size (MC) and nearest taxon (NT) statistics suffer from reduced power in some regions of tree space.

Keywords: BaTS, hepatitis C virus, Markov-chain Monte Carlo, Phylogeny-trait association, Phylogenetic uncertainty, simulation.

Manuscripts in progress (all rights reserved – you may not copy or distribute these files; content and conclusions subject to change; strictly embargoed until publication in a peer-reviewed journal/book):

  • v1: (): .doc
  • v2 (01/01/2014): .docx
  • v3 (16/06/2017): .pdf
  • View this project on GitHub

 

Application note: CONTEXT, a Phylogenomic Dataset Browser

In prep. (v3 – 14 Jun 2017)

Summary. CONTEXT (COmparative Nucleotides and Trees Exploration Tool) is a phylogenomics dataset browser that consists of a Java API and an executable binary jarfile with graphical user interface (GUI) for the high-throughput analysis of phylogenomic datasets to detect convergent molecular evolution.

Motivation. Comparative genomics studies have become increasingly common, but these analyses are sensitive to the quality and heterogeneity of input datasets (multiple sequence alignments and phylogenies). Currently few tools exist to readily compute descriptive statistics, or to visualise large numbers of input datasets. CONTEXT facilitates these analyses in a lightweight application which allows any user to rapidly visualise, inspect, score, and sort input datasets to identify outlying datasets which may need additional processing or filtering.

Results. The application has been successfully implemented on a variety of infrastructures. A variety of common input data formats including FASTA, Phylip/PAML, Nexus, and Newick conventions are automatically read and parsed.

 

Manuscripts in progress (all rights reserved – you may not copy or distribute these files; content and conclusions subject to change; strictly embargoed until publication in a peer-reviewed journal/book):

 

  • v3 (14/07/2017): .pdf
  • v2 (03/04/2017): .pdf
  • v1 (24/02/2015): .doc
  • View this project on GitHub

Detection of molecular convergence – literature review

In prep. (v2 – 21 April 2015)

Abstract

Convergent evolution is a process by which neutral evolutionary processes and adaptive natural selection in response to niche specialisation lead to similar forms arising in unrelated taxa. Phenotypic convergence has been appreciated for well over a century (recognised as a confounding factor in morphological cladistics). Recently several studies have demonstrated that convergent-type signals exist in some molecular datasets. Extending these studies to genome scale data presents substantial challenges and opportunities. This chapter reviews the definition of convergence (compared to parallelism), and the biological interpretation of apparently convergent molecular data. Recent methodological developments and applications are examined and future problems outlined. These include suitable null and alternative models, and the role of multiple test phylogenies in convergence detection by the congruence / phylogeny support method.

 

Manuscripts in progress (all rights reserved – you may not copy or distribute these files; content and conclusions subject to change; strictly embargoed until publication in a peer-reviewed journal/book):

 

  • v1 (10/04/2015): .doc
  • v2 (21/04/2015): .doc

In support of… taxis

Here’s where I’m coming from with this one. Let me say it simply:

More regular cyclists means lower private car ownership, less congestion for taxis to deal with, and more non-car-owners taking taxi trips.

OK, now for the detailed bit…

An organisation called the London Taxi Drivers’ Association (LTDA) – representing about a third of black cabs apparently, so a minority – has been railing against Transport for London (TfL)’s £913 million investment in cycling over this decade. They’re led by a bloke called Steve McNamara, who (when he’s not comparing cyclists to ISIS) complains that this is far too much money. It’s kicked up quite a fuss.

I’m not sure how much money he thinks should be spent on reducing the 145 deaths and 4496 serious injuries to cyclists, in London, in the last decade. He doesn’t say. And he doesn’t point out either that over the same period as that £0.9bn cycling spend TfL are allocating £34bn to other modes of transport. But rather than laying into taxi drivers, I actually want to use this post to support and (in a roundabout way) defend them.

Now I’m going a bit out of my comfort zone here. I’ve had run-ins with taxis, including one serious accident (he pulled a U-turn without signalling as I filtered outside stationary traffic, wiped me out, and drove off after giving false information, illegally). But you can attribute that to us both being in a hurry. Generally, although there isn’t an additional driving competency test to be a cabbie (the Knowledge tests wayfinding, not driving skill – which means cabbies are no more or less qualified than anyone else with a cat B licence), cabbies are fairly aware of their surroundings. And they’re used to driving near bikes. This means that when finely judging risk – as I have to do every second on the road as a cyclist, something I barely have to bother with when I’m driving – I am more worried by a tourist in a private car than a taxi.

So I’m happy to share the roads with cabs. But it seems a minority of them don’t really reciprocate that – in fact they hate cyclists – and I really, really can’t see why.

The argument against bikes, from the taxi cab, is two-pronged as far as I can tell: that bikes clutter up the road, slowing traffic, and secondly that bikes take fares away from taxis. Let’s look at those:

Do bikes clutter up the road? Well in a word, no! On a typical zone 1-3 trip, even on Mrs. LJP’s clunky old sit-up-and-beg bike, I’ll overtake every vehicle along the way except motorbikes. And that’s without jumping lights, overtaking unsafely, or breaking a sweat. Congestion is just so bad that I can’t help it, something the data proves. So I don’t cause congestion, I leapfrog it. That row of stationary cars with a single person in on the A11? Those aren’t bikes, they’re, well.. cars.

Ah! Say the drivers at this point: Bikes are causing that! By taking road space! So if only we built more roads! Well… most of the vehicles on the roads are still private cars, and each one takes the space of 4-10 bikes, depending on the traffic conditions, so I think we could make our own minds up on that one. Not that we need to: TfL have said, officially, that they are simply unable to wring any more space out of London’s roads for private cars [link 2]. In the next decade-and-a-half, London will gain an extra million-and-a-half people. That’s why they’ve been investing heavily in bikes, walking, and public transport for the last 10 years. It isn’t that they’ve suddenly become hippies – I’ve met a fair few of them and they’re all pretty small-C conservative – but because, as engineers, they make decisions based on evidence, and the simple fact is there isn’t any more space to use in London, and cycling, public transport and – yes – taxis are the most efficient use of that space, not private cars.

Secondly, do bikes take fares away from taxis? Well taxi fares are under pressure from minicabs and Hailo, but that’s nothing to do with cycles (although it is a lot to do with private car use in the centre of London again) so let’s just note that it would be more appropriate for the LTDA to focus their ire on that and move on. I suppose we can split taxi fares into two types – regular short hops during the day/evening, in central London, and longer trips that happen occasionally. Taxis prefer the first type as it’s a much better income stream – more predictable and less hassle (fair enough).

Well if city workers are choosing to use bikes over taxis, presumably because it’s cheaper, quicker, and more convenient, then that’s a pretty damning indictment of taxis’ levels of service. And does the LTDA think it’ll win these customers back by ranting at them? Doubt it.

This leads onto the second point about bikes and taxi fares – cyclists are also taxi customers – and big ones. Car ownership has been declining in every London borough, for two decades – but driving licence registrations have held steady. So how do all these non-car-owners get about? Well, our daughter is nearly a year old, and we walk, we cycle (without her), we use public transport, we hire cars for longer trips and, for shopping trips etc, – guess what – we take taxis! Now at the moment our daughter is small enough that’s not a problem, it works well. But in a year from now, we’ll need to cycle with her for some trips – if it’s safe enough to do so. If it isn’t, we can’t afford lots of taxi trips to take her to the nursery etc – and we’ll be forced to buy a car. And then we’ll probably never take a taxi again; why would we when we have a car?

Put really simply, more regular cyclists means lower private car ownership, less congestion for taxis to deal with, and more people taking taxi trips.

So I’ll ask it again: why is the LTDA so against cyclists?

Not minicabs?
Not Hailo?
Not unlicensed mopeds?
Not private cars?

These are all far bigger inconveniences to their working lives, and far bigger threats to their livelihood – but the LTDA have picked on cyclists – why? The simplest explanation to me is they just don’t like bikes. It’s visceral, it’s illogical, and it’s short-sightedly picking on the one group of other road users who ought to be natural allies. If I was a cabbie, especially an LTDA member, I’d be spitting teeth at LTDA’s failure to spot a natural ally, and work with them. But hey, it takes all sorts… right?