‘Stretched resources’ applies to parliamentarians, too…

I listened to a sad story on BBC Radio 4 this morning – a grieving mother can’t bury her murdered daughter because there’s no body – in years of searching none has been found. The killer isn’t co-operating (they don’t have to, although it would improve their parole terms to do so). She wants a change in the law so murderers can’t get parole until a body is produced (habeas corpus, literally).

This is a worthy campaign, and the situation must blight her life. Thing is, this scenario affects ‘70 whole families’, by her own numbers. Just 70 in the whole of the UK. A change in the *law* for this? A law which has to go through parliamentary scrutiny (twice), occupying time and resources.

Couldn’t sentencing just be updated, instead?

We have a vast number of small, individually important laws, but are they collectively eating away at the vitality of our democracy? MPs need to be wrestling with and thoroughly, openly, debating the massive challenges of our time – automation, climate change, ageing, food security, migration. Most already complain of long hours. Not every minor cause is lucky enough to have an effective MP to champion it, either – which ‘good ideas’ make it into law is arbitrary, in this sense. And finally: should we have hundreds of such minor bills on the books?

Or a simpler legal code, with more judges, able to devote more time to judicious sentencing, and a fast effective appeals process for victims and the convicted if they feel sentences and parole are unjust?

Posted in Activism, Blog

Using field-based DNA sequencing to accelerate phylogenomics

Invited seminar at the Department of Zoology, Oxford University, 30th November 2016.

Summary of our field-based real-time phylogenomics (MinION DNA sequencing) experiments this year, and applicability to broad-scale tree-of-life phylogenomics and macroevolutionary biology.

Slides [SlideShare]: cc-by-nd

Posted in Publications, Talks

Science and (small) business

Over the last 10-20 years there’s been a revolution in academic science (or should that be ‘coup’?) where many aspects of the job have been professionalised and formalised, especially project management, but also management in general. This generally includes tools like Gantt charts, milestones, workload models, targets and many other things previously unmentionable in academia but common in industry, especially large organisations. Lots of academics will tell you they think it’s bureaucratic overkill, intrusive, a waste of time, and worse (to put it mildly) but the awkward truth is that, as lab groups steadily increased in size (as fewer, larger grants went to increasingly senior PIs or consortia), many of the limitations of the collegiate style of the past, centred on a single academic with a tight-knit group, have been exposed.

Frequently the introduction of ‘management practices’, often after hiring expensive consultants, is accompanied by compulsory management training. Sometimes it can be an improvement. More normally (in my experience) whether an improvement in outcomes (as distinct from ‘efficiency’) has been achieved probably depends on whether you cost in staff time (or overtime) and morale. You can make arguments either way.

But I can’t help thinking: why are we attempting to replicate practices from big/massive private sector organisations, anyway? I suspect the answer, in part, is that those are the clients management consultants have the most experience working with. More seriously, those organisations differ in fundamental respects from even the largest universities, let alone individual research projects. This is because large companies:

  • Add value to inputs to create physical goods or services that are easily costed by the market mechanism (this is the big one)
  • Usually have large cash reserves, or easy access to finance (tellingly when this ends they usually get liquidated)
  • Keep an eye on longer-term outcomes, but primarily focus on the 5-10 year horizon
  • Compete directly with others for customers (in some respects an infinite resource)
  • Are answerable, at least yearly, to shareholders – with share value and dividends being the primary drivers of shareholder satisfaction.

Meanwhile, universities (and to an even greater extreme, research groups/PIs):

  • Produce knowledge outputs with zero market value*
  • Live hand-to-mouth on short grants
  • Need long-term, strategic thinking to succeed (really, this is why we get paid at all)
  • Compete indirectly for finite resources (grants and publications), based partly on track record and partly on potential
  • Answer, ultimately, to posterity, their families, and their Head Of Department

I want to be clear here – I’m not saying, by any means, that previous management techniques (i.e. ‘none’) work well in today’s science environment – but I do think we should probably look to other models than, say, General Motors or GlaxoSmithKline. The problem is often compounded because PIs have no business experience (certainly not in startups) while consultants often come from big business – their ability to meet in the middle is limited.

Instead, small and medium enterprises (SMEs) are a much closer model to how science works. Here good management of resources and people is extremely important, but the scale is much smaller, permitting different management methods, often focussing on flexibility and results, not hierarchies and systems. For instance, project goals are often still designed to be SMART (specific, measurable, achievable, realistic, time-scaled) but these will be revisited often and informally, and adjusted whenever necessary. Failure is a recognised part of the ongoing process. This is the exact opposite of how a Gantt chart, say, is used in academia: often drawn up at the project proposal (design) stage, it is then ignored until the end of the grant, when the PI scrabbles to fudge the outcomes, goals, or both to make the work actually carried out fit, so they don’t get a black mark from the funder/HoD for missing targets.

There are plenty of other models, and they vary by more than just organisation size/type (e.g. tech startup, games studio, SkunkWorks, logistics distributor, niche construction subcontractor), but you see what I mean: copying ‘big business’ wholesale without looking at the whole ecosystem of business practices makes little sense…

*Obviously it isn’t the case that all, or even most, scientific output will never realise any economic value – but that value can be years, or centuries, removed from the work that created it. And spin-outs are relevant to a tiny proportion of PIs’ work, even in applied fields.

Posted in Activism, Blog, Science

Making progress down the road

Too many laws and customs of driving make speed more important than safety, from the driving instructors’ “make good progress down the road” (i.e. “hurry the fuck up”, which most drivers internalise as “drive at least as fast as the speed limit unless there’s literally another car right in front of you”), to every transport investment ever being marketed to (presumably furious) taxpayers as “reducing journey times”.

This is in contrast to other European countries, where safety is #1, and speed just a nice-to-have. Surely it’s time for the national Government to admit – as London’s TfL have – that the UK is blessed with only a fixed amount of road space, so with growing numbers of people using it, we all have to accept that journeys will get slower in future, not quicker.

We have a real blind spot (pun intended) in the UK about traffic jams. On the one hand, we are only too aware of all the time we **WASTE** sat in stationary traffic each day – most car journeys are fewer than five miles, made by commuters, and involve up to half that time in queues – so traffic jams are a fact of driving life here in the UK.

On the other hand, people’s frustration / anger / surprise about being stuck in a traffic jam on any given morning (when they are, every morning) is total. But this is bizarre… We know the traffic will be there, but still get in our car expecting a free road, at 08:30 on a weekday! Where’s all that traffic come from?!

Surely it’s time to admit traffic jams exist, will get worse, not better, and constantly lurching from 0 to 30mph and back again is pointless as well as dangerous?

Imagine a world where the DoT’s published targets and main priority were to reduce accidents per mile travelled, and included walking and cycling targets, not journey times? Where 20mph became the standard urban default speed limit, not exception? Where satnavs routinely pointed out to users when (given traffic conditions) particular journeys, short and long, were quicker by public transport / foot / bike?

A safer UK. A calmer UK. And – just possibly – a healthier, richer, and happier UK.


Posted in Activism, Blog, Cycling

Single-molecule real-time (SMRT) Nanopore sequencing for Plant Pathology applications

A short presentation to the British Society for Plant Pathology’s ‘Grand Challenges in Plant Pathology’ workshop on the uses of real-time DNA/RNA sequencing technology for plant health applications.

Doctoral Training Centre, University of Oxford, 14th September 2016.

Slides [SlideShare]: cc-by-nc-nd

Posted in Publications, Science, Talks

Labour’s Corbyn election – ten thoughts.

Riiiight so both Corbyn and Angela Eagle can run for Labour Leader, so the election’s on. Labour’s turn to agonise about runners, riders, the gap between party members, the MPs who they volunteer for, and the public who might (or might not) reward them with power. This time round the stakes couldn’t be higher, with a Brexit to influence, likely Scottish independence, and a possible general election. Labour’s problems will be acute, and more so than the Tories’ (their spin machine has somehow turned their own fortnight of indecision and ineptitude into an object lesson in ruthlessness… amazing). This is because Labour’s problems – shifting support bases, policy fracture points, and the large and apparently increasing disconnects between voters, MPs, and a large membership* – are deeper, older, and until recently less-discussed than the Tories’ main weak points on Europe and social liberalism. Here’s my tuppence on this, all collected in one place, in no real order of logic/emphasis:

1) This isn’t the 80s. It isn’t a re-run of Militant, a long-term plan of an external party; Corbyn (at the last election) got a massive majority of Labour Members and Labour Registered Supporters.

2) This isn’t the 90s, or even the Noughties, either. Two-party politics in the UK is dead, dead, dead. Not least because the UK itself as a political entity is heading for the dustbin faster than Cameron’s ‘DC – UK PM’ business cards.

3) It isn’t a ‘Corbyn cult’. Lots of people, me included, switched onto Corbyn because of his policies, not the other way round.

4) That ‘7, 7-and-a-half’ performance. Watch the full interview: Did Corbyn bang the EU drum as loudly and clearly as Cameron, Brown, Blair and Major? No. He articulated a nuanced, balanced, fact-based opinion (essentially “I’m broadly in favour of the EU and many of the benefits, but we shouldn’t ignore the problems”) in the way that we kept being told other campaigners weren’t doing – and got castigated for it. Naïve? He’s been into politics for 40-odd years. I think he was just doing what he was elected for, saying what he believes the truth to be, simply.

5) Will Corbyn ever be PM? I personally don’t think so. But the future of UK politics is probably coalitions – see (2) above – so the views he’s highlighted and continues to champion will be (I believe) represented by a coalition or leftist grouping in power.

6) If the problem is style, Angela Eagle isn’t the answer. A bit more centrist… a bit more electable… a bit less weird… a bit less Marxist… sure. But if the problem is that Corbyn isn’t enough like Blair, Eagle isn’t the answer. PM May will utterly decimate her, in seconds.

7) The policies are belters: renationalising railways; reversing NHS privatisation; a Brexit that prioritises EEA access and freedom of movement over border control; a genuine national living wage, properly enforced; scrapping tuition fees and expanding apprenticeships; environmental protection; social liberalism. Fair personal, corporate, financial and ecological taxation to pay for it. These are all massive and proven winners with (variously) anywhere from 50%+ to 85% of the electorate.

8) Trident is a red herring. Forget it. I’m not sure how I feel about unilateral disarmament personally. But given he’s unlikely to be a majority PM (see above) Corbyn’s stance on nuclear weapons doesn’t really rate in importance for me compared to Brexit, the NHS and taxation. In any case the lifespan of the current system can be extended to postpone that discussion past 2020 – the MoD are actually quite good at kicking stuff like this into the long grass for a bit. By then the left (hard- and centre-) can get themselves into shape for that debate. Dividing ourselves now over something the right are utterly united on, and a clear majority of the public support, is madness (Corbyn even acknowledged so with his defence review). There’s much better reasons to argue…

9) Honest votes are more powerful. By that I mean that the recent UK referendums showed – to every single person in the country, better than a lecture ever could – that first-past-the-post is rotten**, that tactical voting or second-guessing your fellow electors is stupid, dangerous and counterproductive, and that (shock, horror) voting for something you believe in is an energising and rewarding experience in its own right. This is also true of leadership elections, don’t forget; how many Tories who egged on Boris now wish they hadn’t? Or wish they’d backed Gove instead of Leadsom? And lastly…

10) The ‘split risk’…

Tensions in what used to be a millions-strong Labour movement between left-behind poor and optimistic urbanites have become unendurable. They might not lead to a party split (although the press have started to publicly contemplate what lots of us have been saying for a year, or more) but equally might. Should you vote for Corbyn if you want a split, or if you want unity? It’s impossible to know, so see (9) above and vote for the person/policies you prefer.

As to the desirability of a split, well, tensions are often resolved by fractures. There need not be an SDP-type irrelevance created – the political landscape is completely different now, with smaller parties proven and established, and many more proportional elections apart from Westminster in play. More importantly, figures in the Greens, Lib Dems and Labour have already spoken overtly in the press about the need for a new, broad centre-left coalition, which both Labour-descendant parties could contribute to without antagonising each other’s supporters. Probably more happily and successfully!

It’s also important to remember that the ‘unite and fight’ ethos that animates the Labour Party – which (especially) Blairite PLP are mobilising to justify opposition to Corbyn, disingenuously I feel – predates the Labour Party by a century or more. Recall the Diggers, Levellers, Abolitionists, various religious groups, Socialists, Trade Unionists… lefty-ness has always been necessarily a big tent, but which poles are placed firmly in the earth and the strength of the storm define how big the canvas is. Progressive movements which want to redistribute power and wealth from the self-protecting, actual, ever-present, and very real, ruling class can’t always be populist, and span acres of political ground, and find expression in a single monolithic electoral party. Sometimes two, but rarely three of those. If it’s time to move the poles around to firmer ground, we should.

So that’s my take. If you can vote, I hope you do…

*However much the PLP might want to, they can’t ignore the fact that a progressive party needs volunteers in their millions, far more than a right-wing party which can afford paid helpers. Replacing volunteers with better fundraising adverts and more millionaire backers is a symptom of the root cause, not a solution.

**Imagine how different our democracy would look if we’d had compulsory voting for the last 20 years, with county-wide party votes used to fill a proportionally-elected House of Lords. How much healthier would we be, then?

Posted in Activism, Blog

On schools testing

Schools testing has been in the news again recently… are SATS etc useful objective measures of a school’s performance? Or do they add unnecessary stress and bureaucracy?

Well I think we can all agree more objectivity and less stress are good things, and most of us would probably go further and say that SATS aren’t doing either of those jobs. But kids are so unique! And testing is so essential! How on earth can we do both?!

Well, sorry. If there’s one field that is actually good at summarising hundreds of thousands of individuals in a heterogeneous population, it’s biology. So here’s A Biologist’s Alternative to SATS. Let’s call it… STATS:

  • Pick 5-10 measures that are easy to test and cover a wide range of measurable markers of kids’ lives – say, a couple each of literacy and numeracy tests, some critical thinking, standard IQ and general knowledge. Plus, happiness / wellbeing and physical health.
  • Assemble a mixed team of inspectors, governors, academics and teachers. Have them sample, say, 20 schools from a wide range of areas and rank them.
  • Then test the kids in those schools using our metrics. Also collect information on their dates of birth, sociological factors (parents’ status, wealth, postcode, commuting distance, screen time – there’s loads of ways to do this), etc.
  • Now we can construct a GLMM (a slightly-but-not-too complicated statistical model – or else use machine learning stuff like HMMs or neural networks, although I suspect getting enough data would be hard) to model each kid’s scores as a function of their school’s ranked quality given their sociological background.
  • Here’s the important bit: we take the test scores of the 25th, 50th, and 75th percentiles and label them ‘below’, ‘on’ and ‘above’ average respectively. But we won’t translate these expected quartile scores directly into national targets because we know the makeup and weighting of school sizes and types across the country will vary greatly and nonlinearly.
  • Instead the model itself provides a national benchmark, not a standard. This will be used to model the expected scores for a given school (and students) given the same sociological information, most of which can be imputed from child benefit statements, addresses and the like.
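One way to write down the kind of model described in the list above, purely as a sketch: the simplest member of the GLMM family is a linear mixed model with a random effect for each school. All the symbols are illustrative assumptions rather than a worked-out specification: y_ij is the score of kid i in school j, r_j is the inspectors’ rank for that school, x_ij holds the sociological covariates, and u_j soaks up whatever is specific to school j.

```latex
% Sketch only: a Gaussian linear mixed model, the simplest case of a GLMM.
y_{ij} = \beta_0 + \beta_1 r_j + \boldsymbol{\beta}_2^{\top}\mathbf{x}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2),
\qquad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)

% Benchmark (expected score) for a kid with background x in a school of rank r:
\hat{y}(r, \mathbf{x}) = \hat{\beta}_0 + \hat{\beta}_1 r + \hat{\boldsymbol{\beta}}_2^{\top}\mathbf{x}
```

Under these assumptions a real school would be judged against what the fitted model expects for its own intake, not against a single national number.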

Why would this system – more complex to set up and quite data-intensive – be any better than the current one? Here’s a few reasons:

  1. We know development is multifactorial. So is this model.
  2. We know sociology greatly affects kids’ life chances, so let’s explicitly account for it. If the upshot of that is more effort alleviating poverty than endlessly tweaking the school system, great.
  3. We can publish the tests’ relative weightings in the model so teachers/parents know which should be more emphasised.
  4. Grade inflation would be easy to abolish, simply by updating the model every year or so.
  5. The grading of schools would be simpler and integrated. Most schools will be ‘on-average’ – this is implicit – so the horrific postcode lottery will end and parents can agree to focus on improving their local school, which is better for their commute and their kids’ sanity.
  6. Regional or municipal variations due to differences in sociology will also be apparent, and can be evidenced and tackled.
Posted in Activism, Blog

Why aren’t we benchmarking bioinformatics?

Talk presented at the #bench16 (benchmarking) symposium at KCL, London, Wed 20th April 2016. Funded by the SSI.

Slides (Slideshare – cc-by-nd)

Posted in Publications, Science, Talks

More MinION – the ‘1D rapid’ prep

My last MinION post described our first experiments with this really cool new technology. I mentioned then that their standard library prep was fairly involved, and we heard that the manufacturers, Oxford Nanopore, were working on a faster, simpler library prep. We got in touch and managed to get an early prototype of this kit for developers*, so we thought we’d try it out. So our** experiments had three aims:

  • Try out this new rapid kit
  • Try out different extraction methods, to see how they worked with the kit
  • See if we could sequence some fairly damaged DNA with the kit

This is a lot of combinations to perform over a few days on one sequencing platform! Our experience from the last run gave us hope we could manage it all (we did, but with a lot of headaches over disk space – it turns out one drawback of being able to run multiple concurrent sequencing runs is hard-drive meltdown, unless you’re organised from the start – oops). In fact, to keep track of all the reads we added an ‘index’ function to the Poretools package. I really recommend you use this if you’re planning your own work.

We had eight samples to sequence: a 15-year-old dried fungarium specimen of Armillaria ostoyae (likely to be poor quality DNA; extracted by Bryn Dentinger with a custom technique); some fresh Silene latifolia (Qiagen-extracted, which we’d used successfully with the previous, ‘2D’ library prep); and six arbitrarily-selected plant samples, both monocots and dicots, extracted by boiling with Chelex beads (more about them at the end).


First we prepared normal ‘2D’ libraries from the Silene and Armillaria. These performed as expected from our December experiments, even the Armillaria giving decent numbers and lengths of reads (though not as many as we hoped, with some indication of worse Q-scores). We put this down to nicks in the fungarium DNA, and moved on to the 1D preps while the sequencer was still running.

The ‘1D’ in the rapid kit (vs ‘2D’ in the normal kit) refers to how many DNA strands are sequenced; in the 2D version, both forward and reverse-complement strands are sequenced. This is slower to prepare (extra adapters etc to link the two strands for sequencing) and also runs through the MinION more slowly (twice as much DNA, plus a hairpin moment) but is roughly twice as accurate, since each base is read twice. The 1D kit, on the other hand, results in single-stranded fragments, meaning we could expect lower accuracy traded off for higher speed. And the rapid kit really was fast – starting from purified extracted DNA, we added all the necessary adapters for sequencing in well under 15mins, ready to sequence.

The sequencing itself (for both the Armillaria and the Silene) went spectacularly well. Remember, this is unsheared genomic DNA, and imagine our surprise when we started to see 25, then 50, then 100, then 150kb reads come off the sequencer – many mapping straight away to reference genomes! It turns out that the size distribution of the 1D prep is much more long-tailed than the 2D/g-TUBE one. In fact, whereas the 2D library looks like a normal-ish gamma, the 1D reads are more like an inverse exponential – lots of short stuff and then a very long tail with some mega-whopper reads in. Reads so long, in fact, that mapping them the same way as Illumina short-read data would be a bit bonkers…

As for accuracy, well, the Q-scores are definitely lower in the 1D prep; around half those of the 2D, as we expected. On the other hand, they were still matching reference databases via BLAST/BLAT/BWA quite happily – so if your application was ID, who cares? Equally, combining mega-long 1D reads with shorter but more accurate reads could be a good way to close gaps in a de novo genome sequencing project. One technical point – there is definitely a lot more relative variation in the Q-scores for short (<1000bp) reads than for longer ones: the plot above shows the absolute difference in mean Q-scores in (first – vs – second) halves of a subset of 2,300 1D reads. You can see that below 1kb, Q-score variation exceeds 4 (bad news when the mean is about there) while longer reads have no such effect (a quick t-test confirms this).
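For a feel of what those numbers mean, and assuming the Q-scores here are standard Phred-scaled qualities (the usual convention, though it isn’t stated above), a score Q corresponds to a per-base error probability P like so:

```latex
P = 10^{-Q/10}, \qquad Q = -10 \log_{10} P

% Worked examples:
% Q = 10 \Rightarrow P = 0.1
% Q = 5  \Rightarrow P = 10^{-0.5} \approx 0.32
% Q = 4  \Rightarrow P = 10^{-0.4} \approx 0.40
```

On this scale, halving Q takes the square root of the error probability instead of doubling it, and a swing of 4 within a read is a big change in expected error when the mean is itself only about 4.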

So in short, the 1D prep is great if you just want to get some DNA on your screen ASAP, and/or long reads to boot. In fact, if you came up with a way to size-select all the short gubbins out before sequencing, you’d have one mega-cool sequencing protocol! What about the last bit of our test – seeing if a quick and dirty extraction could work, too? The results were… mixed-to-poor. Gel electrophoresis and Qubit both suggested the extracted DNA was pretty poor quality/concentration, and if we didn’t believe them, the gloopy, aromatic, multicoloured liquid in the tubes supplied direct evidence to our eyes. So rather than test those samples first (and risk damaging a perfectly good flowcell early on in the experiment), we held them back until the end when only a handful of pores were left. In this condition it’s hard to say whether they worked, or not: the 50 or so reads we got over an hour or two from fewer than 10 pores is a decent haul, and some of them had some matches to congenerics in BLAST, but we didn’t really give them a full enough test to be sure.

Either way, we’ll be playing around with this more in the months to come, so watch. This. Space…


*Edit: Oxford Nanopore have recently announced that the rapid kit will be out in April for general purchase.

**Again working, for better or worse, with fellow bio-beardyman Dr. Alex Papadopulos. Hi Alex! This work funded by a Royal Botanic Gardens, Kew Pilot Study Grant.

Posted in Science

Copying LOADS of files from a folder of LOADS *AND* LOADS more in OSX

Quick one this, as it’s a tricky problem I keep having to Google/SO. So I’m posting here for others but mainly myself too!

Here’s the situation: you have a folder (with, ooh, let’s say 140,000 separate MinION reads, for instance…) which contains a subset of files you want to move or copy somewhere else. Normally, you’d do something simple like a wildcarded ‘cp’ command, e.g.:

host:joeparker$ cp dir/*files_I_want* some/other_dir

Unfortunately, if the list of files matched by that wildcard is sufficiently long (more than a few thousand), you’ll get an error like this:

-bash: /bin/cp: Argument list too long

In other words, you’re going to have to be more clever. Using the GUI/Finder usually isn’t an option either at this point, as the directory size will likely defeat Finder, too. The solution is pretty simple but takes a bit of tweaking to work in OSX (full credit to posts here and here that got me started).

Basically, we’re going to use the ‘find’ command to locate the files we want, then pass each one in turn as an argument to ‘cp’ using ‘find -exec’. This is a bit slower overall than doing the equivalent as our original wildcarded command but since that won’t work we’ll have to lump it! The command* is:

find dir -name '*files_I_want*' -maxdepth 1 -exec cp {} some/other_dir \;

Simple, eh? Enjoy :)

*NB, In this command:

  • dir is the filesystem path to start the search; ‘find’ will recursively traverse the directory tree including and below this folder;
  • -name '*glob*' gives the pattern of filenames to match (quoted, so the shell doesn’t expand it before ‘find’ sees it);
  • -maxdepth is how deep to recurse (e.g. 1 = ‘don’t recurse’);
  • cp is the command we’re executing on each found file (could be mv etc);
  • {} is the placeholder that ‘find’ replaces with each found file;
  • some/other_dir is the destination argument to the command invoked by -exec;
  • \; marks the end of the -exec command (escaped so the shell doesn’t swallow it).
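A variation on the command above, offered as a sketch, not a recipe tested on every system: ending -exec with + instead of \; lets ‘find’ hand files over in batches, and a small sh -c wrapper puts the destination last, as ‘cp’ expects. The scratch directory and file names below are made up for the demo.

```shell
# Self-contained demo in a scratch directory (all names here are illustrative).
work=$(mktemp -d)
mkdir "$work/dir" "$work/other_dir"
touch "$work/dir/a_files_I_want_1.fast5" \
      "$work/dir/b_files_I_want_2.fast5" \
      "$work/dir/unrelated.txt"

# The glob is quoted so find, not the shell, does the matching.
# sh -c receives the destination as $0 and a batch of found files as "$@",
# so cp runs once per batch and not once per file.
find "$work/dir" -maxdepth 1 -type f -name '*files_I_want*' \
    -exec sh -c 'cp "$@" "$0"' "$work/other_dir" {} +

# Count what arrived in the destination.
copied=$(ls "$work/other_dir" | wc -l | tr -d ' ')
echo "copied $copied files"
```

With 140,000 reads the batching should cut down the number of ‘cp’ processes a lot, though I haven’t timed it against the one-at-a-time version.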
Posted in Coding, Science