Category Archives: Blog

What is ‘real-time’ phylogenomics?

Over the past few years I’ve been developing a line of research which I collectively refer to as ‘real-time phylogenomics’ – and this is the name of our mini-site for MinION-based rapid identification-by-sequencing. Since our paper on this will hopefully be published soon, it’s probably worth defining now what I hope this term denotes, what it does not – and ultimately where I hope this research is going.

‘Phylogenomics’ is simple enough, and Jonathan Eisen at UC Davis has been a fantastic advocate of the concept. Essentially, phylogenomics is scaled-up molecular systematics, with datasets (usually derived from a genome annotation and/or transcriptome) comprising many coding loci rather than a few genes. ‘Many’ in this case usually means hundreds or thousands, so we’re typically looking at primarily nuclear genes, although organellar genomes are often incorporated too, since they’re usually far easier to assemble and annotate reliably. The aim is, basically, to average phylogenetic signal over many loci by combining gene trees (or an analogous approach) to try and obtain phylogenies with higher confidence, since single- or few-locus approaches – including barcodes, no matter how judiciously chosen – are capable of producing incorrect trees with high confidence. The process is intensive, since genomes must be sequenced and then assembled to a sufficient standard to be reasonably certain of identifying orthologous loci. This isn’t the only use of the term (it also refers to phylogenies produced from whole-genome metagenomics) but it is the most straightforward and common one as far as eukaryote genomics is concerned, and certainly the one uppermost in my mind.

However, the results are often confusing, or at least more complex than we might hope: instead of a single phylogeny with high support from all loci, robust to the model used, we often find that a high proportion of gene trees (10-30%, perhaps) agree with each other but not with the modal (most common, e.g. majority-rule consensus) tree topology. For instance, among 2,326 loci in our 2013 paper on phylogenomics of the major bat families, we found that the position of a particular group of echolocators – which had been hotly debated for decades, based on morphological and single-locus approaches – showed such a pattern (some loci supporting the traditional grouping of Microchiroptera + Megachiroptera, but over 60% of loci supporting the newer Yangochiroptera + Yinpterochiroptera system). This can be for a variety of reasons, some biological and some methodological. The point is that we have a sufficiently detailed picture to let us choose between competing phylogenetic hypotheses with both statistical confidence and intuition based on comparison.
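
(To make the ‘counting loci’ idea concrete – and this is only a minimal sketch, not our actual analysis pipeline – you could tally how many gene trees sit closer to each of two competing hypothesis topologies using the dendropy library; the filenames and the two hypothesis trees below are placeholders.)

    # Tally how many gene trees are closer (by unrooted Robinson-Foulds distance)
    # to each of two competing hypothesis topologies. Filenames are placeholders.
    import dendropy
    from dendropy.calculate import treecompare

    tns = dendropy.TaxonNamespace()
    gene_trees = dendropy.TreeList.get(path="gene_trees.nwk", schema="newick",
                                       taxon_namespace=tns)
    hyp_a = dendropy.Tree.get(path="hypothesis_A.nwk", schema="newick",
                              taxon_namespace=tns)  # e.g. Micro- + Megachiroptera
    hyp_b = dendropy.Tree.get(path="hypothesis_B.nwk", schema="newick",
                              taxon_namespace=tns)  # e.g. Yango- + Yinpterochiroptera
    hyp_a.encode_bipartitions()
    hyp_b.encode_bipartitions()

    tally = {"hypothesis_A": 0, "hypothesis_B": 0, "tied": 0}
    for gt in gene_trees:
        gt.encode_bipartitions()
        d_a = treecompare.symmetric_difference(gt, hyp_a)
        d_b = treecompare.symmetric_difference(gt, hyp_b)
        if d_a < d_b:
            tally["hypothesis_A"] += 1
        elif d_b < d_a:
            tally["hypothesis_B"] += 1
        else:
            tally["tied"] += 1

    print(tally)

In practice you’d want something more careful (support thresholds, handling missing taxa per locus), but the counting logic really is that simple.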

These techniques have been on the horizon for a while (certainly since at least 2000) and have gathered pace over the last decade with improvements in computing, informatics, and especially next-generation sequencing. The other half of this equation, ‘real-time’ sequencing, has emerged much more recently and centres, obviously, on the MinION sequencer. Most work using it so far has focused either on the very impressive potential long-read data offers for genomic analyses, particularly assembly, or on rapid ID of samples (e.g. the Quick/Loman Zika and Ebola monitoring studies, and our own work).

So what, exactly, do we hope to achieve with phylogenomic-type analyses using real-time MinION data, and why?

Well, firstly, our work so far has shown that the existing pipeline (sample -> transport -> sequence -> assemble genome -> annotate genes -> align loci -> build trees) has lots of room for speedups, and we’re fairly confident that the inevitable tradeoff with accuracy when you omit or simplify certain steps (laboratory-standard sequencing, assembly) is at least compensated for by the volume of data alone. Recall that a ‘normal’ phylogenomic tree similar to our bat one might take two or more postdocs/students a year to generate from biological samples, often longer. A process taking a week instead would let you generate something like 50 analyses in a year rather than one! The most obvious application for this is simply accelerating existing research, but the potential for transforming fieldwork and citizen science is considerable. This is because you can build trees that inform species relationships even if the species in question isn’t known. In other words, a phylogenome can both reliably identify an unknown sample, and also indicate whether it is a new species.

More excitingly, I think we need to have a deeper look at how we both construct and analyse evolutionary models. Life on earth is most accurately and completely described by a network, not a bifurcating tree, and this applies to multi-locus datasets just as it does to single genes. In other words, there is a single network that connects every locus in every living thing. Phylogenetic trees are only a bifurcating projection of this, while single- or multi-locus networks only comprise a part of it.

We’ve hitherto ignored this fact, largely because (a) trees are often a good approximation, especially in the case of eukaryote nuclear genes, and (b) the data and computation requirements a ‘network-of-life’ analysis implies are formidable. However, cracks are beginning to appear on both fronts. Firstly, many loci are subject to real biological phenomena (horizontal gene transfer, selection leading to adaptive convergence, etc.) which give erroneous trees, as discussed above – and prokaryotic and viral inference is rarely even this straightforward. Secondly, expanding computing power, algorithmic sophistication, and sequencing capacity (imagine just 1,000 high schools across the world regularly using a MinION for class projects…) mean the question for us today really isn’t ‘how do we get data?’ but ‘how ambitious do we want to be with it?’

Since my PhD but especially since 2012, I’ve been developing this theme. Ultimately I think the answer lies in the continuous analysis of public phylogenomic data. Imagine multiple distributed but linked analyses, continuously running to infer parts of the network of life, updating their model asynchronously both as new data flood in, and by exchanging information with each other. This is really what we mean by real-time phylogenomics – nothing less than a complete Network of Life, living in the cloud, publicly available and collaboratively and continuously inferred from real-time sequence data.

So… that’s what I plan to spend the 2020s doing, anyway.

 

Some aspects of BLASTing long-read data

Quick note to explain some of the differences we’ve observed working with long-read data (MinION, PacBio) for sample ID via BLAST. I’ll publish a proper paper on this, but for now:

  • Long reads aren’t just a bit longer than Illumina data, but two, three, four or possibly even five orders of magnitude longer (up to 10^6 already, vs 10^2). This is mathematically obvious, but extremely important…
  • … the massive length means lots of the yield is in comparatively few reads. This makes yield stats based on numbers of reads positively useless for comparison with NGS. Also…
  • Any given long read contains significantly more information than a short one does. Most obviously, the genomics facilities of the world have focused on long reads’ potential for improving genome assembly contiguity and repeat spanning (as well as using synteny to spot rearrangements, etc.), but we’ve also shown (Parker et al, submitted) that whole coding loci can be directly recovered from single reads and used in phylogenomics without assembly and annotation. This makes sense (a multi-kilobase read can easily span a whole gene, which is also on the kb scale) but it certainly wasn’t initially obvious, and given error rates, etc., it’s surprising it actually works.
  • Sample ID using BLAST works very differently with long reads. In particular, the normal ‘rubbish in, rubbish out’ rule is inverted: nanopore reads (for the time being) may be long, but they inevitably contain errors. However, that length means that, assuming the BLAST database sequences are approximately as long/contiguous, nanopore queries tend either to match database targets correctly, with very long alignments (hundreds or thousands of identities), or not at all.

This last point is the most important. What it means is that, for a read, interpreting the match is simple – you’ll either have a very long alignment to a target, or you won’t. Even when a read has regions of identity to more than one species, the cumulative alignment length for the correct species is much longer overall. This is the main result of our paper.
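
To make that decision rule concrete, here’s a minimal sketch (not the code from our paper) that takes standard tabular BLAST output (-outfmt 6, e.g. from blastn), sums alignment lengths per subject for each read, and calls the read for whichever subject accumulates the longest total alignment. The filename is a placeholder, and in practice you’d map subject IDs to species (e.g. via an accession-to-taxon lookup), which I’ve skipped here.

    # For each read, sum BLAST alignment lengths per subject and report the
    # subject with the longest cumulative alignment. Assumes the standard
    # 12-column tabular format:
    # qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    from collections import defaultdict

    totals = defaultdict(lambda: defaultdict(int))  # read -> subject -> aligned bases

    with open("nanopore_vs_reference.blastn.tsv") as handle:  # placeholder filename
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            read_id, subject_id, aln_length = fields[0], fields[1], int(fields[3])
            totals[read_id][subject_id] += aln_length

    for read_id, per_subject in totals.items():
        best_subject, best_length = max(per_subject.items(), key=lambda kv: kv[1])
        print(read_id, best_subject, best_length, sep="\t")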

The second implication is that, as it has been put to me, for nanopore reads to be any good for an ID you have to have a genomic database. While this is true in the narrow sense, our current work (again, partly in our paper and partly in preparation) shows that in fact all that matters is for the length distribution of the reference database to be similar to that of the query nanopore reads. In particular, we’ve demonstrated that a rapid nanopore sequencing run, with no assembly, can itself serve as a perfectly good reference for future sample ID. This has important implications for sample ID but, as I said, more on that later 😉

Only Corbyn can save the Left

Labour go into this election with the odds stacked against Corbyn, as the Tories intended. But here’s the thing – he’s only an election liability *if* you believe he can win.

But in fact everyone – surely even the man himself – can see there is *no chance* of Corbyn being the next PM, even in coalition.

On the 8th June the nation will choose not ‘May or Corbyn’ but ‘big or humungous Tory majority’.

The Lib Dems can’t win either, and again, the whole world knows it. So Brexit is most definitely happening.

Those two facts mean the left can neutralise Tory scaremongering over a ‘PM Corbyn’ or ‘Brexit backsliding’. That’s how we move the conversation on to what kind of country we want. There the Tories are on a much weaker foundation. The fact is, NINE years after the banking crisis, and seven years after they took power, the Tories have cut and wrecked at every opportunity – the longest, most savage swipe at living standards in memory. They will keep on, and on, and on at our pockets because they are ideologically unable to think of anything else.

So if the left can only acknowledge that no, they can’t win this time, no, they can’t stop Brexit, and no, Corbyn won’t be PM, they can turn on to arguments that are winnable. Better yet, a tacit pact to collaborate (perhaps supporting Compass to produce a social-media-friendly ‘tactical voting app’ based on postcodes or similar) would lay the necessary foundations for a proper power grab in 2022 – when the Tories will have been in power for over a decade.

I suspect this would be the Tories’ worst nightmare. May’s gamble would completely backfire – winning the election (narrowly) but losing the national argument.

The key is that only Corbyn, or those close to him, can trigger this. Perhaps the Easter significance will inspire them…

Step-by-step: Raspberry Pi as a wifi bridge, plus a (really) low-spec media centre…

I’ll keep this brief – really – because it’s mainly an aide-memoire for when this horrific bodge breaks in the next, ooh, month or so. But, for context:

The problem:

Our office/studios are in a shed at the bottom of the garden (~15m). Wifi / wireless LAN just reaches, intermittently.

The solution:

Set up an ethernet network in the shed itself, and connect (‘bridge’) that network to the house wifi with a Raspberry Pi.

Kit:

1x Raspberry Pi (Pi 2 Model B; mine overclocked to ~1150MHz) plus SD card and reader; an old ethernet switch and cables; quite a lot of patience.


A bit more detail:

This step-by-step is going to be a bit arse-about-face, in that the order of the steps you’d actually need from scratch is completely different from the max-frustration, highly circuitous route I actually followed – not least because I already had a Pi with Ubuntu on it:

  1. Get a Pi with Ubuntu on it. This will be acting as the wireless bridge to connect the LAN to the wifi, and will also serve IP addresses to other hosts on the LAN (network buffs: yes, I realise this is a crap solution; a sketch of one way to do the DHCP bit follows this list). This is the second-easiest step by a mile: see this guide for MATE and follow it. We’ll set the Pi up to run without a monitor or keyboard (‘headless’ – connecting over SSH) later, but for now don’t ruin your relationship unduly, do this bit the easy way with a monitor attached.
  2. MAKE SURE YOU CHANGE THE DEFAULT USERNAME AND PASSWORD ON THE PI, AND WRITE THEM DOWN. Jeez…
  3. apt-get update the Pi a few times. You’ll thank yourself later.
  4. Set the Pi up to act as a wifi <–> LAN bridge. There are a lot of tutorials suggesting various ways to achieve this – such as this, this, and all of this noise – but ignore them all: with the newest Ubuntu LTS (16.04 at time of writing) this is now far, far, far easier to do in the GUI, and more stable. Just follow this guide.
  5. Set up some other housekeeping tasks for headless login: enable SSH (see also here); set the clock to attempt to update the system time on boot if a time server’s available (make sure to add e.g. server 0.europe.pool.ntp.org to your /etc/ntp.conf file); and log in to the desktop automatically. This last action isn’t necessary, and purists will claim it wastes resources, but this is a Pi 2 and we’re only serving DHCP on it, basically – it can afford that. The reason I’ve enabled this is because it seems to force the WLAN adapter to try to acquire the home wifi a bit more persistently (see below). I’ve tried to achieve the same results using wpa_supplicant, but with no stability, and my time is a pretty finite resource, so screw it – I’m a scientist, not an engineer!
  6. Lastly, I’ve made some fairly heavy-duty edits (not following but at least guided by this and this) to my /etc/network/interfaces file, with a LOT of trial and error which included a couple of false starts bricking my Pi (if that happens to you, reinstall Ubuntu. Sorry.) It now reads (home wifi credentials redacted):
    # interfaces(5) file used by ifup(8) and ifdown(8)
    # Include files from /etc/network/interfaces.d:
    source-directory /etc/network/interfaces.d

    # The loopback network interface
    auto lo
    iface lo inet loopback

    # LOOK at all the crap I tried...
    #allow-hotplug wlan0
    #iface eth0 inet dhcp
    #allow-hotplug wlan0
    #iface wlan0 inet manual
    #iface eth0 inet auto
    #wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf
    # Yep, that lot took a while :\

    # Finally, this worked:
    auto wlan0
    iface wlan0 inet dhcp
    wpa-ssid "q***********"
    wpa-psk "a******"
    # That's it :D
  7. Connect the Pi to your other computers using the switch and miles of dodgy ethernet cabling.
  8. Disconnect the screen, reboot, and wait for a long time – potentially hours – for the Pi to acquire the wifi. You should now be able to a) ping and/or log in to the Pi from other hosts on the LAN, and b) ping/access hosts on the home WLAN, and indeed the wider Internet if your WLAN has a connection(!)
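
(Aside: if you’d rather serve DHCP on the shed LAN explicitly instead of relying on the GUI sharing method, a minimal dnsmasq config is one common way to do it on a Pi. This is only a sketch – the interface name, address range and lease time are invented for illustration:)

    # /etc/dnsmasq.conf -- minimal DHCP-only sketch for the wired shed LAN
    interface=eth0                              # only answer on the wired interface
    bind-interfaces                             # don't listen on wlan0 / the house wifi
    dhcp-range=192.168.2.50,192.168.2.150,12h   # addresses to hand out, 12-hour leases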

A Media centre from scratch

Lastly of all, having gone to all that trouble, the glaring bandwidth inadequacies of our crap WLAN showed up. Being stingy by nature (well, and because the phone companies in our area insist that, despite us living fewer than a day’s march from Westminster, their exchanges have run out of fibre capacity for 21st-century broadband) I decided to mitigate this for the long winter months the simplest way: gather the zillions of MP3s, ripped DVDs and videos from all our devices onto one server. I put Ubuntu (the same 16.04 / MATE distribution as on the Pi, in fact) onto an old Z77 motherboard my little brother had no use for, in an ancient (~2003) ATX case, with a rock-bottom new Celeron CPU (~£25) plus 4GB of RAM and a cheap spinning drive I had lying about (a 2TB Toshiba SATA, IIRC). This is highly bodgy. So much so, in fact, that the CPU fan is cable-tied onto the mobo, because the holes for the anchor pins didn’t line up. But: it works, and it only has to decode/serve MP3s and videos, after all.

I apt-get updated that a few times too, adding in some extra packages like htop, openssh, and hardinfo – and removing crap like games and office stuff – to make it run about as leanly as possible. Then, to manage and serve media, I installed something I’d wanted to play with for a while: Kodi. This is both a media manager/player (like iTunes, VLC, or Windows Media Player) and also a streaming media server, so other hosts on my LAN can access the library by streaming over the ethernet if they want, without replicating files.

Setting up Kodi was simplicity itself, as was adding movies and music to the library, but one minor glitch I encountered was reading/displaying track info and artwork, which usually happens on iTunes pretty seamlessly via ID3 tags, fingerprinting, and/or Gracenote CDDB querying. Turns out I’d been spoilt this last decade, because in Kodi this doesn’t happen so magically. Instead, you need to use a tool like MusicBrainz Picard to add the tags to MP3s on your system, then re-scan them into Kodi for the metadata to be visible. The re-scanning bit isn’t as onerous as you’d think – files are left in place, the ID3 tags being used simply to update Kodi’s metadata server (I guess) – but the initial Picard search for thousands of MP3s over a slow WLAN took me most of a night.

However. A small price to pay to actually have music to listen to while I work away writing crap like this in the shed, or shoddy-quality old episodes of Blackadder or Futurama to watch in the evening :p

‘Stretched resources’ applies to parliamentarians, too…

I listened to a sad story on BBC R4 this morning – a grieving mother can’t bury her daughter, who was murdered, because there’s no body: in years of searching none has been found. The killer isn’t co-operating (they don’t have to, although it would improve their parole terms to do so). She wants a change in the law so murderers can’t get parole until a body is produced (habeas corpus, literally).

This is a worthy campaign, and it must blight her life. Thing is, this scenario affects ’70 whole families’, by her own numbers. Just 70 in the whole of the UK. A change in the *law* for this? A law which has to go through parliamentary scrutiny (twice), occupying time and resources.

Couldn’t sentencing just be updated, instead?

We have a vast number of small, tiny, individually important laws, but are they collectively eating away at the vitality of our democracy? MPs need to be wrestling with, and thoroughly and openly debating, the massive challenges of our time – automation, climate change, ageing, food security, migration. Most complain of long hours. Not every minor cause is lucky enough to have an effective MP to champion it, either – which ‘good ideas’ make it into law is arbitrary, in this sense. And finally: should we have hundreds of such minor bills on the books?

Or a simpler legal code, with more judges, able to devote more time to judicious sentencing, and a fast effective appeals process for victims and the convicted if they feel sentences and parole are unjust?

Science and (small) business

Over the last 10-20 years there’s been a revolution in academic science (or should that be ‘coup’?) in which many aspects of the job have been professionalised and formalised – especially project management, but management in general too. This generally includes tools like Gantt charts, milestones, workload models, targets and many other things previously unmentionable in academia but common in industry, especially large organisations. Lots of academics will tell you they think it’s bureaucratic overkill, intrusive, a waste of time, and worse (to put it mildly), but the awkward truth is that, as lab groups steadily increased in size (as fewer, larger grants went to increasingly senior PIs or consortia), many of the limitations of the collegiate style of the past, centred on a single academic with a tight-knit group, have been exposed.

Frequently the introduction of ‘management practices’, often after hiring expensive consultants, is accompanied by compulsory management training. Sometimes it can be an improvement. More often (in my experience), whether an improvement in outcomes (as distinct from ‘efficiency’) has been achieved probably depends on whether you cost in staff time (or overtime) and morale. You can make arguments either way.

But I can’t help thinking: why are we attempting to replicate practices from big/massive private-sector organisations anyway? I suspect the answer is partly that those are the clients management consultants have the most experience of working with. More seriously, those organisations differ in fundamental respects from even the largest universities, let alone individual research projects. This is because large companies:

  • Add value to inputs to create physical goods or services that are easily costed by the market mechanism (this is the big one)
  • Usually have large cash reserves, or easy access to finance (tellingly when this ends they usually get liquidated)
  • Keep an eye on longer-term outcomes, but primarily focus on the 5-10 year horizon
  • Compete directly with others for customers (in some respects an infinite resource)
  • Are answerable, at least yearly, to shareholders – with share value and dividends being the primary drivers of shareholder satisfaction.

Meanwhile, universities (and to an even greater extreme, research groups/PIs):

  • Produce knowledge outputs with zero market value*
  • Live hand-to-mouth on short grants
  • Need long-term, strategic thinking to succeed (really, this is why we get paid at all)
  • Compete indirectly for finite resources – grants and publications – based partly on track record and partly on potential.
  • Answer, ultimately, to posterity, their families, and their Head Of Department

I want to be clear here – I’m not saying, by any means, that previous management techniques (i.e. ‘none’) work well in today’s science environment – but I do think we should probably look to models other than, say, General Motors or GlaxoSmithKline. The problem is often compounded because PIs have no business experience (certainly not in startups) while consultants often come from big business – their ability to meet in the middle is limited.

Instead, small and medium-sized enterprises (SMEs) are a much closer model to how science works. Here good management of resources and people is extremely important, but the scale is much smaller, permitting different management methods, often focussing on flexibility and results, not hierarchies and systems. For instance, project goals are often still designed to be SMART (specific, measurable, achievable, realistic, time-scaled), but these will be revisited often and informally, and adjusted whenever necessary. Failure is a recognised part of the ongoing process. This is the exact opposite of how a Gantt chart, say, is used in academia: often drawn up at the project proposal (design) stage, it is then ignored until the end of the grant, when the PI scrabbles to fudge the outcomes, the goals, or both to make the work actually carried out fit, so they don’t get a black mark from the funder/HoD for missing targets.

There are plenty of other models, and they vary not just by organisation size but by type (e.g. tech startup, games studio, skunkworks, logistics distributor, niche construction subcontractor) – but you see what I mean: copying ‘big business’ wholesale, without looking at the whole ecosystem of business practices, makes little sense…

*Obviously it isn’t that all, or even most, scientific output will never realise any economic value – but that value can be years, or centuries, removed from the work that created it. And spin-outs are relevant to a tiny proportion of PIs’ work, even in applied fields.

Making progress down the road

Too many laws and customs of driving make speed more important than safety, from the driving instructors’ “make good progress down the road” (i.e. “hurry the fuck up”, which most drivers internalise as “drive at least as fast as the speed limit unless there’s literally another car right in front of you”), to every transport investment ever being marketed to (presumably furious) taxpayers as “reducing journey times”.

This is in contrast to other European countries, where safety is #1, and speed just a nice-to-have. Surely it’s time for the national Government to admit – as London’s TfL have – that the UK is blessed with only a fixed amount of road space, so with growing numbers of people using it, we all have to accept that journeys will get slower in future, not quicker.

We have a real blind spot (pun intended) in the UK about traffic jams. On the one hand, we are only too aware of all the time we **WASTE** sat in stationary traffic each day – most car journeys are less than five miles, made by commuters, and spend up to half their duration in queues – so traffic jams are a fact of driving life here in the UK.

On the other hand, people’s frustration / anger / surprise about being stuck in a traffic jam on any given morning (when they are, every morning) is total. But this is bizarre… We know the traffic will be there, but still get in our cars expecting a free road at 08:30 on a weekday! Where’s all that traffic come from?!

Surely it’s time to admit traffic jams exist, will get worse, not better, and constantly lurching from 0 to 30mph and back again is pointless as well as dangerous?

Imagine a world where the DoT’s published targets and main priority were to reduce accidents per mile travelled, and included walking and cycling targets, not journey times? Where 20mph became the standard urban default speed limit, not exception? Where satnavs routinely pointed out to users when (given traffic conditions) particular journeys, short and long, were quicker by public transport / foot / bike?

A safer UK. A calmer UK. And – just possibly – a healthier, richer, and happier UK.

Imagine.

Single-molecule real-time (SMRT) Nanopore sequencing for Plant Pathology applications

A short presentation to the British Society for Plant Pathology’s ‘Grand Challenges in Plant Pathology’ workshop on the uses of real-time DNA/RNA sequencing technology for plant health applications.

Doctoral Training Centre, University of Oxford, 14th September 2016.

Slides [SlideShare]: cc-by-nc-nd


Labour’s Corbyn election – ten thoughts.

Riiiight – so both Corbyn and Angela Eagle can run for Labour leader, so the election’s on. Labour’s turn to agonise about runners, riders, the gap between party members, the MPs they volunteer for, and the public who might (or might not) reward them with power. This time round the stakes couldn’t be higher, with a Brexit to influence, likely Scottish independence, and a possible general election. Labour’s problems will be acute, and more so than the Tories’ (their spin machine has somehow turned their own fortnight of indecision and ineptitude into an object lesson in ruthlessness… amazing). This is because Labour’s problems – shifting support bases, policy fracture points, and the large and apparently increasing disconnects between voters, MPs, and a large membership* – are deeper, older, and until recently less discussed than the Tories’ main weak points on Europe and social liberalism. Here’s my tuppence on this, all collected in one place, in no real order of logic/emphasis:

1) This isn’t the 80s. It isn’t a re-run of Militant, a long-term plan of an external party; Corbyn (at the last election) got a massive majority of Labour Members and Labour Registered Supporters.

2) This isn’t the 90s, or even the Noughties, either. Two-party politics in the UK is dead, dead, dead. Not least because the UK itself as a political entity is heading for the dustbin faster than Cameron’s ‘DC – UK PM’ business cards.

3) It isn’t a ‘Corbyn cult’. Lots of people, me included, switched onto Corbyn because of his policies, not the other way round.

4) That ‘7, 7-and-a-half’ performance. Watch the full interview: did Corbyn bang the EU drum as loudly and clearly as Cameron, Brown, Blair and Major? No. He articulated a nuanced, balanced, fact-based opinion (essentially “I’m broadly in favour of the EU and many of its benefits, but we shouldn’t ignore the problems”) in the way that we kept being told other campaigners weren’t doing – and got castigated for it. Naïve? He’s been in politics for 40-odd years. I think he was just doing what he was elected for: saying what he believes the truth to be, simply.

5) Will Corbyn ever be PM? I personally don’t think so. But the future of UK politics is probably coalitions – see (2) above – so the views he’s highlighted and continues to champion will be (I believe) represented by a coalition or leftist grouping in power.

6) If the problem is style, Angela Eagle isn’t the answer. A bit more centrist… a bit more electable… a bit less weird… a bit less Marxist… sure. But if the problem is that Corbyn isn’t enough like Blair, Eagle isn’t the answer. PM May will utterly decimate her, in seconds.

7) The policies are belters: renationalising railways; reversing NHS privatisation; a Brexit that prioritises EEA access and freedom of movement over border control; a genuine national living wage, properly enforced; scrapping tuition fees and expanding apprenticeships; environmental protection; social liberalism. Fair personal, corporate, financial and ecological taxation to pay for it. These are all massive and proven winners amongst (variously) anywhere from 50% to 85% of the electorate.

8) Trident is a red herring. Forget it. I’m not sure how I feel about unilateral disarmament personally. But given he’s unlikely to be a majority PM (see above) Corbyn’s stance on nuclear weapons doesn’t really rate in importance for me compared to Brexit, the NHS and taxation. In any case the lifespan of the current system can be extended to postpone that discussion past 2020 – the MoD are actually quite good at kicking stuff like this into the long grass for a bit. By then the left (hard- and centre-) can get themselves into shape for that debate. Dividing ourselves now over something the right are utterly united on, and a clear majority of the public support, is madness (Corbyn even acknowledged so with his defence review). There’s much better reasons to argue…

9) Honest votes are more powerful. By that I mean that the recent UK referendums showed – to every single person in the country, better than a lecture ever could – that first-past-the-post is rotten**, that tactical voting or second-guessing your fellow electors is stupid, dangerous and counterproductive, and that (shock, horror) voting for something you believe in is an energising and rewarding experience in its own right. This is also true of leadership elections, don’t forget; how many Tories egging on Boris now wish they hadn’t? Or backed Gove instead of Leadsom? And lastly…

10) The ‘split risk’…

Tensions in what used to be a millions-strong Labour movement between the left-behind poor and optimistic urbanites have become unendurable. They might not lead to a party split (although the press have started to publicly contemplate what lots of us have been saying for a year or more) but equally they might. Should you vote for Corbyn if you want a split, or if you want unity? It’s impossible to know, so see (9) above and vote for the person/policies you prefer.

As to the desirability of a split, well, tensions are often resolved by fractures. There need not be an SDP-type irrelevance created – the political landscape is completely different now, with smaller parties proven and established, and many more proportional elections apart from Westminster in play. More importantly, figures in the Greens, Lib Dems and Labour have already spoken overtly in the press about the need for a new, broad centre-left coalition, which both Labour descendant parties could contribute to without antagonising each other’s supporters. Probably more happily and successfully!

It’s also important to remember that the ‘unite and fight’ ethos that animates the Labour Party – which the (especially Blairite) PLP are mobilising to justify opposition to Corbyn, disingenuously I feel – predates the Labour Party by a century or more. Recall the Diggers, Levellers, Abolitionists, various religious groups, Socialists, Trade Unionists… lefty-ness has always necessarily been a big tent, but which poles are placed firmly in the earth, and the strength of the storm, define how big the canvas is. Progressive movements which want to redistribute power and wealth from the self-protecting, actual, ever-present, and very real ruling class can’t always be populist, span acres of political ground, and find expression in a single monolithic electoral party all at once. Sometimes two, but rarely three of those. If it’s time to move the poles around to firmer ground, we should.

So that’s my take. If you can vote, I hope you do…

*However much the PLP might want to, they can’t ignore the fact that a progressive party needs volunteers in their millions, far more than a right-wing party which can afford paid helpers. Swapping those volunteers for better fundraising adverts and more millionaire backers is a symptom of the root cause, not a solution.

**Imagine how different our democracy would look if we’d had compulsory voting for the last 20 years, with county-wide party votes used to fill a proportionally-elected House of Lords. How much healthier would we be, then?

On schools testing

Schools testing has been in the news again recently… are SATS etc useful objective measures of a school’s performance? Or do they add unnecessary stress and bureaucracy?

Well I think we can all agree more objectivity and less stress are good things, and most of us would probably go further and say that SATS aren’t doing either of those jobs. But kids are so unique! And testing is so essential! How on earth can we do both?!

Well, sorry. If there’s one field that is actually good at summarising hundreds of thousands of individuals in a heterogeneous population, it’s biology. So here’s A Biologist’s Alternative to SATS. Let’s call it… STATS:

  • Pick 5-10 measures that are easy to test and cover a wide range of measurable markers of kids’ lives – say, a couple each of literacy and numeracy tests, some critical thinking, standard IQ and general knowledge. Plus, happiness / wellbeing and physical health.
  • Assemble a mixed team of inspectors, governors, academics and teachers. Have them sample, say, 20 schools from a wide range of areas and rank them.
  • Then test the kids in those schools using our metrics. Also collect information on their dates of birth, sociological factors (parents’ status, wealth, postcode, commuting distance, screen time – there’s loads of ways to do this), etc.
  • Now we can construct a GLMM (a slightly-but-not-too-complicated statistical model – or else use machine-learning stuff like HMMs or neural networks, although I suspect getting enough data would be hard) to model each kid’s scores as a function of their school’s ranked quality, given their sociological background (a minimal sketch of such a model follows this list).
  • Here’s the important bit: we take the test scores of the 25th, 50th, and 75th percentiles and label them ‘below’, ‘on’ and ‘above’ average respectively. But we won’t translate these expected quartile scores directly into national targets because we know the makeup and weighting of school sizes and types across the country will vary greatly and nonlinearly.
  • Instead the model itself provides a national benchmark, not a standard. This will be used to model the expected scores for a given school (and students) given the same sociological information, most of which can be imputed from child benefit statements, addresses and the like.
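
To show roughly what that could look like in code – purely a sketch, with the data file, column names and random-effects structure all invented for illustration, and using a plain linear mixed model rather than a full GLMM – something like this would do for a first pass:

    # Fit a simple mixed-effects model: each child's score as a function of their
    # school's ranked quality plus sociological covariates, with a random
    # intercept per school. File and column names are placeholders.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("stats_pilot.csv")  # one row per child

    model = smf.mixedlm(
        "score ~ school_rank + parental_income + commute_km + screen_time_hours",
        data=df,
        groups=df["school_id"],          # random intercept for each school
    )
    result = model.fit()
    print(result.summary())

    # In-sample expected scores, given each child's background; quantiles of
    # these can anchor the 'below / on / above average' labels.
    df["expected_score"] = result.fittedvalues
    print(df["expected_score"].quantile([0.25, 0.50, 0.75]))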

Why would this system – more complex to set up and quite data-intensive – be any better than the current one? Here’s a few reasons:

  1. We know development is multifactorial. So is this model.
  2. We know sociology greatly affects kids’ life chances, so let’s explicitly account for it. If the upshot of that is more effort spent alleviating poverty than endlessly tweaking the school system, great.
  3. We can publish the tests’ relative weightings in the model so teachers/parents know which should be more emphasised.
  4. Grade inflation would be easy to abolish, simply by updating the model every year or so.
  5. The grading of schools would be simpler and integrated. Most schools will be ‘on-average’ – this is implicit – so the horrific postcode lottery will end and parents can agree to focus on improving their local school, which is better for their commute and their kids’ sanity.
  6. Regional or municipal variations due to differences in sociology will also be apparent, and can be evidenced and tackled.