Science and (small) business

Over the last 10-20 years there’s been a revolution in academic science (or should that be ‘coup’?), where many aspects of the job have been professionalised and formalised – especially project management, but management in general too. This generally includes tools like Gantt charts, milestones, workload models, targets and many other things previously unmentionable in academia but common in industry, especially in large organisations. Lots of academics will tell you they think it’s bureaucratic overkill, intrusive, a waste of time, and worse (to put it mildly), but the awkward truth is that, as lab groups have steadily increased in size (as fewer, larger grants went to increasingly senior PIs or consortia), many of the limitations of the collegiate style of the past, centred on a single academic with a tight-knit group, have been exposed.

Frequently the introduction of ‘management practices’, often after hiring expensive consultants, is accompanied by compulsory management training. Sometimes it can be an improvement. More often (in my experience) whether an improvement in outcomes (as distinct from ‘efficiency’) has been achieved probably depends on whether you cost in staff time (or overtime) and morale. You can make arguments either way.

But I can’t help thinking: why are we attempting to replicate practices from big or massive private-sector organisations, anyway? I suspect the answer, in part, is that those are the clients management consultants have the most experience of working with. More seriously, those organisations differ in fundamental respects from even the largest universities, let alone individual research projects. This is because large companies:

  • Add value to inputs to create physical goods or services that are easily costed by the market mechanism (this is the big one)
  • Usually have large cash reserves, or easy access to finance (tellingly, when this ends they usually get liquidated)
  • Keep an eye on longer-term outcomes, but primarily focus on the 5-10 year horizon
  • Compete directly with others for customers (in some respects an infinite resource)
  • Are answerable, at least yearly, to shareholders – with share value and dividends being the primary drivers of shareholder satisfaction.

Meanwhile, universities (and to an even greater extreme, research groups/PIs):

  • Produce knowledge outputs with zero market value*
  • Live hand-to-mouth on short grants
  • Need long-term, strategic thinking to succeed (really, this is why we get paid at all)
  • Compete indirectly for finite resources – grants and publications – based partly on track record and partly on potential.
  • Answer, ultimately, to posterity, their families, and their Head of Department

I want to be clear here – I’m not saying, by any means, that previous management techniques (i.e. ‘none’) work well in today’s science environment – but I do think we should probably look to models other than, say, General Motors or GlaxoSmithKline. The problem is often compounded because PIs have no business experience (certainly not in startups) while consultants often come from big business – their ability to meet in the middle is limited.

Instead, small and medium enterprises (SMEs) are a much closer model to how science works. Here good management of resources and people is extremely important, but the scale is much smaller, permitting different management methods, often focussed on flexibility and results, not hierarchies and systems. For instance, project goals are often still designed to be SMART (specific, measurable, achievable, realistic, time-scaled), but these will be revisited often and informally, and adjusted whenever necessary. Failure is a recognised part of the ongoing process. This is the exact opposite of how a Gantt chart, say, is used in academia: often drawn up at the project proposal (design) stage, it is then ignored until the end of the grant, when the PI scrabbles to fudge the outcomes, the goals, or both, to make the work actually carried out fit – so they don’t get a black mark from the funder/HoD for missing targets.

There are plenty of other models, and they vary by organisation size and type (e.g. tech startup, games studio, skunkworks, logistics distributor, niche construction subcontractor) – but you see what I mean: copying ‘big business’ wholesale without looking at the whole ecosystem of business practices makes little sense…

*Obviously, it’s not that all, or even most, scientific output will never realise any economic value – but that value can be years, or centuries, removed from the work that created it. And spin-outs are relevant to a tiny proportion of PIs’ work, even in applied fields.


Making progress down the road

Too many laws and customs of driving make speed more important than safety, from the driving instructors’ “make good progress down the road” (i.e. “hurry the fuck up”, which most drivers internalise as “drive at least as fast as the speed limit unless there’s literally another car right in front of you”), to every transport investment ever being marketed to (presumably furious) taxpayers as “reducing journey times”.

This is in contrast to other European countries, where safety is #1, and speed just a nice-to-have. Surely it’s time for the national Government to admit – as London’s TfL have – that the UK is blessed with only a fixed amount of road space, so with growing numbers of people using it, we all have to accept that journeys will get slower in future, not quicker.

We have a real blind spot (pun intended) in the UK about traffic jams. On the one hand, we are only too aware of all the time we WASTE sat in stationary traffic each day – most car journeys are under five miles, made by commuters, and involve up to half that time in queues – so traffic jams are a fact of driving life here in the UK.

On the other hand, people’s frustration / anger / surprise about being stuck in a traffic jam on any given morning (when they are, every morning) is total. But this is bizarre… We know the traffic will be there, but we still get in our cars expecting a free road, at 08:30 on a weekday! Where’s all that traffic come from?!

Surely it’s time to admit traffic jams exist, will get worse, not better, and constantly lurching from 0 to 30mph and back again is pointless as well as dangerous?

Imagine a world where the DfT’s published targets and main priority were to reduce accidents per mile travelled, and included walking and cycling targets, not journey times? Where 20mph became the standard urban default speed limit, not the exception? Where satnavs routinely pointed out to users when (given traffic conditions) particular journeys, short and long, were quicker by public transport / foot / bike?

A safer UK. A calmer UK. And – just possibly – a healthier, richer, and happier UK.

Imagine.


Single-molecule real-time (SMRT) Nanopore sequencing for Plant Pathology applications

A short presentation to the British Society for Plant Pathology’s ‘Grand Challenges in Plant Pathology’ workshop on the uses of real-time DNA/RNA sequencing technology for plant health applications.

Doctoral Training Centre, University of Oxford, 14th September 2016.

Slides [SlideShare]: cc-by-nc-nd



Labour’s Corbyn election – ten thoughts.

Riiiight, so both Corbyn and Angela Eagle can run for Labour Leader, so the election’s on. Labour’s turn to agonise about runners, riders, and the gaps between party members, the MPs those members volunteer for, and the public who might (or might not) reward them with power. This time round the stakes couldn’t be higher, with a Brexit to influence, likely Scottish independence, and a possible general election. Labour’s problems will be acute, more so than the Tories’ (whose spin machine has somehow turned their own fortnight of indecision and ineptitude into an object lesson in ruthlessness… amazing). This is because Labour’s problems – shifting support bases, policy fracture points, and the large and apparently increasing disconnects between voters, MPs, and a large membership* – are deeper, older, and until recently less discussed than the Tories’ main weak points on Europe and social liberalism. Here’s my tuppence, all collected in one place, in no real order of logic or emphasis:

1) This isn’t the 80s. It isn’t a re-run of Militant, a long-term plan of an external party; Corbyn (at the last election) got a massive majority of Labour Members and Labour Registered Supporters.

2) This isn’t the 90s, or even the Noughties, either. Two-party politics in the UK is dead, dead, dead. Not least because the UK itself as a political entity is heading for the dustbin faster than Cameron’s ‘DC – UK PM’ business cards.

3) It isn’t a ‘Corbyn cult’. Lots of people, me included, switched onto Corbyn because of his policies, not the other way round.

4) That ‘7, 7-and-a-half’ performance. Watch the full interview: did Corbyn bang the EU drum as loudly and clearly as Cameron, Brown, Blair and Major? No. He articulated a nuanced, balanced, fact-based opinion (essentially “I’m broadly in favour of the EU and many of the benefits, but we shouldn’t ignore the problems”) in the way that we kept being told other campaigners weren’t doing – and got castigated for it. Naïve? He’s been in politics for 40-odd years. I think he was just doing what he was elected for: saying what he believes the truth to be, simply.

5) Will Corbyn ever be PM? I personally don’t think so. But the future of UK politics is probably coalitions – see (2) above – so the views he’s highlighted and continues to champion will be (I believe) represented by a coalition or leftist grouping in power.

6) If the problem is style, Angela Eagle isn’t the answer. A bit more centrist… a bit more electable… a bit less weird… a bit less Marxist… sure. But if the problem is that Corbyn isn’t enough like Blair, Eagle isn’t the answer. PM May will utterly decimate her, in seconds.

7) The policies are belters: renationalising railways; reversing NHS privatisation; a Brexit that prioritises EEA access and freedom of movement over border control; a genuine national living wage, properly enforced; scrapping tuition fees and expanding apprenticeships; environmental protection; social liberalism. Fair personal, corporate, financial and ecological taxation to pay for it. These are all massive and proven winners amongst (variously) 50-85% of the electorate.

8) Trident is a red herring. Forget it. I’m not sure how I feel about unilateral disarmament personally. But given he’s unlikely to be a majority PM (see above), Corbyn’s stance on nuclear weapons doesn’t really rate in importance for me compared to Brexit, the NHS and taxation. In any case, the lifespan of the current system can be extended to postpone that discussion past 2020 – the MoD are actually quite good at kicking stuff like this into the long grass for a bit. By then the left (hard- and centre-) can get themselves into shape for that debate. Dividing ourselves now over something the right are utterly united on, and a clear majority of the public support, is madness (Corbyn even acknowledged as much with his defence review). There are much better reasons to argue…

9) Honest votes are more powerful. By that I mean that the recent UK referendums showed – to every single person in the country, better than a lecture ever could – that first-past-the-post is rotten**, that tactical voting or second-guessing your fellow electors is stupid, dangerous and counterproductive, and that (shock, horror) voting for something you believe in is an energising and rewarding experience in its own right. This is also true of leadership elections, don’t forget; how many Tories egging on Boris now wish they hadn’t? Or backed Gove instead of Leadsom? And lastly…

10) The ‘split risk’…

Tensions in what used to be a millions-strong Labour movement, between the left-behind poor and optimistic urbanites, have become unendurable. They might not lead to a party split (although the press have started to contemplate publicly what lots of us have been saying for a year or more) – but equally they might. Should you vote for Corbyn if you want a split, or if you want unity? It’s impossible to know, so see (9) above and vote for the person/policies you prefer.

As to the desirability of a split, well, tensions are often resolved by fractures. There need not be an SDP-type irrelevance created – the political landscape is completely different now, with smaller parties proven and established, and many more proportional elections beyond Westminster in play. More importantly, figures in the Greens, Lib Dems and Labour have already spoken overtly in the press about the need for a new, broad centre-left coalition, which both Labour descendant parties could contribute to without antagonising each other’s supporters – probably more happily and successfully!

It’s also important to remember that the ‘unite and fight’ ethos that animates the Labour Party – which the (especially Blairite) PLP are mobilising to justify opposition to Corbyn, disingenuously I feel – predates the Labour Party by a century or more. Recall the Diggers, Levellers, Abolitionists, various religious groups, Socialists, Trade Unionists… lefty-ness has always necessarily been a big tent, but which poles are placed firmly in the earth, and the strength of the storm, define how big the canvas is. Progressive movements which want to redistribute power and wealth from the self-protecting, actual, ever-present, and very real ruling class can’t always be populist, and span acres of political ground, and find expression in a single monolithic electoral party – sometimes two of those, but rarely three. If it’s time to move the poles around to firmer ground, we should.

So that’s my take. If you can vote, I hope you do…

*However much the PLP might want to, they can’t ignore the fact that a progressive party needs volunteers in their millions, far more than a right-wing party, which can afford paid helpers. Substituting better fundraising adverts and more millionaire backers for volunteers is a symptom of the root cause, not a solution.

**Imagine how different our democracy would look if we’d had compulsory voting for the last 20 years, with county-wide party votes used to fill a proportionally-elected House of Lords. How much healthier would we be, then?


On schools testing

Schools testing has been in the news again recently… are SATS etc useful objective measures of a school’s performance? Or do they add unnecessary stress and bureaucracy?

Well I think we can all agree more objectivity and less stress are good things, and most of us would probably go further and say that SATS aren’t doing either of those jobs. But kids are so unique! And testing is so essential! How on earth can we do both?!

Well, sorry. If there’s one field that is actually good at summarising hundreds of thousands of individuals in a heterogeneous population, it’s biology. So here’s A Biologist’s Alternative to SATS. Let’s call it… STATS:

  • Pick 5-10 measures that are easy to test and cover a wide range of measurable markers of kids’ lives – say, a couple each of literacy and numeracy tests, some critical thinking, standard IQ and general knowledge. Plus, happiness / wellbeing and physical health.
  • Assemble a mixed team of inspectors, governors, academics and teachers. Have them sample, say, 20 schools from a wide range of areas and rank them.
  • Then test the kids in those schools using our metrics. Also collect information on their dates of birth, sociological factors (parents’ status, wealth, postcode, commuting distance, screen time – there’s loads of ways to do this), etc.
  • Now we can construct a GLMM (a slightly-but-not-too-complicated statistical model – or else use machine-learning tools like HMMs or neural networks, although I suspect getting enough data would be hard) to model each kid’s scores as a function of their school’s ranked quality, given their sociological background (a sketch of the kind of model I mean follows this list).
  • Here’s the important bit: we take the test scores at the 25th, 50th, and 75th percentiles and label them ‘below’, ‘on’ and ‘above’ average respectively. But we won’t translate these expected quartile scores directly into national targets, because we know the makeup and weighting of school sizes and types across the country will vary greatly and nonlinearly.
  • Instead the model itself provides a national benchmark, not a standard. This will be used to model the expected scores for a given school (and students) given the same sociological information, most of which can be imputed from child benefit statements, addresses and the like.
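
To make that concrete, here’s a minimal sketch of the kind of mixed model I mean – the notation is mine and purely illustrative, not an official specification:

$$ y_{ij} = \beta_0 + \sum_{k}\beta_k x_{ijk} + u_j + \varepsilon_{ij}, \qquad u_j \sim N(0,\sigma_u^2), \quad \varepsilon_{ij} \sim N(0,\sigma^2) $$

where $y_{ij}$ is the score of child $i$ at school $j$, the $x_{ijk}$ are that child’s sociological covariates, and $u_j$ is a school-level random effect. The $u_j$ term is the interesting bit: it estimates how much better or worse a school performs than its intake predicts – exactly the ‘benchmark, not a standard’ idea above.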

Why would this system – more complex to set up and quite data-intensive – be any better than the current one? Here’s a few reasons:

  1. We know development is multifactorial. So is this model.
  2. We know sociology greatly affects kids’ life chances, so let’s explicitly account for it. If the upshot of that is more effort alleviating poverty than endlessly tweaking the school system, great.
  3. We can publish the tests’ relative weightings in the model so teachers/parents know which should be more emphasised.
  4. Grade inflation would be easy to abolish, simply by updating the model every year or so.
  5. The grading of schools would be simpler and integrated. Most schools will be ‘on-average’ – this is implicit – so the horrific postcode lottery will end and parents can agree to focus on improving their local school, which is better for their commute and their kids’ sanity.
  6. Regional or municipal variations due to differences in sociology will also be apparent, and can be evidenced and tackled.

Why aren’t we benchmarking bioinformatics?

Talk presented at the #bench16 (benchmarking) symposium at KCL, London, Wed 20th April 2016. Funded by the SSI.

Slides (Slideshare – cc-by-nd)



More MinION – the ‘1D rapid’ prep

My last MinION post described our first experiments with this really cool new technology. I mentioned then that their standard library prep was fairly involved, and we heard that the manufacturers, Oxford Nanopore, were working on a faster, simpler library prep. We got in touch and managed to get an early prototype of this kit for developers*, so we thought we’d try it out. So our** experiments had three aims:

  • Try out this new rapid kit
  • Try out different extraction methods, to see how they worked with the kit
  • See if we could sequence some fairly damaged DNA with the kit

This is a lot of combinations to perform over a few days on one sequencing platform! Our experience from the last run gave us hope we could manage it all (we did, but with a lot of headaches over disk space – it turns out one drawback of being able to run multiple concurrent sequencing runs is hard-drive meltdown, unless you’re organised from the start – oops). In fact, to keep track of all the reads we added an ‘index’ function to the Poretools package. I really recommend you use this if you’re planning your own work.
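
If you want to do the same, the ‘index’ command just takes a directory of fast5 reads and tabulates their metadata so you can keep track of runs – the invocation is roughly as follows (the paths and output name here are placeholders):

# tabulate metadata for the fast5 reads found in a run directory
# (path and output file are placeholders – see poretools' help for options)
poretools index /data/minion_run_01/ > run_01_index.tsv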

We had eight samples to sequence: a 15-year-old dried fungarium specimen of Armillaria ostoyae (likely to be poor-quality DNA; extracted by Bryn Dentinger with a custom technique); some fresh Silene latifolia (Qiagen-extracted, which we’d used successfully with the previous, ‘2D’ library prep); and six arbitrarily-selected plant samples, both monocots and dicots, extracted by boiling with Chelex beads (more about them at the end).


First we prepared normal ‘2D’ libraries from the Silene and Armillaria. These performed as expected from our December experiments, even the Armillaria giving decent numbers and lengths of reads (though not as many as we’d hoped, and with some indication of worse Q-scores). We put this down to nicks in the fungarium DNA, and moved on to the 1D preps while the sequencer was still running.

The ‘1D’ in the rapid kit (vs. ‘2D’ in the normal kit) refers to how many DNA strands are sequenced: in the 2D version, both the forward and reverse-complement strands are sequenced. This is slower to prepare (extra adapters etc. to link the two strands for sequencing) and also runs through the MinION more slowly (twice as much DNA, plus a hairpin to negotiate), but is roughly twice as accurate, since each base is read twice. The 1D kit, on the other hand, results in single-stranded fragments, meaning we could expect lower accuracy traded off for higher speed. And the rapid kit really was fast – starting from purified extracted DNA, we added all the necessary adapters for sequencing in well under 15 minutes, ready to sequence.

[Plot: absolute difference in mean Q-scores between read halves, against read length]
The sequencing itself (for both the Armillaria and the Silene) went spectacularly well. Remember, this is unsheared genomic DNA, and imagine our surprise when we started to see 25, then 50, then 100, then 150kb reads come off the sequencer – many mapping straight away to reference genomes! It turns out that the size distribution of the 1D prep is much more long-tailed than the 2D/g-TUBE one. In fact, whereas the 2D library looks like a normal-ish gamma distribution, the 1D reads are more like an inverse exponential – lots of short stuff, and then a very long tail with some mega-whopper reads in it. Reads so long, in fact, that mapping them the same way as Illumina short-read data would be a bit bonkers…

As for accuracy, well, the Q-scores are definitely lower in the 1D prep; around half the 2D values, as we expected. On the other hand, the reads were still matching reference databases via BLAST/BLAT/BWA quite happily – so if your application is ID, who cares? Equally, combining mega-long 1D reads with shorter but more accurate reads could be a good way to close gaps in a de novo genome sequencing project. One technical point – there is definitely a lot more relative variation in the Q-scores for short (<1,000bp) reads than for longer ones: the plot above shows the absolute difference in mean Q-scores between the (first vs. second) halves of a subset of 2,300 1D reads. You can see that below 1kb, Q-score variation exceeds 4 (bad news when the mean is about there), while longer reads show no such effect (a quick t-test confirms this).
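
If you want to try the same matching on your own 1D reads, here’s a minimal sketch using BWA-MEM’s nanopore preset (file names are placeholders; BLAST or BLAT work fine for pure identification too):

# index the reference once
bwa index reference.fasta
# map noisy 1D nanopore reads; the ont2d preset relaxes seeding/scoring for ONT error profiles
bwa mem -x ont2d reference.fasta 1D_reads.fastq > 1D_reads_mapped.sam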

So, in short, the 1D prep is great if you just want to get some DNA on your screen ASAP, and/or long reads to boot. In fact, if you came up with a way to size-select all the short gubbins out before sequencing, you’d have one mega-cool sequencing protocol! What about the last bit of our test – seeing if a quick-and-dirty extraction could work, too? The results were… mixed-to-poor. Gel electrophoresis and Qubit both suggested the extracted DNA was of pretty poor quality and concentration, and if we didn’t believe them, the gloopy, aromatic, multicoloured liquid in the tubes supplied direct evidence to our eyes. So rather than test those samples first (and risk damaging a perfectly good flowcell early in the experiment), we held them back until the end, when only a handful of pores were left. In this condition it’s hard to say whether they worked or not: the 50 or so reads we got over an hour or two from fewer than 10 pores is a decent haul, and some of them had matches to congenerics in BLAST, but we didn’t really give them a full enough test to be sure.

Either way, we’ll be playing around with this more in the months to come, so watch. This. Space…


*Edit: Oxford Nanopore have recently announced that the rapid kit will be out in April for general purchase.

**Again working, for better or worse, with fellow bio-beardyman Dr. Alex Papadopulos. Hi Alex! This work was funded by a Royal Botanic Gardens, Kew Pilot Study Grant.


Copying LOADS of files from a folder of LOADS *AND* LOADS more in OSX

Quick one this, as it’s a tricky problem I keep having to Google/StackOverflow. So I’m posting it here for others, but mainly for myself too!

Here’s the situation: you have a folder (with, ooh, let’s say 140,000 separate MinION reads, for instance…) which contains a subset of files you want to move or copy somewhere else. Normally, you’d do something simple like a wildcarded ‘cp’ command, e.g.:

host:joeparker$ cp dir/*files_I_want* some/other_dir

Unfortunately, if the list of files matched by that wildcard is sufficiently long (more than a few thousand), you’ll get an error like this:

-bash: /bin/cp: Argument list too long

In other words, you’re going to have to be more clever. Using the GUI/Finder usually isn’t an option either at this point, as the directory size will likely defeat Finder, too. The solution is pretty simple but takes a bit of tweaking to work in OSX (full credit to posts here and here that got me started).

Basically, we’re going to use the ‘find’ command to locate the files we want, then pass each one in turn as an argument to ‘cp’ using ‘find -exec’. This is a bit slower overall than the equivalent wildcarded command, but since that won’t work we’ll have to lump it! The command* is:

find dir -maxdepth 1 -name '*files_I_want*' -exec cp {} some/other_dir \;

Simple, eh? Enjoy :)

*NB, in this command:

  • dir is the filesystem path to start the search; ‘find’ will recursively traverse the directory tree including and below this folder;
  • -name '*glob*' gives the files to match (quoted, so the shell passes the wildcard to ‘find’ itself rather than expanding it first);
  • -maxdepth is how deep to recurse (e.g. 1 = ‘don’t recurse’);
  • cp is the command we’re executing on each found file (could be mv etc);
  • {} is the placeholder that -exec replaces with each found file in turn;
  • some/other_dir is the destination argument to the command invoked by -exec
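
As an aside, the same job can be done by piping ‘find’ into ‘xargs’. The BSD xargs that ships with OSX has a -J flag which packs as many filenames as possible into each ‘cp’ invocation, so this variant can be a fair bit quicker than -exec’s one-copy-per-file approach:

# -print0 / -0 keep filenames with spaces intact;
# -J % substitutes the batched file list where the % appears
find dir -maxdepth 1 -name '*files_I_want*' -print0 | xargs -0 -J % cp % some/other_dir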

How to fake an OSX theme appearance in Linux Ubuntu MATE

I’ve recently been fiddling about, trying to fake an OSX-style GUI appearance in Linux Ubuntu MATE (15.04). This is partly because I prefer the OSX GUI (let’s be honest) and partly because most of my colleagues are Mac users (bioinformaticians…) and students in particular fear change! The Mac-style appearance seems to calm people down. A bit.

The specific OS I’m going for is 10.9 Mavericks, because it’s my current favourite and nice and clear. There are two main things to set up: the OS itself and the appearance. Let’s take them in turn.

1. The OS

I’ve picked Ubuntu (why on Earth wouldn’t you?!) and specifically the MATE distribution. This has a lot of nice features that make it fairly Mac-y, and in particular the windowing and package management seem smoother to me than the vanilla Ubuntu Unity system. Get it here: https://ubuntu-mate.org/vivid/.* The installation is pretty painless on Mac, PC or an existing Linux system. If in doubt you can use a USB as the boot volume without affecting existing files; with a large enough partition (the core OS is about 1GB) you can save settings – including the customisation we’re about to apply!

*We’re installing the 15.04 version, not the newest release, as 15.04 is an LTS (long-term stable) Ubuntu distribution. This means it’s supported officially for a good few years yet. [Edit: Arkash (see below) kindly pointed out that 14.04 is the most recent LTS, not 15.04. My only real reason for using 15.04 therefore is ‘I quite like it and most of the bugs have gone'(!)]

2. The appearance

The MATE windowing system is very slick, but the green-ness is a bit, well, icky. We’re going to download a few appearance mods (themes, in Ubuntu parlance) which will improve things a bit. You’ll need to download these to your boot/install USB: the ‘Ultra-Flat Yosemite Light’ theme (a .zip archive), the ‘OSX-MATE’ theme (available as a git repository), and the ‘solarized-mate-terminal’ archive – all three crop up below.

Boot the OS

Now that we’ve got everything we need, let’s boot up the OS. Insert the USB stick into your Mac-envious victim of choice, power it up and enter the BIOS menu (F12 in most cases) before the existing OS loads. Select the USB drive as the boot volume and continue.

Once the Ubuntu MATE session loads, you’ll have the option of trialling the OS from the live USB, or permanently installing it to a hard drive. For this computer I won’t be installing to a hard drive (long story) but using the USB, so customising that. Pick either option, but beware that customisations to the live USB OS will be lost should you later choose to install to a hard drive.

When you’re logged in, it’s time to smarten this baby up! First we’ll play with the dock a bit. From the top menu bar, select “System > Preferences > MATE Tweak” to open the windowing management tool. In the ‘Interface’ menu, change Panel Layouts to ‘Eleven’ and Icon Size to ‘Small’. In the ‘Windows’ menu, we’ll change Buttons Layout to ‘Contemporary (Left)’. Close the MATE Tweak window to save. This is already looking more Mac-y, with a dock area at the bottom of the screen, although the colours and icons are off.

Now we’ll apply some theme magic to fix that. Select “System > Preferences > Look and Feel > Appearance”. Now we can customise the appearance. Firstly, we’ll load both the ‘Ultra-Flat Yosemite Light’ and ‘OSX-MATE’ themes, so they’re available to our hybrid theme. Click the ‘Install…’ icon at the bottom of the theme selector and you’ll be able to select and install the Ultra-Flat Yosemite Light theme we downloaded above. It should unpack from the .zip archive and appear in the themes panel. Installing the OSX-MATE theme is slightly trickier (the steps are sketched after this list):

  • Unzip (as sudo) the OSX-MATE theme to /usr/share/themes
  • Rename it from OSX-MATE-master to OSX-MATE if you downloaded it from git as a whole repository (again, you’ll need to sudo)
  • Restart the Appearance panel and it should now show up in the themes list.
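
Concretely, those two steps look something like this (assuming, hypothetically, that the downloaded archive is named OSX-MATE-master.zip):

# unpack the theme into the system-wide themes directory
sudo unzip OSX-MATE-master.zip -d /usr/share/themes
# rename the git-suffixed folder so the themes panel shows the expected name
sudo mv /usr/share/themes/OSX-MATE-master /usr/share/themes/OSX-MATE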

We’ll create a new custom theme with the best bits from both themes, so click ‘Custom’ theme, then ‘Customise..’ to make a new one. Before you go any further, save it under a new name! Now we’ll apply changes to this theme. There are five customisations we can apply: Controls, Colours, Window Border, Icons and Pointer:

  • Controls: Ultra-Flat Yosemite Light
  • Colours: There are eight colours to set here. Click each colour box then in the ‘Colour name’ field, enter:
    • Windows (foreground): #F0EAE7 / (background): #0F0F0E
    • Input boxes (fg): #FFFFFF / (bg): #0F0F0E
    • Selected items (fg): #003BFF / (bg): #F9F9F9
    • Tooltips: (fg): #2D2D2D / (bg): #DEDEDE
  • Window border: OSX-MATE
  • Icons: Fog
  • Pointer: DMZ (Black)

Save the theme again, and we’re done! Exit Appearance Preferences.

Finally we’ll install Solarized as the default terminal (command-line interface) theme, because I like it. In the MATE Terminal, unzip the solarized-mate-terminal archive (as sudo), enter the directory, and simply run the install script with bash:


$ sudo unzip solarized-mate-terminal
$ cd solarized-mate-terminal
$ bash solarized-mate.sh

Close and restart the terminal. Hey presto! You should now be able to see the light/dark Solarized themes available under ‘Edit > Profiles’. You’ll want to set one as the default when opening a new terminal.

Finally…

Later, I also installed Topmenu, a panel applet that gives an OSX-style top-anchored application menu to some Linux programs. It’s a bit cranky and fiddly though, so you might want to give it a miss. But if you have time on your hands and really need that Cupertino flash, be my guest. I hope you’ve had a relatively easy install for the rest of this post, and if you’ve got any improvements, please let me know!

Happy Tweaking…


Messing about with the MinION

[Photo: the MinION device]

Molecular phylogenetics – uncovering the history of evolution using signals in organisms’ genetic sequences – is a powerful science, the latest expression of the human desire to understand our common origins. But for all its achievements, I’d always felt something was, well, lacking from my science. This week we’ve been experimenting with the MinION for the first time, and now I know what that is: immediacy. I’ll return to this theme later to explain how exciting this realisation is, but first we’d better ask:

What is a MinION?

NGS nerds know about this already, of course, but for the rest of us: the MinION (minh-aye-on) is a USB-connected device marketed by a UK company, Oxford Nanopore, as ‘a portable real-time biological analyser’. Yes – that does sound a lot like a tricorder, and for good reason: just like the fictional device, it promises to make the instant identification of biological samples a reality. It does this using a radically different new way to ‘read’ DNA sequences from a liquid into letters (‘A, C, G, T‘) on a computer screen. Essentially, individual strands of DNA are pulled through a hole (a ‘nanopore’) in an artificial membrane, like the membranes that surround every living cell. When an electric field is applied to the membrane, the individual DNA letters (actually ‘hexamers’ – 6-letter chunks) passing through the nanopore can be directly detected as fluctuations in the field, much like a C-60 cassette tape is read by a magnetic tape head. The really important thing is that, whereas other existing DNA-reading (‘sequencing’) machines take days-to-weeks to read a sample, the MinION produces output in minutes or even seconds. It’s also (as you can see in the picture above) a small – wait, tiny – device.

Real-time

This combination of fast results and small size makes the tricorder dream possible, but this is more than just a gadget. Really, having access to biological sequences at our fingertips will completely change biology and society in many ways, some of which Yaniv Erlich explored in a recent paper. In particular, molecular evolutionary biologists will soon be able to interact with their subject matter in ways that other scientists have long taken for granted. What I mean by this is that in many other empirical disciplines, researchers (and schoolkids!) are able to directly observe, or easily measure, the phenomena they study. Paleontologists dig bones. Seismologists feel earthquakes. Zoologists track lions. And so on. But until now, the world of genomic data hasn’t been directly observable. In fact reading DNA sequences is a slow, expensive pain in the arse. And in such a heavily empirical subject, I can’t help but feel this is a hindrance – we are burdened with a galaxy of baroque models to explain variations in the small number of observations we’ve made, and I’ve got a hunch more data would actually consign many of them to the dustbin.

Imagine formulating, testing, and validating or discarding genetic hypotheses as seamlessly as an ecologist might survey a new forest…


Actual use

That’s the spiel, anyway. This week we actually used the MinION for the first time to sequence DNA (I’d been running some other technical tests for a couple of months, but this was our first attempt with real samples), and since it seems lots more colleagues want to ask about this device than have had a chance to use it, I thought I’d share our experiences. I say ‘we’ – the work took place thanks to a Pilot Study Fund grant at the Royal Botanic Gardens, Kew, in collaboration with Dr. Alex Papadopulos (and with really useful input from Drs. Andrew Helmstetter, Pepijn Kooij and Bryn Dentinger). It’s worth pointing out that although there’s a lot of hype about the device, it is still technically in a prototype/public-beta phase, so things are changing all the time in terms of performance, etc.

First up, the size. The pictures don’t really do it justice… compared with existing machines (often fridge-sized), the MinION isn’t small, it’s tiny. The thing itself, plus box and cable, would easily fit into a side pocket on any laptop bag. So the ‘portable’ bit is certainly true, as far as the platform itself is concerned. But there’s a catch – to prepare a biological sample for sequencing on the MinION, you first have to go through a fairly complicated set of lab steps. Known collectively as ‘library preparation’ (a library here meaning a test tube containing a set of DNA molecules that have been specially treated to prepare them for sequencing), the existing lab protocol took us several hours the first time. Partly that was due to the sheer number of curious onlookers, but partly because some of the requisite steps (bead cleanups etc.) just need time and concentration to get right. None of the steps is particularly complicated (I’m crap at labwork and just about followed along) but there are quite a few, and you have to follow the protocol carefully.

Performance

So how did the MinION do? We prepared two libraries: one from a control sample of bacteriophage lambda (a viral genome, used to check the lab steps are working) and another from Silene latifolia, a small flowering plant and one of Alex’s faves. The results were exciting. In fact they were nerve-wracking – initially we mixed up the two samples by mistake (incredible – what pros…) and were really worried when, after a few hours’ sequencing, not a single DNA read matching the lambda genome had appeared. After a lot of worrying, and restarting various analyses (including the MinION itself), we eventually realised our mistake, reloaded the MinION with the other sample and – hey presto! – lambda reads started to pour out of the software. In the end, we were able to get 500x coverage really quickly (see reads mapped to the reference genome, below):

[Figure: lambda control reads mapped to the reference genome – coverage (top) and read-length distribution (bottom)]

You can see from the top plot that we got good, even coverage across the genome, while the bottom plot shows an even(ish) read-length distribution, peaking around 4kb. We’d tried for a target size of 6kb (shearing using g-TUBEs), so it seems the actual output read-size distribution sits lower than the shearing target – you’d need to aim higher to compensate. Still, we got plenty of reads longer than 20,000, or even 30,000 base-pairs (bp) – from a 47kbp genome this is great, and much, much longer than typical Illumina paired-end reads of a few hundred bp.

Later on the next day, we decided we’d done enough to be sure the library preparation and sequencing were working well, so we switched from the lambda control to our experimental (S. latifolia) sample. Again, we got a good steady stream of reads, and some were really long – over 65,000bp. Crucially, we were able to BLAST these against the NCBI public sequence database and get Silene hits straight back. We were also able to map them onto the Silene genome directly using BWA, even with no trimming or masking of low-quality regions. Overall, we got nearly a full week’s sequencing from our flowcell, with ~24,000 reads at a mean length of ~4kb. Not bad, and we’d have got more if we hadn’t wasted the best part of the run sodding about while we troubleshot our library mix-up at the start (the number of active pores, and hence sequencing throughput, declines over time).
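
For the curious, the identification step needs nothing exotic – a sketch along these lines (file and database names are illustrative, and assume a local BLAST+ installation with a copy of NCBI’s nt database) is all it takes:

# query long nanopore reads against a local nt database;
# tabular output with percent identity and taxon IDs for quick eyeballing
blastn -query silene_reads.fasta -db nt \
  -outfmt '6 qseqid sseqid pident length evalue staxids' \
  -max_target_seqs 5 -out silene_hits.tsv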

The bad: there is a price for these long, rapid reads. Firstly, the accuracy is definitely lower than Illumina – although the average sequencing accuracy in our first lambda library was nearly 90%, on some of our mapped Silene reads it dropped much lower – 70% or so. This has been well documented, of course. Secondly, the *big* difference between the MinION in use and a MiSeq or HiSeq is the sheer level of involvement needed to get the best from the device – whereas both Illumina machines are essentially push-button operated, the MinION seems to respond well if you treat it kindly (reloading etc.) and not if you don’t (introducing air bubbles while loading seemed to wreck some pores).

The very, very good: ultimately, though, we came away convinced that these issues won’t matter at all in the long run. ONT have said that the library prep and accuracy are both being improved, as is flowcell quality (prototype, remember). Most importantly, the MinION isn’t really a direct competitor to the HiSeq. It’s a completely different instrument. Yes, they both read DNA sequences, but that’s where the similarity ends. The MinION is just so flexible, there’s almost an infinite variety of uses.

In particular, we were really struck that the length of the reads means simple algorithms like BLAST can get a really good match from a sample with just a few hundred, or even a few tens, of reads. You just don’t need millions of 150bp reads to match an unknown sample to a database with reads this long! Couple that with the fact that the software is real-time and flowcells can be stopped and started, and you have the bones of a really capable genetic identification system for all sorts of uses: disease outbreaks (in fact, see this great work on Ebola using the MinION); customs control of endangered species; agriculture; brewing – virtually anything, in fact, where finding out about living organisms’ identity or function is needed.

The future.

There’s absolutely no question at all: these devices are the future of biology. Maybe the MinION will take off (right now, the buzz couldn’t be hotter), or maybe a competitor will find a way to do even more amazing things. It doesn’t matter how it comes about, though – the next generation of biologists really will be living in a world where, in 10, 15, or at most 20 years, the sheer ubiquity of sequencers like this will mean that most, if not all, eukaryotes’ genomes will have been sequenced, and so can easily be matched against an unknown sample. If that sounds fantastic, consider that the cumulative sequencing output of the entire world in 1988 amounted to a little over 100,000 sequences – a yield equivalent to a single good MinION run. And while most people in 1988 thought that, since the human genome might take 30 or 40 years to sequence, there was little point in even starting, a few others looked around at the new technology and its potential, and drew a different conclusion. The rest is history…
