Real-time phylogenomics

Collection, sequencing and analysis of DNA in the field.

Introduction

This is a tale of two small flowers. Arabidopsis thaliana and Arabidopsis lyrata ssp. petraea are small flowering plants from the same genus. In many respects they're hard to tell apart: their appearance, flowering times and locations overlap. The same is true of many, many other plant and fungal species across the world, which need rapid, accurate identification to fight wildlife crime, discover new drugs and combat crop diseases.

It's not always easy to work out which species a plant belongs to just by looking at it. In fact, few people could correctly identify all the species in their own gardens!

Being able to identify species quickly is critical for scientific research, biodiversity conservation and fighting wildlife crime, for example. By sequencing an individual's genome (i.e., reading its DNA sequence), it's possible to determine which species it belongs to.

Until now, sequencing whole genomes has been slow and expensive, and has needed a laboratory full of specialised equipment.

In this film, a team of scientists from the Royal Botanic Gardens, Kew travelled to Snowdonia National Park to try to sequence plant genomes in a tent. They took a pocket-sized MinION DNA sequencer, made by UK-based Oxford Nanopore Technologies. To our knowledge, this was the first time genomic sequencing of higher eukaryotes had been performed in the field.

Can the team, fighting against the weather, use minimal equipment and this exciting new technology to identify plants from their DNA alone?

Questions

Research at the interface of evolution, informatics, and ecology

Why Sequence Genomes In The Field?

Genetic analyses have been carried out in specialist laboratories since Watson & Crick's day. Now the Oxford Nanopore MinION sequencer, a tiny USB-powered device the size of a phone rather than a fridge, could replace decades of lab-bound techniques. But why move analyses into the field, where conditions are less predictable and hot tea less certain? Here are a few good reasons:

Real-time phylogenomics

Rapid, even instant, analysis of genetic material could transform medicine, rare-species discovery and export control. With portable DNA sequencers and cloud computing both set to become ubiquitous, instant DNA analysis, until recently the stuff of sci-fi films, will soon be a reality. We're developing methods to streamline DNA sequence analysis using cloud computing and other novel techniques.

Field-based sequencing

Often, transporting a sample from the field to the lab isn't practical. Some specimens decay too quickly, or can't be moved for legal reasons. DNA-based evidence might be needed on-site, to direct further research or to decide whether to arrest or release suspected poachers. Finally, the best argument for developing portable sequencers is that more people, from more countries, can do more sequencing...

Genomics at scale

The next wave of DNA sequencers, like the MinION, will be highly portable, unlike the lab-bound models of the last decade. Even if each device only matched the throughput of current instruments (and throughput is certain to improve), the total volume of data generated will explode as the number of people with easy access to sequencing leaps from thousands to millions: academics, rangers, students, farmers, businesses and citizen scientists. Absorbing this data and putting it to work will be the defining challenge of the next decade in biology.


Can It Work?

If it's now theoretically possible to sequence genomes in the field, how well would it actually work? In particular, we wanted to answer three key questions:

Taking the lab into the field

A normal lab has thousands of pounds' worth of sophisticated machines, calibrated and installed in climate-controlled conditions, with plentiful power and chemicals. What happens when you try to shrink that into a tent? What can you omit, and what compromises do you have to make?

Sequencing performance

There's a big difference between a toy and a workhorse. How, in detail, would these sequencers perform compared to lab-based alternatives? We collected operational and performance data from field-based MinION sequencers running both R7.3 and brand-new R9 chemistry, and compared them with an Illumina MiSeq, using replicate split aliquots of field-collected and field-extracted DNA.
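Comparing platforms usually starts from simple run summaries. This page does not list the exact metrics the team reported, so the Python below is only a hypothetical sketch. It assumes unwrapped FASTQ input (four lines per record, with the sequence on the second line) and computes read count, total yield, mean read length and read N50, a common summary for long-read runs. The function names are illustrative and are not taken from the project's scripts.

```python
from typing import Dict, Iterable, List


def fastq_read_lengths(lines: Iterable[str]) -> List[int]:
    """Return read lengths from unwrapped FASTQ text (hypothetical helper).

    Assumes each record is exactly four lines (header, sequence, '+', qualities),
    so the sequence is every fourth line starting from the second.
    """
    return [len(line.strip()) for i, line in enumerate(lines) if i % 4 == 1]


def read_length_summary(lengths: List[int]) -> Dict[str, float]:
    """Summarise a run: read count, yield (bp), mean length and read N50.

    N50 is the length L such that reads of length >= L together contain
    at least half of all sequenced bases.
    """
    if not lengths:
        raise ValueError("no reads supplied")
    total = sum(lengths)
    running = 0
    n50 = 0
    # Accumulate bases from the longest read down until half the yield is reached.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            n50 = length
            break
    return {
        "reads": len(lengths),
        "yield_bp": total,
        "mean_length": total / len(lengths),
        "n50": n50,
    }


if __name__ == "__main__":
    # Toy records for illustration only; not data from this study.
    toy_fastq = ["@r1", "ACGTACGT", "+", "IIIIIIII", "@r2", "ACG", "+", "III"]
    print(read_length_summary(fastq_read_lengths(toy_fastq)))
```

For the toy input the summary reports two reads, 11 bp in total, a mean length of 5.5 and an N50 of 8. Real FASTQ files can be streamed with `open(path)` (or `gzip.open(path, "rt")` for compressed output) and passed straight to `fastq_read_lengths`, so MinION and MiSeq runs from the same split aliquots can be summarised in the same way.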

Stats of field-based sample ID

Just collecting DNA letters on a screen wasn't good enough. We sequenced congeneric (closely related) species with completed reference genomes, which let us investigate the statistical performance of field-based sequence ID in detail: true and false positive rates, power, etc.
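True and false positive rates come from a standard confusion matrix. As a hypothetical sketch (not the study's own analysis), suppose each read or sample is either assigned to a target species or not, and its true identity is known because both congeners have reference genomes. The Python below turns those four counts into true positive rate, false positive rate and precision. The `Confusion` class and its field names are illustrative assumptions.

```python
from dataclasses import dataclass


def _ratio(num: int, den: int) -> float:
    """Divide, returning NaN rather than raising when the denominator is zero."""
    return num / den if den else float("nan")


@dataclass
class Confusion:
    """Hypothetical counts from comparing species assignments with true identities."""

    tp: int  # assigned to the target species, and truly that species
    fp: int  # assigned to the target species, but actually the congener
    tn: int  # not assigned to the target species, and truly the congener
    fn: int  # truly the target species, but not assigned to it

    def true_positive_rate(self) -> float:
        """Sensitivity (recall): TP / (TP + FN)."""
        return _ratio(self.tp, self.tp + self.fn)

    def false_positive_rate(self) -> float:
        """FP / (FP + TN): how often the congener is mistaken for the target."""
        return _ratio(self.fp, self.fp + self.tn)

    def precision(self) -> float:
        """TP / (TP + FP): how trustworthy a positive assignment is."""
        return _ratio(self.tp, self.tp + self.fp)


if __name__ == "__main__":
    # Made-up counts for illustration only; not results from this study.
    c = Confusion(tp=90, fp=5, tn=95, fn=10)
    print(c.true_positive_rate(), c.false_positive_rate(), c.precision())
```

With these made-up counts the true positive rate is 0.9 and the false positive rate is 0.05. Returning NaN for an empty denominator keeps batch summaries running when, for example, no negatives were tested in a particular run.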

Academic Outputs

Publications, manuscripts and talks

Publications

See Google Scholar for the most recent...

We've just released our manuscript on bioRxiv:

Field-based species identification in eukaryotes using single molecule, real-time sequencing.

Joe Parker, Andrew J. Helmstetter, Dion Devey & Alexander S.T. Papadopulos

Acknowledgements

This work was funded by a Pilot Study Grant to JDP and a Howard Lloyd Davies legacy grant to ASTP. The authors also thank the Botanical Society of Britain & Ireland; Natural Resources Wales, Tim Wilkinson, Robyn Cowan, and Patricia and David Brandwood for assistance with fieldwork, mapping, labwork and vehicles respectively; and the landowner of the Moelwyn Mawr SSSI for permission to carry out fieldwork. JDP’s post-hoc analyses and manuscript preparation were part-funded under RBG Kew’s ‘Plant And Fungal Trees Of Life’ initiative, supported by the Calleva Foundation Phylogenomic Research Programme and the Sackler Trust.

Author information

Oxford Nanopore Technologies provided free reagents and consumables for this study, as well as technical advice. JDP and ASTP received travel remuneration and free tickets to present an early version of this work at a conference (London Calling 2016). Basecalled read data for the Illumina and Oxford Nanopore sequencing runs are available via the EBI ENA at XXX. Scripts for downstream analyses and a list of required software are available at http://github.com/lonelyjoeparker/XXXX. For more details or assistance, please email JDP (joe.parker@kew.org) or ASTP (a.papadopulos@kew.org).

Author contributions

ASTP and JDP conceived the study and obtained funding. DD, ASTP and JDP designed and conducted fieldwork. ASTP designed and conducted field-based labwork with input from JDP, AH and DD. AH conducted lab-based sequencing. JDP conducted bioinformatics and phylogenomic analyses with contributions from AH. ASTP and JDP prepared the manuscript with contributions from DD and AH.

Code

Phylogenetics and bioinformatics software (with just a smattering of web bits)

Scripts on Github

Pipelines and analyses for this project are documented & versioned on GitHub.


Software dependencies

Sequencing / basecalling

Data wrangling / mapping / ID

Genomics


In development

Datasets

Data and scripts for reproducible science

Datasets

As much of our work as possible, including data, will soon be publicly available. We are publicly funded scientists and advocates of Open Data and Reproducible Research. If you want workflow scripts and software, please email us and we'll try to help where we can.

AWS buckets

Raw sequencing data from Nanopore and MiSeq runs and genome alignments are currently being uploaded to Amazon S3 buckets, though some will be embargoed ahead of publication. They will be available shortly.

Machine images

At present one machine image is available, from the pilot 'lightweight bioinformatics' project. Search AWS AMIs for 'ami-90296be7'.

Credits

Funding, collaborators & students, employers and contact info.

Funding


Employer

We are currently employed in the Biodiversity Informatics & Spatial Analysis, and Conservation Biology departments of the Science Directorate at the Royal Botanic Gardens, Kew in London.


Collaborators

Current collaborators include:


Contacts

Dr. Joe Parker
Biodiversity Informatics & Spatial Analysis
The Jodrell Laboratory
Royal Botanic Gardens, Kew
Surrey, UK TW9 3AB
p: +44(0)20 8332 5063
e: joe.parker@kew.org
P: _WCn7AYAAAAJ
O: 0000-0003-3777-2269
g: @lonelyjoeparker
T: @lonelyjoeparker
Dr. Alexander S.T. Papadopulos
Conservation Genetics
The Jodrell Laboratory
Royal Botanic Gardens, Kew
Surrey, UK TW9 3AB
p: +44(0)20 8332 5800
e: a.papadopulos@kew.org
P: reouVgMAAAAJ
T: @metallophyte

About this site

We're using a lot of standard templates (layouts in Bootstrap, fonts from Typekit).

The site you see is generated from a (largely) static HTML file, with a couple of bits of PHP pulling in WordPress posts to populate the blog and publications. Parallax effects use Aen Tan's Parallax-Scroll code, and I figured out the integration with Bootstrap (actually pretty simple) with a lot of help from this tutorial. The whole site probably took less than 20 hours to put together, including design, parsing WP, deployment and testing; I think that's pretty good, on balance.

Social and media

Vids, pics, blogs and links

Media


A short film by Kew Gardens, featuring the music of the Moulettes and POV footage of a real experiment taking place in a lot of rain.

Social

Tag this project with #realtimephylogenomics! More about us, too. Members:

Supporters and funders:

Links


See more in the blog...

Team

About us

This project is the brainchild of Joe Parker and Alex Papadopulos, and was carried out with support from colleagues at the Royal Botanic Gardens, Kew:

Dr. Joe Parker

I currently hold an Early-Career Research Fellowship in Phylogenomics at the Royal Botanic Gardens, Kew. This post gives me wide freedom to pursue my research into real-time phylogenomics, field-based DNA sequencing for turbotaxonomy and metagenomics, and integrated alignment & phylogeny models of molecular evolution.

I also lead the Informatics workpackage for the Plant & Fungal Trees Of Life project, a Kew-led initiative to reconstruct a genus-level supertree for 80% of all plant and fungal genera by 2020.


@lonelyjoeparker

Dr. Alexander S. T. Papadopulos

My role is to use genetic techniques to understand and conserve plant diversity. My research combines population genetics/genomics with ecological analyses and experiments to study adaptation and speciation. I am interested in how organisms adapt to their environment, the influence this can have on the evolution of reproductive isolation and the extent to which this can be a predictable and repeatable process. I am particularly interested in the evolution and conservation of island plants and divergence in the face of continuing gene flow.


@metallophyte

Dr. Andrew Helmstetter

I use a variety of approaches and study groups to understand the evolutionary origins of diversity. My research typically revolves around phylogenetics, population genetics and genomics but also delves into modelling species distributions and examining patterns of growth and development.

Adaptation and Speciation Genetics in Silene

The sea campion, Silene uniflora, is a small flowering plant found on coastlines throughout the UK. According to folklore, sea campion should never be picked for fear of tempting death.


@ajhelmstetter

Dr. Dion Devey

I am responsible for managing the RBG Kew molecular biology and genetics laboratory facilities. We undertake a range of highly specialised techniques, including DNA extraction, plant and fungal Sanger sequencing, next-generation sequencing (MiSeq) and population genetics (DNA microsatellites and amplified fragment length polymorphisms). We train and supervise staff, students and visiting researchers, and support Royal Botanic Gardens, Kew science projects by providing experimental design, project management, troubleshooting and costing for grant submissions. I also work closely with the Kew Innovation Unit to undertake ethical commercial activities, and am leading the development of novel techniques to further expand Kew’s fundraising capabilities.