Rapid, raw-read reference and identification (R4IDs): A flexible platform for rapid generic species ID using long-read sequencing technology.

Our paper on rapid sample identification using partial, low-coverage, MinION-sequenced reference databases (demonstrated live at the Kew Science Festival) is now out as a preprint. See it on bioRxiv: doi: 10.1101/281048.

In it, we show (with empirical data and simulation) that the length and bias of MinION reads make them ideal for sample ID – better than NGS, under certain conditions – even where no reference assembly is available and a genomic skim is used as the BLAST database instead. Because these genome-skim DBs are quick to generate, we call them ‘rapid raw-read reference for ID’, or ‘R4IDs’ for short.
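To make the idea concrete, here is a minimal sketch of how an ID call could be made from BLAST results. It is not the pipeline from the paper, just an illustration under some loudly stated assumptions: the reads from the unknown sample have been BLASTed against the pooled skim databases with tabular output (`-outfmt 6`), and each subject ID starts with a species label followed by `|` (a hypothetical naming convention). The function names and the simple one-read-one-vote rule are also my own inventions for this example.

```python
from collections import defaultdict


def best_hits(blast_lines):
    """Keep the top-scoring hit per read from BLAST tabular (outfmt 6) lines.

    Returns {read_id: (species_label, bitscore)}.
    """
    best = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        # outfmt 6 columns: qseqid, sseqid, ..., evalue, bitscore (12th column)
        read_id, subject_id, bitscore = fields[0], fields[1], float(fields[11])
        # ASSUMPTION: subject IDs look like "species_label|reference_read_name"
        label = subject_id.split("|")[0]
        if read_id not in best or bitscore > best[read_id][1]:
            best[read_id] = (label, bitscore)
    return best


def identify(blast_lines):
    """Call the species whose skim database wins the most reads (one vote per read)."""
    votes = defaultdict(int)
    for label, _score in best_hits(blast_lines).values():
        votes[label] += 1
    if not votes:
        return None, {}
    winner = max(votes, key=votes.get)
    return winner, dict(votes)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python identify.py sample_vs_skims.tsv
    with open(sys.argv[1]) as handle:
        species, tally = identify(handle)
    print(species, tally)
```

A plain vote count is the simplest possible decision rule; summing bitscores or alignment lengths per database would be an equally reasonable variation to try.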

The code to repeat these analyses or set up an R4IDs analysis is on GitHub, but I’ve also packaged it as Docker containers: hub.docker.com/lonelyjoeparker

There are a few rough corners to sand while we decide where to submit it, but comments are welcome in the meantime. Thanks, as ever, to my colleagues (long-suffering Alex and Andrew) for all their stress:


Rapid, raw-read reference and identification (R4IDs): A flexible platform for rapid generic species ID using long-read sequencing technology.

Joe Parker*, Andrew J. Helmstetter, Alexander S. T. Papadopulos*

Abstract

The versatility of the current DNA sequencing platforms and the development of portable, nanopore sequencers means that it has never been easier to collect genetic data for unknown sample ID. DNA barcoding and meta-barcoding have become increasingly popular and barcode databases continue to grow at an impressive rate. However, the number of canonical genome assemblies (reference or draft) that are publicly available is relatively tiny, hindering the more widespread use of genome scale DNA sequencing technology for accurate species identification and discovery. Here, we show that rapid raw-read reference datasets, or R4IDs for short, generated in a matter of hours on the Oxford Nanopore MinION, can bridge this gap and accelerate the generation of useable reference sequence data. By exploiting the long read length of this technology, shotgun genomic sequencing of a small portion of an organism’s genome can act as a suitable reference database despite the low sequencing coverage. These R4IDs can then be used for accurate species identification with minimal amounts of re-sequencing effort (<1000s of reads). We demonstrated the capabilities of this approach with six vascular plant species for which we created R4IDs in the laboratory and then re-sequenced, live at the Kew Science Festival 2016. We further validated our method using simulations to determine the broader applicability of the approach. Our data analysis pipeline has been made available as a Dockerised workflow for simple, scalable deployment for a range of uses.

Field-based, real-time metagenomics and phylogenomics for responsive pathogen detection: lessons from nanopore analyses of Acute Oak Decline (AOD) sites in the UK.

Talk presented at the UK-India Joint Bioinformatics Workshop, Pirbright Institute, 09 Feb 2018

[slideshare id=88051198&doc=joe-parker-pirbright-ukindia-180215141918]

Abstract:

In a globalised world of increasing trade, novel threats to animal and plant health, as well as human diseases, can cross political and geographical borders spontaneously and rapidly. One such example is the rise of Acute Oak Decline (AOD) in the UK, a multifactorial decline syndrome with uncertain aetiology, vectors, and host risk factors, first reported in the UK a decade ago. Affected oaks display significant morbidity and mortality, with symptoms including vascular interruption, crown loss and characteristic striking bark lesions breaching cambium and filled with a viscous, aromatic, dark-brown/black exudate, which may sometimes be released under considerable pressure. Although multiple bacterial species have been associated with lesion sites in affected oaks, and a putative insect vector identified, the basic risk factors, transmission, progression and treatment of the syndrome remain unclear.

This dispiriting state of affairs presents an ideal opportunity to exploit recent developments in nanopore sequencing to develop and test field-based methods of real-time phylogenomics and metagenomics to establish baseline data for healthy oaks, and contrast these with affected / dying oaks to shed light on syndrome causes and management. WGS metagenomic sampling was carried out on leaf and bark tissue from 37 affected, asymptomatic, and recovering individuals (nine Quercus species) at three field sites over a year. Extraction and DNA sequencing were performed in the field for a subset of samples with MinION nanopore rapid sequencing kits, and also using MinION and paired-end Illumina sequencing under laboratory conditions. Metagenomic analyses to determine microbial community composition were carried out, and real-time phylogenomic methods were also developed and applied. Early results from these analyses and lessons for future work are presented.

Metagenomic datasets can be rapidly generated in the field with minimal equipment using nanopore sequencing, providing a responsive capability for emerging disease threats and reducing transmission risks associated with transporting quantities of potentially infectious samples from outbreaks of novel diseases. Furthermore, real-time data analysis can provide rapid feedback to field teams, both to inform management decisions and also to allow for adaptive experimental protocols that dynamically target data collection to extract maximum information per unit effort.

Real-time phylogenomics or ‘Some interesting problems in genomic big data’

Talk given at a technology/informatics company, London, Feb 2018.

[slideshare id=87391225&doc=joe-parker-reak-time-phylogenomics-180207132740]

An overview of contemporary advances and remaining problems in big-data biology, especially phylogenomics.

Read all about it!

Dead excited to say our Scientific Reports (Nature) paper on field-based DNA extraction, sequencing (and a bit of analysis) has been picked up by the BBC World Service and The Times (UK) newspaper! You can read all about it here (paywall).

If you can’t read it online, my Grandad has a copy he might lend you. We’re proper scientists now…

Inference and informatics in a ‘sequenced’ world

Short lecture relating my recent work on real-time phylogenomics, implications for bioinformatics research and future directions of genomic/phylogenetic modelling to explicitly account for phylogeny, synteny and identity through coloured graphs.

University of Reading, 2nd August 2017

Slides [SlideShare]: cc-by-nd

[slideshare id=78587606&doc=2017readingbioinfforgenomics-joeparker-final3-170805084405]

Using field-based DNA sequencing to accelerate phylogenomics

Invited seminar at the Department of Zoology, Oxford University, 30th November 2016.

Summary of our field-based real-time phylogenomics (MinION DNA sequencing) experiments this year, and applicability to broad-scale tree-of-life phylogenomics and macroevolutionary biology.

Slides [SlideShare]: cc-by-nd

[slideshare id=69767351&doc=2016oxfordzoojoeparker-161202163931]

Single-molecule real-time (SMRT) Nanopore sequencing for Plant Pathology applications

A short presentation to the British Society for Plant Pathology’s ‘Grand Challenges in Plant Pathology’ workshop on the uses of real-time DNA/RNA sequencing technology for plant health applications.

Doctoral Training Centre, University of Oxford, 14th September 2016.

Slides [SlideShare]: cc-by-nc-nd

[slideshare id=66051562&doc=smrt-nanopore-gcpp-joeparker-160915100855]

Real-time Phylogenomics

General science talk about the potential of real-time phylogenomics, delivered at the Jodrell Lecture Theatre, Kew Gardens, November 2nd 2015

Slides [SlideShare]: cc-by-nc-nd

[slideshare id=54651010&doc=real-time-phylogenomics-joeparker-151102162613-lva1-app6892]

Omics in extreme Environments (Lightweight bioinformatics)

Presentation on lightweight bioinformatics (Raspi / cloud computing) for real-time field-based analyses.

Presented at iEOS2015, St. Andrews, 3-6th July 2015.

Slides [SlideShare]: cc-by-nc-nd

[slideshare id=50251856&doc=joeparkerlightweightbioinformatics-150707112254-lva1-app6892]