Poster Presentation 41st Lorne Genome Conference 2020

Barcoding and demultiplexing Oxford Nanopore native RNA sequencing reads with deep residual learning (#132)

Martin A Smith 1 2 3 4 , Tansel Ersavas 1 , James M Ferguson 1 , Huanle Liu 1 5 , Morgan C Lucas 1 5 6 , Oguzhan Begik 1 4 5 , Lilly Bojarski1 1 , Kirston Barton 1 4 , Eva Maria Novoa 1 4 5 6
  1. Garvan Institute of Medical Research, Darlinghurst, NSW, Australia
  2. CHU Sainte-Justine Research Centre, Montreal, Canada
  3. Department of Biochemistry and Molecular Medicine, University of Montreal, Montreal, Canada
  4. St-Vincent’s Clinical School, UNSW Sydney, Darlinghurst, NSW, Australia
  5. Centre for Genomic Regulation (CRG), The Barcelona Institute of Science and Technology, Barcelona, Spain
  6. Universitat Pompeu Fabra (UPF), Barcelona, Spain

Nanopore sequencing has enabled high-throughput sequencing of native RNA molecules without conversion to cDNA, thus opening the gates to a new era for the unbiased study of RNA biology. However, the error rate introduced by base calling raw nanopore signal complicates sequence-based analytics premised on accuracy, including barcode demultiplexing. Furthermore, there currently lacks a formal barcoding protocol for direct sequencing of native RNA molecules, limiting the applicability of direct RNA sequencing to those scenarios where the amount of RNA available is low, such as in the case of patient-derived RNA samples. We describe a novel strategy to barcode and demultiplex direct RNA sequencing, involving custom DNA oligonucleotides ligated to RNA transcripts during library preparation. The raw signal associated with the DNA barcode is extracted, transformed into an array of pixels, and demultiplexed using a deep convolutional neural network classifier. Our method, DeePlexiCon, implements a 20-layer residual neural network model that can demultiplex 60% of reads with 99.9% specificity, or 93% of reads with 95.1% specificity. The availability of an efficient and simple barcoding strategy for native RNA sequencing will enhance the use of direct RNA sequencing by making it more cost effective to the entire community.