Discovering the world of long-read nanopore sequencing

Posted on 12th December, 2019 by Carlo Pecoraro

Nanopore sequencing is an emerging technology that promises high sequencing throughput, low cost and long reads. Despite these advantages, nanopore sequencing has one major drawback: high error rates!

Here we discuss this technology with Robert Vaser and Josip Marić, instructors of the Physalia course “GENOME ASSEMBLY USING OXFORD NANOPORE SEQUENCING”.

Hi Robert and Josip, what are your current research interests?

R: I am currently working on algorithms for de novo genome assembly from long uncorrected reads. My interests are tied to the overlap-layout-consensus paradigm, and I am focusing on assembly graph simplification of the layout step and increasing the accuracy in the consensus step with fast multiple sequence alignment.

J: My main research focus is the analysis of long RNA reads produced by third-generation sequencers. I work on building new splice-aware tools and methods for mapping RNA reads to a reference genome. Recently I have also started researching metagenomics, with an emphasis on tools that identify species and their abundances in metagenomic samples.

--

When did you start analysing Nanopore data?

R: I started toying with Oxford Nanopore data in my first PhD year, around the beginning of 2016, when I was getting to know the technology with the help of an Escherichia coli dataset from Loman Labs. At the same time, my colleagues were trying to assemble genomes without prior error correction of reads.

J: My first encounter with Nanopore data was in 2015 while I was working on my master's thesis, where I built my first splice-aware RNA mapper. I started working with Nanopore data on a daily basis when I began my PhD in December, two years ago.

--

What is the main advantage of choosing this technology?

R: The low cost and small size of the MinION sequencer make it a portable solution for various field analyses. In addition, the Oxford Nanopore technology has the highest potential to bridge even the longest repetitive genomic regions, which greatly mitigates the assembly problem.

J: The main advantage of Nanopore technology is the length of the reads it produces. This helps in both genome assembly and transcriptome analysis, where the reads can span either repetitive regions or several exons, alleviating the assembly problem and enabling the detection of different isoforms, respectively.

--

What are the main steps of a typical pipeline to analyse these data? And which step does your work mostly focus on?

R: The electrical signals from the sequencer have to be transformed into sequences of nucleotides. Afterwards, the reads are assembled and the obtained contigs are polished at the base level, the signal level, or both. In addition, the assembly is sometimes scaffolded with other technologies such as Bionano and Hi-C. For diploid genomes, some pipelines try to resolve haplotypes. The various downstream biological analyses that follow are not my bread and butter, as I am a computer scientist. My work is focused on the assembly part of the pipeline.

J: In the assembly part of the pipeline, DNA reads are first assembled and the resulting contigs are then polished in several steps. Assemblers usually work with sequences of nucleotides, so before assembly the electrical signals produced by the sequencer need to be transformed into nucleotides. The assembled genomes are later used in different analyses where other bioinformatics tools can be applied, such as RNA mappers, metagenome classifiers, gene predictors, etc. I mostly work on tools such as mappers and classifiers, which use previously assembled genomes.

What have been the main challenges in analysing these data, and how have you approached them?

R: The main concern with Oxford Nanopore technology was the enormous error rate in the beginning. This was later significantly decreased, although more sensitive algorithms had to be developed. As we wanted to assemble genomes directly from erroneous reads, we focused on developing a consensus tool which was later published as Racon.

J: The main challenge in analysing Nanopore data has been its high error rate. For RNA mapping tools this is quite a burden, since we want mappers to produce alignments that are correct down to a single base. We had to develop many highly sensitive methods that can work with spliced reads with high error rates and still produce fully correct alignments.
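
To illustrate why a consensus step can recover an accurate sequence from erroneous reads, here is a minimal Python sketch. It is a toy under strong assumptions and not how Racon works: the simulated reads contain substitution errors only and are already perfectly aligned to each other, so a simple column-wise majority vote is enough. Real Nanopore reads also contain insertions and deletions and first have to be aligned. All names and parameters below (the 15% error rate, 30 reads, the function names) are made up for the illustration.

```python
import random
from collections import Counter

BASES = "ACGT"


def add_substitutions(seq, error_rate, rng):
    """Replace each base by a different base with probability error_rate."""
    out = []
    for base in seq:
        if rng.random() < error_rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)


def majority_consensus(reads):
    """Column-wise majority vote over equal-length, already aligned reads."""
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*reads)
    )


def mismatch_rate(a, b):
    """Fraction of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


# Hypothetical toy data: a random "genome" and 30 noisy copies of it.
rng = random.Random(0)
truth = "".join(rng.choice(BASES) for _ in range(2000))
reads = [add_substitutions(truth, 0.15, rng) for _ in range(30)]
consensus = majority_consensus(reads)

print("single read error rate:", mismatch_rate(reads[0], truth))
print("consensus error rate:  ", mismatch_rate(consensus, truth))
```

Because the errors in this toy model fall independently in each read, the correct base wins the vote at almost every position, so the consensus should be far closer to the original sequence than any single read.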

Josip Marić and Robert Vaser

Laboratory for Bioinformatics and Computational Biology

University of Zagreb, Croatia

--

Can Nanopore technology be used for the assembly of eukaryotic genomes?

R: Absolutely. There are even a variety of assemblers that have done this successfully. The technology can also be used for scaffolding existing short-read assemblies.

J: Yes, of course, there are many tools that do exactly that.

--

What are the main similarities and differences between Nanopore and PacBio technologies?

R: Both technologies produce much longer reads than second-generation sequencing, but at the cost of higher error rates. With the latest protocols of both companies we can differentiate them a bit more. The HiFi reads of PacBio are characterized by high accuracy, while the ultra-long protocol of Oxford Nanopore focuses on greater read yield with lengths of hundreds of thousands of base pairs.

J: Both technologies produce longer reads (several thousand bases) than second-generation sequencing tools (about a hundred bases). Recently, PacBio has been producing reads with lower average error rates of two percent, while Oxford Nanopore has been focusing on much longer reads, several hundred thousand bases long.

--

What’s next for your research?

R: I am trying to boost the performance of our de novo assembler Raven by decreasing the execution time needed for the overlap step while maintaining the same output. Afterwards, I will try boosting the accuracy of Racon.

J: As mentioned in my answer to the first question, my research will focus more on metagenomics and building metagenomic classifiers, but I will continue to develop and maintain the splice-aware RNA mapper I built earlier. I hope to build a tool that is fast at identifying species and strains in environmental samples.

Thanks Robert and Josip for your time. See you in Berlin!!