Curriculum

Day 1. 2-8 pm Berlin time

Session 1: Introduction, basecalling and demultiplexing

We will start with a general introduction to sequencing and assembly using Oxford Nanopore Technologies sequencers. Participants will be introduced to the theory of how Nanopore sequencing works and how raw Nanopore sequencing data is formatted.

Practical sessions will include using current gold-standard basecalling tools to transform raw Nanopore signal data into DNA sequence (including an example of multiplexing/demultiplexing in Nanopore sequencing), and learning how to quality-control and filter your basecalled sequencing reads.
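To give a flavour of the read filtering step, here is a minimal Python sketch that keeps only reads above a length and mean quality threshold. It is not one of the course tools: the function names, the example reads and the thresholds are all hypothetical, and it assumes simple four-line FASTQ records with Phred+33 quality encoding. The mean quality is calculated by averaging per-base error probabilities rather than the raw scores.

```python
import math


def mean_quality(quality_string, offset=33):
    """Return the mean Phred score of a read, averaged over error probabilities."""
    if not quality_string:
        return 0.0
    # Convert each quality character to an error probability (Phred+33 assumed).
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality_string]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def parse_fastq(lines):
    """Yield (name, sequence, quality) tuples, assuming four lines per read."""
    stripped = [line.rstrip("\n") for line in lines]
    for i in range(0, len(stripped) - 3, 4):
        header, seq, _plus, qual = stripped[i:i + 4]
        yield header[1:], seq, qual


def filter_reads(records, min_length=1000, min_quality=10.0):
    """Yield only reads meeting both thresholds (hypothetical default values)."""
    for name, seq, qual in records:
        if len(seq) >= min_length and mean_quality(qual) >= min_quality:
            yield name, seq, qual


# Toy input: read2 is too short and read3 has very low quality.
example = [
    "@read1", "ACGT" * 3, "+", "I" * 12,
    "@read2", "ACGT", "+", "I" * 4,
    "@read3", "ACGT" * 3, "+", "#" * 12,
]
kept = list(filter_reads(parse_fastq(example), min_length=10, min_quality=10.0))
print([name for name, _, _ in kept])
```

In practice, suitable thresholds depend on your sequencing chemistry and on what you plan to do with the reads downstream.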

Day 2. 2-8 pm Berlin time

Session 2: Benefits and Limitations of Nanopore

In this session we will talk about the benefits and limitations of using Nanopore for genome assembly and discuss the differences between Nanopore and other sequencing technologies.

Session 3: Genomes, assemble!

Although Nanopore sequencing is capable of generating individual reads which are long enough to span entire viral (and even some bacterial) genomes by themselves, genome assembly will almost always still require shorter genome fragments to be stitched together. Many assembly tools have been developed or optimised for long reads generally, or for Nanopore reads specifically. Some assembly tools might be better for your dataset than others, depending on the type of genome you have sequenced, the overall quality you require, your available computing power, and how long you are willing to wait for your assembly.

Day 3. 2-8 pm Berlin time

Session 4: Get polishing

Some assembly tools produce more accurate contigs than others, depending on the assembly algorithm they use and whether or not they include sequencing read correction steps. However, most newly assembled genomes will benefit from some additional polishing. Here, participants will be introduced to polishing tools which use a variety of methods to compare the sequencing signal or raw reads back to the assembled contigs and make any possible corrections. We will also demonstrate how to use short read data to polish your assembled genomes and improve their accuracy even further.

Session 5: How good is my genome?

The accuracy of our assembled contigs is important for most downstream applications, including annotation and analysis of structural variants. Here, we will discuss methods for deciding if your genome assembly is accurate enough to use, whether it contains misassemblies, and whether it is “complete” (and what to do if it is not as good as you might have hoped…).
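As a small illustration of an assembly metric, the Python sketch below calculates N50: the length of the shortest contig among the longest contigs that together cover at least half of the total assembly length. N50 is used here only as a familiar example and is not necessarily the metric the session focuses on. It describes contiguity only and says nothing about base accuracy or misassemblies. The function name and the contig lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (0 if it is empty)."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running_total = 0
    # Walk from the longest contig down until half of the assembly is covered.
    for length in lengths:
        running_total += length
        if running_total >= half_total:
            return length
    return 0


# Toy assembly with nine contigs and a total length of 54.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```

In the toy example, the three longest contigs (10 + 9 + 8 = 27) reach half of the total length, so the N50 is 8.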

Day 4. 2-8 pm Berlin time

Session 6: How good is my genome? – Practical

Here we will apply some of what we learned yesterday to assess the quality of our newly assembled genomes.

Session 7: Which assembly method is best for my data?

During Session 3, we briefly discussed that different tools are optimal for different types of data. During this session, we will further consider the different assembly methods available for different sample types. Which assembly tool is best for a haploid genome? What about diploid, or even polyploid? How can you assemble genomes for a variety of different microbes found in the same metagenomic sample? By the end of this session, you will be armed with the knowledge to decide!

This session will also include a practical demonstration of how to use a specific type of sequencing and assembly which is currently in use globally to sequence SARS-CoV-2 genomes (and which can also be used for many other viral genomes): tiled amplicon sequencing.

Day 5. 2-8 pm Berlin time

Session 8: Where can I do my analysis?

Gone are the early days of sequencing, when throughput and yield were major concerns. These days, many sequencing runs will produce more data than you know what to do with. It might still be possible to carry out some stages of your analysis locally, on your own computer, especially if speed is not a priority for your work. However, other applications might require more computing power. Here, we will discuss the resources which might be available to you, and how to use them effectively.

Session 9: Over to you

To wrap up the course, you will be able to choose from a variety of pre-basecalled model datasets (metagenomic, viral, mammalian, etc.), and you will spend this final session assembling and QCing your chosen dataset. This will require you to apply all the knowledge you have gained over the last few days, in order to decide which tools to use, and how to use them. You could even practise on your own dataset, if you happen to have one available! Computational resources will be limited on the day, so your own datasets should be relatively small. You can contact us ahead of the course to discuss whether your data are suitable.

Floating talks
Some of the tools we use in the sessions are slow to run. To fill the time while we wait, we will offer optional additional talks on related topics, including genome annotation, modified basecalling, and how to install tools.