Curriculum

Monday – Classes from 09:30 to 17:30
Session 1: Introduction


The course starts with a general introduction to sequencing and genome assembly. Attendees will become familiar with Oxford Nanopore sequencing: how it works and what its advantages and disadvantages are. Afterwards, we will convert a subset of a bacterial dataset, consisting of raw electrical current signals, into nucleotide sequences (basecalling). The resulting reads have a higher error rate than those produced by previous generations of sequencing technologies.

 


Session 2: Stitching fragments


Sequencing technologies still cannot read a whole genome in one piece, so the resulting fragments must be joined together. We will first try pairwise sequence alignment, the basis of many bioinformatics tools, to find overlaps between fragments. Because aligning every pair is not feasible for larger amounts of data, we will then investigate a heuristic approach based on short substrings of predefined length (Minimap). We will discuss the trade-off between execution time and sensitivity and its impact on assembly contiguity, and apply this method to a small bacterial dataset.
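To make the seed-based idea concrete, here is a minimal Python sketch of candidate overlap detection with minimizers (short substrings selected from windows of k-mers). It is not Minimap's implementation: it orders k-mers lexicographically instead of by a hash, ignores reverse complements, and stops at counting shared minimizers per read pair, with no chaining or alignment. The function names and the parameters `k`, `w` and `min_shared` are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations


def minimizers(seq: str, k: int, w: int) -> set[str]:
    """Return the (w, k)-minimizers of seq.

    In every window of w consecutive k-mers, the smallest k-mer is kept.
    Lexicographic order is an assumption made for readability; minimap
    ranks k-mers by a hash value instead.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return {min(kmers[i:i + w]) for i in range(len(kmers) - w + 1)}


def candidate_overlaps(reads: dict[str, str], k: int = 5, w: int = 3,
                       min_shared: int = 2) -> list[tuple[str, str, int]]:
    """Report read pairs that share at least min_shared minimizers."""
    # Index: minimizer -> names of the reads that contain it.
    index: dict[str, set[str]] = defaultdict(set)
    for name, seq in reads.items():
        for m in minimizers(seq, k, w):
            index[m].add(name)

    # Only reads that co-occur under some minimizer are compared,
    # instead of aligning every pair of reads against each other.
    shared: dict[tuple[str, str], int] = defaultdict(int)
    for names in index.values():
        for a, b in combinations(sorted(names), 2):
            shared[(a, b)] += 1

    return sorted((a, b, n) for (a, b), n in shared.items() if n >= min_shared)


if __name__ == "__main__":
    reads = {
        "r1": "ACGTACGGTCAGTTACG",
        "r2": "GTCAGTTACGGATCCAT",  # its prefix matches the suffix of r1
        "r3": "TTTTGGGGCCCCAAAA",   # unrelated read
    }
    print(candidate_overlaps(reads))  # [('r1', 'r2', 2)]
```

Larger `k` and `w` keep fewer seeds, which speeds up the search but can miss true overlaps between error-prone reads; this is the time versus sensitivity trade-off in miniature.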

 


Tuesday – Classes from 09:30 to 17:30
Session 3: Unknotting graphs


From the set of pairwise overlaps between fragments, we will build an assembly graph from which the genome can be reconstructed (Miniasm). Because of the sheer number of overlaps, the graph will look like a ball of yarn. Step by step, we will introduce and apply several simplification methods to untangle it. Some knots caused by sequencing errors will remain; we will examine them and try to resolve them. Finally, contiguous chains of fragments (contigs) will be extracted and used in the following sessions.
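One classic simplification step is transitive reduction: if read a overlaps b and b overlaps c, the direct edge a -> c carries no extra information and can be removed. The minimal Python sketch below applies it to a directed graph given as a set of (read, read) edges and then walks unbranched paths into chains. It ignores overlap lengths, read orientation and other cleaning steps such as tip removal and bubble popping; the function names are hypothetical.

```python
from collections import defaultdict

Edge = tuple[str, str]


def transitive_reduction(edges: set[Edge]) -> set[Edge]:
    """Drop an edge a -> c whenever some read b gives a path a -> b -> c."""
    succ: dict[str, set[str]] = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    redundant = {
        (a, c)
        for a, c in edges
        if any(c in succ.get(b, ()) for b in succ[a] if b != c)
    }
    return edges - redundant


def extract_chains(edges: set[Edge]) -> list[list[str]]:
    """Walk unbranched paths (one successor, one predecessor) into chains.

    Simplification: a component that is a single closed cycle with no
    branching node is not reported.
    """
    succ: dict[str, list[str]] = defaultdict(list)
    pred: dict[str, list[str]] = defaultdict(list)
    for a, b in sorted(edges):
        succ[a].append(b)
        pred[b].append(a)
    nodes = sorted(set(succ) | set(pred))

    chains = []
    for start in nodes:
        # Skip nodes that merely continue the chain of their single predecessor.
        if len(pred[start]) == 1 and len(succ[pred[start][0]]) == 1:
            continue
        chain, node = [start], start
        while len(succ[node]) == 1 and len(pred[succ[node][0]]) == 1:
            node = succ[node][0]
            chain.append(node)
        chains.append(chain)
    return chains


if __name__ == "__main__":
    # Four reads tiled along a genome; each overlaps the next two reads.
    edges = {("r1", "r2"), ("r2", "r3"), ("r3", "r4"), ("r1", "r3"), ("r2", "r4")}
    reduced = transitive_reduction(edges)
    print(sorted(reduced))          # [('r1', 'r2'), ('r2', 'r3'), ('r3', 'r4')]
    print(extract_chains(reduced))  # [['r1', 'r2', 'r3', 'r4']]
```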


Session 4: Polishing until it shines


Contigs extracted from the assembly graph will be roughly as accurate as the raw reads, which makes them unusable for most downstream analyses. Therefore, we will map all fragments to the assembly and build a multiple sequence alignment using partial order graphs (Racon). By retaining the most frequent base among all fragments at each assembly position, we will iteratively increase the overall accuracy. Once accuracy stops improving, we will check whether the assembly can be improved further using signal-level data (Nanopolish).
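The "most frequent base" rule can be sketched as a per-column majority vote. This minimal Python sketch assumes the fragments are already aligned to the contig as equal-length gapped rows (`-` marks a gap); Racon builds that alignment with partial order graphs, which the sketch does not attempt. A column whose most frequent symbol is a gap is dropped. `majority_consensus` is a hypothetical name.

```python
from collections import Counter


def majority_consensus(rows: list[str]) -> str:
    """Majority vote over a gapped alignment ('-' marks a gap).

    Each row is one fragment aligned to the contig; all rows must have the
    same length. Ties go to the symbol seen first in the column.
    """
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("rows must be non-empty and of equal length")
    consensus = []
    for column in zip(*rows):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != "-":  # a gap majority means the column is dropped
            consensus.append(symbol)
    return "".join(consensus)


if __name__ == "__main__":
    rows = [
        "ACGA-TTGA",
        "ACGA-TTGA",
        "ACG--TTCA",  # deletion at column 3, mismatch at column 7
        "ACGAGTTGA",  # insertion at column 4
        "AC-A-TTGA",  # deletion at column 2
    ]
    print(majority_consensus(rows))  # ACGATTGA
```

Each individual row contains an error, yet the vote recovers a consistent sequence; repeating mapping and voting on the new consensus is what makes the process iterative.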

 


Wednesday – Classes from 09:30 to 17:30
Session 5: Quality assessment


The quality of an assembly is important for downstream analysis, so we will assess it from three aspects: base accuracy (MUMmer) and completeness (QUAST-LG) against the reference genome, and predicted proteins (orthologs with BUSCO and open reading frames with Ideel). We will introduce each of these tools and apply them to our assembly.


Session 6: State-of-the-art


We will go through the basic concepts of several state-of-the-art assemblers, such as Canu, Redbean and Flye. We will run each of them on the same dataset and compare the results in terms of contiguity, accuracy and required computational resources.
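Contiguity is commonly summarized with N50: sort contigs from longest to shortest and report the length of the contig at which the running total first reaches half of the assembly length. A minimal Python sketch, with `n50` as a hypothetical helper name:

```python
def n50(lengths: list[int]) -> int:
    """Return the N50 of a list of contig lengths."""
    total = sum(lengths)
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if 2 * covered >= total:  # at least half of the assembly is covered
            return length
    raise ValueError("no contigs given")


if __name__ == "__main__":
    print(n50([100, 80, 50, 30, 20]))  # 80
```

In the example the total is 280; the two longest contigs cover 180, which is at least half, so N50 is 80.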


Thursday – Classes from 09:30 to 17:30
Session 7: State-of-the-art continued


Session 8: Group task


Attendees will receive several datasets obtained with Oxford Nanopore sequencing, ranging in size from a couple of megabytes to a hundred megabytes. Working in pairs or groups of three, participants will assemble as many of the datasets as possible with different assemblers and evaluate the quality of each assembly. Participants are also encouraged to bring their own data if they find it interesting to assemble.


Friday – Classes from 09:30 to 17:30
Session 9: Group task continued


Session 10: Presentations
Each group will present the results of its work, followed by a general discussion of the group task and the course itself.