Assembly and Annotation of Genomes


14-18 March 2022


Due to the COVID-19 outbreak, this course will be held online



This course will introduce biologists and bioinformaticians to the concepts of de novo assembly and annotation, providing a theoretical framework and practical examples. A variety of sequencing technologies and their applications to generate high-quality reference genomes will be presented and discussed. They include Illumina short reads (for both assembly and gene annotation), PacBio HiFi (‘High Fidelity’) and CLR (‘Continuous Long Read’) reads, Oxford Nanopore long and ultralong reads, as well as scaffolding technologies including optical mapping and proximity ligation (Hi-C). Special attention will be given to quality control throughout the assembly process (e.g. with tools such as GenomeScope, Merqury and Pretext) as well as to the mitigation of consensus and structural errors. Annotation tools using Illumina RNA-Seq and PacBio IsoSeq data will be introduced. By the end of the course, students will understand what is needed to generate a high-quality annotated reference genome.


Target Audience & Assumed Background

The course is aimed at researchers interested in learning more about genome assembly and annotation. It will include information useful to both beginners and more advanced users. We will start by introducing general concepts in their historical context. We will then describe all major components of a typical genome assembly workflow, using the Vertebrate Genomes Project assembly pipeline as an example. We will further analyse the multiple ways a genome can be annotated to maximize its utility for downstream analyses. There will be a mix of lectures and hands-on practical exercises, using both graphical interfaces and the basic command line. Prior experience with Linux is welcome but not required. No prior background in DNA sequencing is required.

Learning outcomes

-       Understand the concepts related to de novo genome assembly and annotation for genomes of all sizes, from viruses to mammals

-       Learn the strengths and weaknesses of different sequencing technologies, including Illumina short-read sequencing, Pacific Biosciences and Oxford Nanopore long-read sequencing, as well as scaffolding technologies including optical mapping and proximity ligation (Hi-C), for de novo genome assembly and annotation

-       Gain hands-on experience with common tools for de novo genome assembly, assembly quality evaluation, and assembly visualization

-       Gain hands-on experience with feature annotation (e.g. genes, repeats, transposable elements)



Monday – Classes from 2-8 pm Berlin time


Session 1: Introduction
In this kick-off session, students will be introduced to genomics. This introduction will include a historical background on the DNA sequencing and genome assembly techniques developed since the discovery of nucleic acids. It will range from early strategies to popular high-throughput approaches, including a general overview of so-called ‘third-generation’ approaches (long-read sequencing) and scaffolding approaches (optical mapping, proximity ligation (Hi-C)). This introduction to DNA sequencing will be accompanied by an overview of the evolution of bioinformatics throughout the 20th and early 21st centuries. Special attention will be paid to the evolution of genome assembly approaches, including those that will be employed in the rest of the course.
Session 2: Genome assembly today
In this section we will discuss some of the latest developments in the f