Assembly and Annotation of Genomes

Dates

18-22 March 2024

 

To foster international participation, this course will be held online.

 

Overview

This course will introduce biologists and bioinformaticians to the concepts of de novo genome assembly and annotation, providing a theoretical framework and practical examples. A variety of sequencing technologies and their applications to generate haplotype-phased, high-quality reference genomes will be presented and discussed. They include Illumina short reads (for both assembly and gene annotation), PacBio HiFi (‘High Fidelity’) and CLR (‘Continuous Long Read’) reads, Oxford Nanopore long and ultralong reads, as well as scaffolding technologies including optical mapping and proximity ligation (Hi-C). Special attention will be given to quality control throughout the assembly process (e.g. with tools such as GenomeScope, Merqury and Pretext), as well as to the mitigation of consensus and structural errors and to manual curation. The concept of telomere-to-telomere (T2T) genome assembly, and the means to achieve it, will also be introduced, as will annotation tools that use Illumina RNA-Seq and PacBio Iso-Seq data. By the end of the course, students will understand what is needed to generate a high-quality, annotated and curated reference genome.

 

Target Audience & Assumed Background

The course is aimed at researchers interested in learning more about genome assembly and annotation. It will include information useful to both beginners and more advanced users. We will start by introducing general assembly and annotation concepts and algorithms, providing a historical context. We will then describe all major components of a typical genome assembly workflow, using the Vertebrate Genomes Project assembly pipeline as an example. We will further analyse the multiple ways a genome can be annotated to maximize its utility for downstream analyses. There will be a mix of lectures and hands-on practical exercises, using either a graphical interface (https://assembly.usegalaxy.eu/) or the basic command line. Prior experience with Linux is welcome but not required. No prior background in DNA sequencing is required.

Learning outcomes

-       Understand the concepts related to de novo genome assembly and annotation for genomes of all sizes, from viruses to mammals

-       Learn the strengths and weaknesses of different sequencing technologies for de novo genome assembly and annotation, including Illumina short-read sequencing, Pacific Biosciences and Oxford Nanopore long-read sequencing, as well as scaffolding technologies including optical mapping and proximity ligation (Hi-C)

-       Gain hands-on experience with common tools for de novo genome assembly, assembly quality evaluation, assembly visualization and manual curation

-       Gain hands-on experience with feature annotation (e.g. genes, repeats)

Program

 

Monday – Classes from 2-8 pm Berlin time

 

Session 1: Introduction
In this kick-off session students will be introduced t