Session content

Monday – Classes from 09:00 to 18:00 - “get it started”


Session 1: Introduction (morning)

In this session I will kick off with an introductory lecture on genome assembly and annotation: the past, the present and the future. I will use this introduction to motivate the five-day course. Next, I will explain the use of the virtual machine (VM) and of cloud computing. This is followed by a short introduction to Linux (although I would prefer students to already know a bit of Linux). During the morning we will kick off our first assembly and put it through an annotation tool (Companion).



Session 2: Visualization (half afternoon)

During this afternoon, we are going to visualize the assembled and annotated genome from this morning in Artemis. The aim is to use the viewer to inspect the annotation, correct it and write out files. Next, we are going to perform a comparative exercise (comparing the genome from the morning with a close reference) to understand the concepts of synteny, breakpoints and errors.



Session 3: Mapping

In this module, I will teach the basics of read mapping. We will map reads with bwa mem onto a reference and will examine duplications and errors through improperly mapped read pairs. This is important for examining the correctness of assemblies and will be used later in the week.
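To make "improperly mapped read pairs" concrete, here is a minimal sketch of how they can be counted from SAM output such as bwa mem produces. The FLAG bits come from the SAM specification; the function name `count_pairing` and the example records are made up for illustration and are not part of the course material.

```python
# FLAG bits as defined in the SAM specification
PAIRED = 0x1          # read is part of a pair
PROPER_PAIR = 0x2     # mapper considers the pair properly aligned
UNMAPPED = 0x4        # read itself is unmapped
SECONDARY = 0x100     # secondary alignment
SUPPLEMENTARY = 0x800 # supplementary (split) alignment


def count_pairing(sam_lines):
    """Count properly and improperly paired primary alignments."""
    counts = {"proper": 0, "improper": 0}
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        flag = int(line.rstrip("\n").split("\t")[1])
        # Only look at mapped, primary alignments of paired reads
        if not (flag & PAIRED):
            continue
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if flag & PROPER_PAIR:
            counts["proper"] += 1
        else:
            counts["improper"] += 1
    return counts


# Hypothetical SAM records (sequence and quality replaced by "*")
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
    "r2\t65\tchr1\t100\t60\t50M\tchr2\t500\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(count_pairing(example))
```

A high share of improper pairs in one region of an assembly is a hint, not proof, of a misassembly or a collapsed duplication; it needs to be inspected in a viewer.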




Tuesday – Classes from 09:00 to 18:00 - “learn it the old way”



Session 4: De Bruijn graph and PAGIT

This module is dedicated to short read assembly. Although it may be superseded by long reads, understanding the concepts of short read assembly and the de Bruijn graph is crucial. After a seminar on this subject, we will assemble the same genome as before, but this time with Illumina reads: de novo assembly with velvet, contig ordering and error correction. Through comparative genomics we are going to look at errors in the assembly and at how they can be found by remapping short reads and also split long reads. Last, we are going to compare the assembly to the assembly from Monday. This session will run into Tuesday afternoon.
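The core idea of the de Bruijn graph can be sketched in a few lines. This is a toy illustration under simplifying assumptions (error-free reads, no reverse complements, no coverage tracking), not how velvet is implemented; the function name `de_bruijn` and the reads are invented for the example.

```python
from collections import defaultdict


def de_bruijn(reads, k):
    """Build a de Bruijn graph: nodes are (k-1)-mers, each k-mer is an edge
    from its prefix (first k-1 bases) to its suffix (last k-1 bases)."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


# Three overlapping toy reads drawn from the sequence ATGGCGT
toy_reads = ["ATGGC", "TGGCG", "GGCGT"]
graph = de_bruijn(toy_reads, k=3)
for node in sorted(graph):
    print(node, "->", sorted(graph[node]))
```

With these reads the graph is a single unbranched path (AT, TG, GG, GC, CG, GT), which spells out the original sequence. Repeats and sequencing errors create branches and bubbles in the graph, which is what makes real short read assembly hard.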



Session 5: RNA-Seq

In this session, we will analyse the transcriptome of the sample we have assembled so far, motivated by a short talk. In the exercise, we will map RNA-Seq reads (short and long reads), first to understand the basics of RNA-Seq and then to correct gene models. We will discuss the concept of alternative splicing.

Finally, we will annotate our assembly with Augustus, using the mapped RNA-Seq data and some manually corrected genes.
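Spliced RNA-Seq alignments are useful for gene models because they mark introns directly: in the SAM CIGAR string, an `N` operation is a skipped region on the reference. A minimal sketch of reading intron coordinates from a CIGAR string follows; the function name `introns_from_cigar` and the example alignment are hypothetical.

```python
import re


def introns_from_cigar(pos, cigar):
    """Return 1-based inclusive (start, end) reference coordinates of the
    introns implied by N operations in a CIGAR string.

    pos is the 1-based leftmost mapping position (SAM column 4)."""
    introns = []
    ref = pos
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        length = int(length)
        if op == "N":
            introns.append((ref, ref + length - 1))
        # Only these operations consume reference bases
        if op in "MDN=X":
            ref += length
    return introns


# A read mapped at position 100: 50 aligned bases, a 200 bp intron, 50 aligned bases
print(introns_from_cigar(100, "50M200N50M"))
```

If many reads support an intron that the predicted gene model lacks, or that has different boundaries, the model is a candidate for correction; several alternative introns at one locus can indicate alternative splicing.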




Wednesday – Classes from 09:00 to 18:00 - “do it yourself”



Session 6: Large genome assembly

First, we are going to kick off an assembly of a larger genome and let it run in the cloud through the day and overnight. It will be important to check during the day whether the assembly is still running.



Session 7: Group Task I

Group task I: You will get a set of reads (from a randomly assigned sequencing technology) and need to generate a draft genome assembly. Because of time restrictions, the reads will be from a bacterial genome, and you need to:

  •  Assemble the genome
  •  Check the assembly
  •  Annotate the genome

This task will be done in groups of 2-3 students. During the day, I will introduce some of the new tools and steps, such as bacterial annotation and circularization, through short talks.




Thursday – Classes from 09:00 to 18:00 - “apply your knowledge to real world examples”



Session 8: Group Task I continued

First, each group will present the results of the group task, including the quality of the assembly and the number of annotated genes, and give an outlook on which analyses they would do next and why.
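One common number to report for assembly quality is the N50. A minimal sketch of how it is computed from contig lengths is given below; the function name `n50` and the contig lengths are invented for illustration, and N50 is only one of several metrics a group might present.

```python
def n50(contig_lengths):
    """Return the N50: the largest length L such that contigs of length >= L
    together cover at least half of the total assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0  # empty assembly


# Hypothetical contig lengths (in bp) of a toy assembly
lengths = [100, 80, 50, 40, 30]
print("total size:", sum(lengths), "N50:", n50(lengths))
```

Here the total size is 300 and the two longest contigs (100 + 80) already cover half of it, so the N50 is 80. A high N50 says the assembly is contiguous, not that it is correct, which is why it should be combined with checks such as read remapping.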



Session 9: Large genome assembly continued

For the rest of the morning, we are going to analyse the larger genome assembly started on Wednesday morning. How did it come along? How big is it? What is it? How much compute did it take?



Session 10: Group Task II

Group task II, pick two projects: During the next 24 hours, each group will work on two different projects. These include: assembly of genomes of 1-40 Mb in size, virus assembly, using Hi-C data for the assembly, comparing different sequencing technologies, comparing different assembly tools, evaluating the quality of assemblies, annotating a large genome, etc. Each group will pick two projects and needs to manage its time (what to set off running and when, and what to run overnight). Importantly, students can also analyse their own data as a project, if they can convince the other group members that their project is interesting.

These projects simulate more “real life” examples than the data from the exercises: the raw reads come from the short read archives, some might be of bad quality, others are contaminated, etc.




Friday – Classes from 09:00 to 18:00 - “Already an expert?”



Session 11: Group Task II continued

We are continuing with the group task.



Session 12: Group Task II presentation

In this session, each group will present the results of its work. Every group member will have to present part of the projects. This is followed by a general discussion about the group tasks and the course.