Session content

Monday – Classes from 2-8 PM Berlin time

Preprocessing DNA sequencing data and somatic variant calling

The fastq format and alignment of DNA sequencing reads
Fastq files, history, usage, and format
The task of read alignment
BWA: the standard alignment algorithm and tool
Visualisation of read alignments
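
As a pointer to what the fastq part of this session covers, here is a minimal sketch of reading fastq records in Python. It assumes the common four-line layout (header starting with "@", sequence, separator starting with "+", quality string) and Phred+33 quality encoding; the names `FastqRecord` and `parse_fastq` are hypothetical and not taken from the course material.

```python
import io
from typing import Iterator, List, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One read: its name, bases, and per-base Phred quality scores."""
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(handle: TextIO, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-read fastq stream.

    Assumes Phred+33 encoding by default (an assumption; older files
    may use a different offset).
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        header = header.rstrip("\n")
        if not header:
            continue  # tolerate blank lines between records
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"Malformed fastq record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"Sequence/quality length mismatch in {header!r}")
        # Each quality character encodes a Phred score as ASCII code minus offset.
        scores = [ord(char) - offset for char in quality]
        yield FastqRecord(header[1:], sequence, scores)


# Toy input with a single made-up read, for illustration only.
example = "@read1\nACGT\n+\nII5!\n"
records = list(parse_fastq(io.StringIO(example)))
print(records[0])
```

Real pipelines would normally rely on an established parser and handle compressed input; the sketch only illustrates the record structure.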

Examining alignment metrics
Why quality control is important
What metrics can be examined
Picard: a useful package for generating alignment metrics

Calling SNVs and indels
Variant calling in germline and cancer data
The issue of sequencing noise
VarScan: a simple variant caller
More advanced tools: Mutect2

Tuesday – Classes from 2-8 PM Berlin time

Variant annotation and copy number calling

Annotation of variants
The purpose of annotating SNVs and indels
Database examination and public data
SIFT: an annotation tool

Basic graphical plots and statistics on mutation calls
A review of common plot types
Plotting mutant allele frequencies in R

Calling copy number aberrations (CNAs)
The importance of aneuploidy in cancer
The link between variant allele frequencies and CNAs
Sequenza: a fast and intuitive tool for somatic CNA calling

Wednesday – Classes from 2-8 PM Berlin time

Quality control for somatic mutations and copy number data

Theoretical background
What somatic mutations tell us about copy numbers
The mathematical relation between somatic allele frequencies, tumour ploidy and sample purity
How to normalise for allele-specific copy numbers and determine Cancer Cell Fractions
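
To give a flavour of the relations listed above, the sketch below uses one commonly used formulation: for a mutation present in m copies (its multiplicity) in a segment with total tumour copy number n, in a sample of purity p with a diploid normal, the expected allele frequency is m·p / (2·(1 − p) + n·p). Inverting this gives a Cancer Cell Fraction estimate from an observed frequency. The function names are hypothetical, and the diploid-normal and known-multiplicity assumptions may not match the exact treatment in the lecture.

```python
def expected_vaf(purity: float, total_cn: int, multiplicity: int,
                 normal_cn: int = 2) -> float:
    """Expected variant allele frequency of a clonal mutation.

    Assumes a normal contamination with `normal_cn` copies (diploid by
    default) and a mutation carried on `multiplicity` tumour copies.
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    if not 1 <= multiplicity <= total_cn:
        raise ValueError("multiplicity must be between 1 and total_cn")
    mutant_alleles = multiplicity * purity
    all_alleles = normal_cn * (1 - purity) + total_cn * purity
    return mutant_alleles / all_alleles


def cancer_cell_fraction(vaf: float, purity: float, total_cn: int,
                         multiplicity: int, normal_cn: int = 2) -> float:
    """Fraction of cancer cells carrying the mutation, from an observed VAF.

    This rescales the observed VAF by the expected clonal VAF, so a
    clonal mutation gives a value close to 1.
    """
    return vaf / expected_vaf(purity, total_cn, multiplicity, normal_cn)


# Illustrative values only (not course data).
clonal_diploid = expected_vaf(purity=0.8, total_cn=2, multiplicity=1)
subclonal_ccf = cancer_cell_fraction(vaf=0.2, purity=0.8, total_cn=2,
                                     multiplicity=1)
print(round(clonal_diploid, 3), round(subclonal_ccf, 3))
```

Under these assumptions, a heterozygous clonal mutation in a diploid segment at 80% purity is expected at a frequency of 0.4, and a mutation observed at 0.2 in the same segment corresponds to a Cancer Cell Fraction of about 0.5.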

Practical lecture (in R, with accompanying markdown documents)
Data loading and pre-processing
Quality control metrics of copy number calls and tumour purity with CNAqc
Cancer Cell Fraction estimation and quality control with CNAqc
Further data visualisation and other statistics

Thursday – Classes from 2-8 PM Berlin time

Subclonal deconvolution from somatic variants

Theoretical background
The principles of mutation clustering for subclonal deconvolution
Population genetics approaches for the mutation site frequency spectrum
Clustering for subclonal deconvolution, selecting the best model and interpreting the data

Practical lecture (in R, with accompanying markdown documents)
Data loading and pre-processing
A full pipeline for clustering the site frequency spectrum with MOBSTER
A downstream pipeline for clustering read counts with VIBER
Principles of extension to multiple samples from the same patient

Friday – Classes from 2-8 PM Berlin time
Course project

Participants will be split into small groups, with group size depending on total attendance, and will be given high-resolution cancer sequencing data to work with. The aim is to perform a selection of the tasks presented in the previous lectures and to assemble the results into a short presentation for the other groups.