Monday – Classes from 2-8 PM Berlin time
Preprocessing of DNA sequencing data and somatic variant calling
The FASTQ format and alignment of DNA sequencing reads
FASTQ files: history, usage, and format
The task of read alignment
BWA: the standard alignment algorithm and tool
Visualisation of read alignments
Examining alignment metrics
Why quality control is important
What metrics can be examined
Picard: a useful package for generating alignment metrics
Calling SNVs and indels
Variant calling in germline and cancer data
The issue of sequencing noise
VarScan: a simple variant caller
Other, more advanced tools: Mutect2
Tuesday – Classes from 2-8 PM Berlin time
Variant annotation and copy number calling
Annotation of variants
The purpose of annotating SNVs and indels
Database examination and public data
SIFT: an annotation tool
Basic graphical plots and statistics on mutation calls
A review of common plot types
Plotting mutant allele frequencies in R
Calling copy number aberrations (CNAs)
The importance of aneuploidy in cancer
The link between variant allele frequencies and CNAs
Sequenza: a fast and intuitive tool for somatic CNA calling
Wednesday – Classes from 2-8 PM Berlin time
Quality control for somatic mutations and copy number data
Theoretical background
What somatic mutations tell us about copy numbers
The mathematical relation between somatic allele frequencies, tumour ploidy and sample purity
The mathematical relations used to account for allele-specific copy numbers and determine Cancer Cell Fractions (CCFs)
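As a rough illustration of the relations listed above, the sketch below computes the expected variant allele frequency (VAF) of a clonal mutation and inverts it to obtain a Cancer Cell Fraction. It is not taken from the course material: the function names, the argument names and the assumption of a diploid normal genome (normal_cn = 2) are illustrative choices, and the course practicals themselves are run in R with CNAqc.

```python
def expected_vaf(multiplicity, purity, tumour_cn, normal_cn=2):
    """Expected VAF of a mutation carried by every tumour cell.

    multiplicity: number of tumour copies carrying the mutation
    purity:       fraction of tumour cells in the sample (0 < purity <= 1)
    tumour_cn:    total copy number of the locus in tumour cells
    normal_cn:    total copy number in normal cells (assumed diploid)
    """
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    # Mutant alleles come only from tumour cells; the denominator counts
    # all alleles contributed by normal and tumour cells at the locus.
    mutant_alleles = multiplicity * purity
    total_alleles = normal_cn * (1 - purity) + tumour_cn * purity
    return mutant_alleles / total_alleles


def cancer_cell_fraction(vaf, multiplicity, purity, tumour_cn, normal_cn=2):
    """Invert the relation above: fraction of tumour cells carrying the mutation."""
    if not 0 < purity <= 1:
        raise ValueError("purity must be in (0, 1]")
    total_alleles = normal_cn * (1 - purity) + tumour_cn * purity
    return vaf * total_alleles / (multiplicity * purity)


if __name__ == "__main__":
    # Heterozygous clonal mutation in a diploid region, 80% pure sample.
    print(expected_vaf(multiplicity=1, purity=0.8, tumour_cn=2))
    # The same region, but the mutation is observed at half that VAF.
    print(cancer_cell_fraction(vaf=0.2, multiplicity=1, purity=0.8, tumour_cn=2))
```

Under these assumptions, a mutation observed well below its expected clonal VAF yields a CCF below 1, which is the signal later used for subclonal deconvolution.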
Practical lecture (in R, with accompanying markdown documents)
Data loading and pre-processing
Quality control metrics for copy number calls and tumour purity with CNAqc
Cancer Cell Fraction estimation and quality control with CNAqc
Further data visualisation and other statistics
Thursday – Classes from 2-8 PM Berlin time
Subclonal deconvolution from somatic variants
Theoretical background
The principles of mutation clustering for subclonal deconvolution
Population genetics approaches for the mutation site frequency spectrum
Clustering for subclonal deconvolution, selecting the best model and interpreting the data
Practical lecture (in R, with accompanying markdown documents)
Data loading and pre-processing
A full pipeline for clustering the site frequency spectrum with MOBSTER
A downstream pipeline for clustering read counts with VIBER
Principles of extension to multiple samples from the same patient
Friday – Classes from 2-8 PM Berlin time
Course project
Participants will be split into small groups, with group size depending on the total number of attendees, and will be given high-resolution cancer sequencing data to work with. The aim is to carry out a selection of the tasks presented in the previous lectures and to assemble the results into a short presentation for the other groups.