Population genomic inference from low-coverage whole-genome sequencing data


11-14 October 2021

Due to the COVID-19 outbreak, this course will be held online



Low-coverage whole-genome sequencing of individually barcoded samples is emerging as an increasingly popular alternative to pooled sample sequencing and to reduced representation methods such as RAD-seq or target capture approaches. Low-coverage sequencing provides a cost-effective way to survey variation across the entire genome at a population scale, but it is hampered by substantial uncertainty in the data, which prevents the use of standard analysis programs based on genotype calling. In this course, we will explore workflows for producing, processing, and analyzing low-coverage sequencing data for population genomic inference, along with their underlying rationale. Given that most species have insufficient reference data to allow reliable genotype imputation, we will focus on genotype likelihood-based methodology that can be applied to any system. We will primarily cover methods and algorithms implemented in the ANGSD software package and associated programs, providing best-practice guidelines and discussing how participants can make maximal use of low-coverage whole-genome re-sequencing data for their studies.


Target audience


The course is aimed at researchers who may have previous experience with next-generation sequencing (NGS) data (e.g. exome/RAD/pooled sequencing) and wish to explore the potential of low-coverage whole-genome sequencing for their studies. Researchers who want an introduction to the ANGSD software package and related software based on genotype likelihoods, and an understanding of their inherent probabilistic framework, will also benefit from this course.




We will assume that participants have a basic background in population genomics and basic familiarity with NGS data. Previous experience with the Unix command line and R is also an advantage. We will not have time to comprehensively introduce these computing environments during the course, so we ask participants without previous experience in Unix and R to go through suggested tutorials on their own prior to the course. All hands-on exercises will be run in a Linux environment on remote servers. Statistical analyses and data visualization will be run in R.




After attending the course, the participants will appreciate the use of whole-genome sequencing for population genomics. The participants will be able to demonstrate the challenges associated with low-coverage sequencing data and will have an intuition about the statistical framework implemented in ANGSD/ngsTools/Atlas. They will be familiar with building a bioinformatic pipeline to process low-coverage sequencing data to perform different types of population genomic analyses, such as inference of demographic histories and detection of signatures of natural selection.

Teaching format


The course will comprise a mix of interactive lectures with small exercises followed by a longer independent practical each day. Data will be provided for exercises.



Monday - Classes from 2 to 8 pm Berlin time



We will discuss the rationale behind the use of low-coverage whole-genome sequencing for population genomic inference. We will then walk through typical workflows for going from sample to raw sequencing data, performing a de novo genome assembly, and finally obtaining mapped sequencing reads.


Cost overview

Cancellation Policy:




> 30 days before the start date = 30% cancellation fee

< 30 days before the start date = no refund




Physalia-courses cannot be held responsible for any travel fees, accommodation or other expenses incurred by you as a result of the cancellation.