Program

Monday – Classes from 09:30 to 17:30

Session 1. Introduction to environmental DNA. The ecology of eDNA.

First, we will introduce the course and explain its format. We will then cover the ecology of environmental DNA, discussing the different forms in which DNA can be present in the environment. We will discuss the different kinds of information that can be obtained from community-DNA and trace-DNA samples, and the implications for the quantitative value of the different eDNA-based methods. We will introduce single-species methods based on quantitative PCR (qPCR), and multi-species methods based on metabarcoding and metagenomics.

Core concepts introduced: environmental DNA, community DNA, trace DNA, metabarcoding, qPCR, stochasticity and reproducibility, quantitative value.

Session 2. Introduction to metabarcoding procedures. The metabarcoding pipeline.

In this session, participants will be introduced to the key concepts of metabarcoding and to the different high-throughput sequencing platforms currently available for implementing this technology. Some examples of results that can be obtained from metabarcoding projects will be presented. We will outline the different steps of a typical metabarcoding pipeline. In the practical session, we will check that the computing infrastructure for the rest of the course is in place and that all the required software is installed.

Core concepts introduced: high-throughput sequencing, multiplexing, NGS library, metabarcoding pipeline, metabarcoding marker, clustering algorithms, molecular operational taxonomic unit (MOTU), taxonomic assignment.

Tuesday – Classes from 09:30 to 17:30

Session 3. Molecular laboratory protocols. DNA extraction. Metabarcoding markers. Primer design. PCR. qPCR. HTS library preparation. Good laboratory practice.

In this session we will learn about molecular laboratory procedures for eDNA and metabarcoding. Guidelines and best practices for all key laboratory steps will be discussed. We will explain sample collection techniques, including eDNA and bulk community samples, pretreatment, filtering, and DNA extraction protocols. qPCR protocols for quantifying single species will be discussed. The diverse array of molecular markers available for metabarcoding approaches on different kinds of samples and target taxonomic groups will be introduced. Guidelines to design and test metabarcoding primers will be given. Multiplexing methods and library preparation procedures will be explained.

Core concepts introduced: good laboratory practice, sample collection, DNA preservation, DNA extraction, clean-lab procedures, contamination avoidance, PCR, qPCR, metabarcoding marker, universality, specificity, taxonomic range, taxonomic resolution, primer bias, amplification errors, sequencing errors, in silico PCR, library preparation, sequencing platforms, sample indexing, adapter sequences.
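
As a rough illustration of the in silico PCR concept listed above, the following Python sketch looks for a forward primer and the binding site of a reverse primer in a template sequence and returns the region between them. It is a minimal sketch under strong assumptions: primers must match exactly (dedicated tools such as ecoPCR tolerate mismatches), only one strand is searched, and the template, primers, and function names are invented for the example.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq))


def in_silico_pcr(template: str, forward: str, reverse: str):
    """Return the barcode between two exactly matching primers, or None.

    The reverse primer is given 5'->3' as ordered from a supplier, so its
    binding site on the template strand is its reverse complement.
    Primer sequences themselves are excluded from the returned barcode.
    """
    start = template.find(forward)
    if start == -1:
        return None  # forward primer does not bind
    barcode_start = start + len(forward)
    reverse_site = reverse_complement(reverse)
    end = template.find(reverse_site, barcode_start)
    if end == -1:
        return None  # reverse primer does not bind downstream
    return template[barcode_start:end]


# Toy data, invented for illustration only.
template = "TTGACCTAGGATCCAAGTTCGGAACC"
forward_primer = "GACCTA"
reverse_primer = "TTCCGA"

barcode = in_silico_pcr(template, forward_primer, reverse_primer)
print(barcode)
```

Running the same function over every sequence of a reference database gives a first impression of the taxonomic range a primer pair could amplify, although exact matching will underestimate it.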

Session 4. The OBITools pipeline I. Workflow, first steps and quality control.

In this session, we will outline the steps needed to start analysing raw data from high-throughput sequencers. Participants will learn about key bioinformatic workflows and will perform quality control, paired-end merging, sample demultiplexing, sequence filtering, removal of chimeric sequences, format conversion, and dereplication into unique sequences. In the practical session, we will work with the OBITools metabarcoding pipeline in a Linux terminal environment, using an example dataset.

Core concepts introduced: fastq and fasta formats, Phred quality score, paired-end alignment, demultiplexing, sequence filtering, chimeras, dereplication, unique sequences, metabarcoding reads.
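
Two of the concepts above, the Phred quality score and dereplication, can be sketched in a few lines of Python. This is not how OBITools implements them; it is a minimal illustration that assumes the common Phred+33 quality encoding and uses made-up reads and function names.

```python
from collections import Counter


def phred_to_error_prob(q: int) -> float:
    """Convert a Phred score Q into an error probability, P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def decode_quality(quality_string: str, offset: int = 33) -> list:
    """Decode a fastq quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_string]


def dereplicate(reads: list) -> list:
    """Collapse identical reads into (sequence, count) pairs.

    Pairs are sorted by decreasing count, with ties broken alphabetically.
    """
    counts = Counter(reads)
    return sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))


# Toy reads, invented for illustration only.
reads = ["ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC"]
unique_sequences = dereplicate(reads)

for sequence, count in unique_sequences:
    print(sequence, count)
```

A Phred score of 20 corresponds to an expected error rate of 1 in 100 bases, and a score of 30 to 1 in 1000, which is why quality filters are usually expressed as minimum Phred thresholds.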

Wednesday – Classes from 09:30 to 17:30

Session 5. The OBITools pipeline II. Clustering and denoising methods. Clustering algorithms with variable thresholds.

In this session, we will learn about clustering and denoising methods. We will introduce denoising methods such as OBIclean, UNOISE, and DADA2. We will learn the differences between a MOTU (molecular operational taxonomic unit) and a ZOTU (zero-radius OTU) or ASV (amplicon sequence variant). We will learn about constant and variable identity thresholds for clustering MOTUs. We will introduce different algorithms for clustering sequences into MOTUs using variable thresholds, such as CROP and SWARM. In the practical session, we will continue with the OBITools pipeline and use SWARM to cluster the filtered unique sequences into MOTUs.

Core concepts introduced: MOTU clustering, denoising methods, denoising stringency, singleton sequences, ZOTU, ASV, abundance recalculation, reference clustering, de novo clustering, unsupervised-learning clustering, Bayesian clustering, step aggregation methods, hard identity threshold, flexible identity threshold.

Session 6. The OBITools pipeline III. Taxonomic assignment. Ecotag. Working with reference databases.

We will explain taxonomic assignment methods. We will learn about phylogenetic algorithms for taxonomic assignment. The ecotag algorithm will be used to add taxonomic information to the MOTUs in our example dataset, and the results will be compared with those from other taxonomic assignment software. We will learn how to build local reference databases from the information available in public sequence repositories and how to add new custom sequences to these local reference databases. We will also learn how sequence databases interact with taxonomy databases to retrieve the phylogenetic information needed by the assignment algorithms.

Core concepts introduced: phylogenetic assignment, taxonomic databases, taxonomic identifier (taxid), BLAST, GenBank, Barcode of Life Data System (BOLD), local reference database, best match, assignment of higher taxa, ecoPCR and ecoPCR format.

Thursday – Classes from 09:30 to 17:30

Session 7. Refining the final datasets. Collapsing, renormalizing and blank correction. Rarefaction and resampling. Removal of pseudogenes. Exploratory visualization.

In this session, participants will learn about procedures for refining and curating the final datasets obtained from the previous pipelines. They will learn about blank correction, renormalization procedures for removing false positive results, and taxonomy collapsing of related MOTUs to obtain enhanced final datasets. Procedures for the removal of pseudogenes, such as the LULU algorithm, will be explained. We will introduce some basic methods for exploratory visualization of multivariate biodiversity data, using R.

Core concepts introduced: blank correction, renormalization, taxonomy collapsing, relative abundances, resampling and rarefaction techniques, pseudogenes, LULU, taxonomic summary, barplots, krona plots.
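
The refinement steps named above can take many forms, and the exact procedures taught in the session are not specified here. The Python sketch below shows one simple, assumed variant of each: blank correction by subtracting the reads found in a negative control, conversion to relative abundances, and rarefaction by random subsampling without replacement. All MOTU names, read counts, and function names are invented for the example.

```python
import random


def blank_correct(sample: dict, blank: dict) -> dict:
    """Subtract blank read counts from a sample, flooring at zero."""
    return {motu: max(reads - blank.get(motu, 0), 0)
            for motu, reads in sample.items()}


def relative_abundance(sample: dict) -> dict:
    """Convert read counts to proportions of the sample total."""
    total = sum(sample.values())
    if total == 0:
        return {motu: 0.0 for motu in sample}
    return {motu: reads / total for motu, reads in sample.items()}


def rarefy(sample: dict, depth: int, seed: int = 0) -> dict:
    """Randomly draw `depth` reads without replacement from a sample."""
    pool = [motu for motu, reads in sample.items() for _ in range(reads)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    rarefied = {motu: 0 for motu in sample}
    for motu in rng.sample(pool, depth):
        rarefied[motu] += 1
    return rarefied


# Toy read counts, invented for illustration only.
sample = {"MOTU_1": 120, "MOTU_2": 30, "MOTU_3": 5}
blank = {"MOTU_2": 10, "MOTU_3": 8}

corrected = blank_correct(sample, blank)
proportions = relative_abundance(corrected)
rarefied = rarefy(corrected, depth=50, seed=1)
print(corrected, proportions, rarefied)
```

In this toy case MOTU_3 has more reads in the blank than in the sample, so it drops to zero after correction and is treated as a likely contaminant.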

Session 8. Advanced biodiversity data processing. With Rachel Meyer

In this session, we will present some advanced statistical methods to get ecological insights from metabarcoding data. We will show examples of multilocus metabarcoding, network analysis, random forest modelling, and diversity analysis at different scales. Moreover, we will give some advice on how to incorporate science communication and citizen science into biodiversity assessment projects, by presenting initiatives such as CALeDNA (California Environmental DNA Program).

Core concepts introduced: multilocus metabarcoding, network analysis, random forest modelling, citizen science.

Friday – Classes from 09:30 to 17:30

Session 9. Data visualization. α- and β-diversity. Inference of ecological patterns. Correlation with environmental variables.

We will use different qualitative and quantitative biodiversity indices for assessing dissimilarity between samples. We will explain multivariate ordination methods such as PCA, RDA and nMDS. We will introduce PERMANOVA methods for detecting significant differences between ecological conditions. In the practical session we will continue working with the example dataset and we will perform some visualizations and analyses in R.

Core concepts introduced: α-diversity, β-diversity, MOTU richness, Jaccard distances, Bray-Curtis distances, UniFrac distances, ordination techniques, principal component analysis (PCA), redundancy analysis (RDA), multidimensional scaling (MDS), PERMANOVA, BetaDisper, EnvFit.
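
The practical session itself uses R, but the two simplest dissimilarity indices listed above are easy to compute by hand. The Python sketch below is only an illustration, with invented read counts and function names: Jaccard dissimilarity uses presence/absence (qualitative), while Bray-Curtis dissimilarity uses abundances (quantitative).

```python
def jaccard_dissimilarity(a: dict, b: dict) -> float:
    """1 - (shared MOTUs / MOTUs present in either sample)."""
    present_a = {motu for motu, reads in a.items() if reads > 0}
    present_b = {motu for motu, reads in b.items() if reads > 0}
    union = present_a | present_b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1 - len(present_a & present_b) / len(union)


def bray_curtis_dissimilarity(a: dict, b: dict) -> float:
    """1 - 2 * (sum of per-MOTU minimum counts) / (total reads in both)."""
    motus = set(a) | set(b)
    shared = sum(min(a.get(m, 0), b.get(m, 0)) for m in motus)
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    return 1 - 2 * shared / total


# Toy read counts per MOTU, invented for illustration only.
site_a = {"MOTU_1": 10, "MOTU_2": 5, "MOTU_3": 0}
site_b = {"MOTU_1": 5, "MOTU_2": 5, "MOTU_4": 10}

jaccard = jaccard_dissimilarity(site_a, site_b)
bray_curtis = bray_curtis_dissimilarity(site_a, site_b)
print(round(jaccard, 3), round(bray_curtis, 3))
```

Both indices range from 0 (identical samples) to 1 (no MOTUs in common). Because read counts differ between samples for technical reasons, Bray-Curtis is normally computed on rarefied or relative abundances rather than on raw counts as in this toy case.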

Session 10. How to design a successful metabarcoding project. Questions and answers.

In the final session, we will learn how to design a successful metabarcoding project and how to tailor it to specific needs. We will discuss the best strategies for obtaining good results while optimizing time, money and computing resources. The idea is to make this session as interactive and useful as possible. We will present some current and future projects in an open-discussion format and work collaboratively to propose solutions to the problems they may raise. The remaining time can be dedicated to introducing current research and possible future developments of metabarcoding and metagenomics techniques, and to providing a list of useful resources for further learning, continuous training and future research opportunities. We will finish the workshop with an interactive open-questions session.

Core concepts discussed: experiment planning, optimal multiplexing, ecological replication, technical replication, sequencing depth, price per sample.