Session content

Sunday 2nd - 6:00 p.m. to 9:00 p.m.
Bonus: Software installation session & get together
Detailed tutorials on software installation will be provided prior to the course. However, difficulties can sometimes arise, so in this session you can get help and make sure everything is ready to go for the course. If you already have everything installed, this is a good opportunity to help others with their installation or to socialize.


Day 1: Monday 3rd – Classes from 09:30 to 17:30
Introduction to metabarcoding procedures. The metabarcoding pipeline.
In this session students will be introduced to the key concepts of metabarcoding and to the different next-generation sequencing platforms currently available for implementing this technology. We will present some examples of the results that can be obtained from metabarcoding projects, outline the different steps of a typical metabarcoding pipeline and introduce some key concepts. We will also explain the format of the course and discuss technical replication and other experimental design considerations. Finally, we will check that the computing infrastructure for the rest of the course is in place and that all the needed software is installed for those who couldn't attend the bonus session on Sunday.
Core concepts introduced: high-throughput sequencing, multiplexing, NGS library, metabarcoding pipeline, metabarcoding marker, clustering algorithms, molecular operational taxonomic unit (MOTU), taxonomic assignment, technical replication, sequencing depth, price per sample.
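As a rough illustration of how multiplexing, technical replication and sequencing depth interact with price per sample, the Python sketch below splits a run's read yield across tagged libraries. The class name, the run yield and cost, and the assumption that reads are shared evenly among libraries are all hypothetical; real runs are rarely that balanced.

```python
from dataclasses import dataclass


@dataclass
class RunPlan:
    """Hypothetical multiplexed run; every number is supplied by the user."""

    reads_per_run: int   # reads expected to pass filter for the whole run
    run_cost: float      # total cost of the run, in any currency
    n_samples: int       # biological samples in the run
    replicates: int = 1  # PCR (technical) replicates per sample
    n_controls: int = 0  # blanks / negative controls, replicated like samples

    @property
    def libraries(self) -> int:
        # every replicate of every sample or control gets its own tag combination
        return (self.n_samples + self.n_controls) * self.replicates

    def reads_per_library(self) -> float:
        # assumes reads are shared evenly among libraries (optimistic)
        return self.reads_per_run / self.libraries

    def reads_per_sample(self) -> float:
        # replicates of the same sample are pooled for the final per-sample depth
        return self.reads_per_library() * self.replicates

    def cost_per_sample(self) -> float:
        # controls produce no ecological data, so their share is paid by the samples
        return self.run_cost / self.n_samples


plan = RunPlan(reads_per_run=15_000_000, run_cost=1800.0,
               n_samples=90, replicates=3, n_controls=10)
print(plan.libraries, plan.reads_per_library(), plan.reads_per_sample(), plan.cost_per_sample())
# 300 50000.0 150000.0 20.0
```

In this hypothetical run, adding blanks and triplicates triples the number of libraries, so each library receives fewer reads even though the pooled depth per sample stays at about 150,000 reads.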
Molecular laboratory protocols. DNA extraction. Metabarcoding markers. Primer design. PCR and library preparation. Good laboratory practice.
In this session we will learn the basics of the molecular laboratory procedures needed for metabarcoding. While there will be no hands-on laboratory practicals, guidelines and best practices for all key laboratory steps will be discussed. We will explain sample collection techniques, including eDNA and bulk community samples, as well as pretreatment and DNA extraction protocols. We will discuss the diverse molecular markers available for different kinds of samples and target taxonomic groups. The students will learn to design and test custom metabarcoding primers, and they will learn about sample tags, library tags, adapter sequences, PCR protocols and library preparation procedures.
Core concepts introduced: good laboratory practice, proper sample collection, bulk (community DNA) and eDNA samples, DNA preservation, DNA extraction, PCR, clean-up, metabarcoding marker, universality, specificity, taxonomic range, taxonomic resolution, primer bias, amplification errors, sequencing errors, DNA contamination, in silico PCR, library generation, sequencing platforms, sample indexing, adapter sequences.
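To make in silico PCR and the role of degenerate primer positions concrete, here is a minimal Python sketch that expands IUPAC codes into regular-expression classes and reports the region between a forward primer site and the reverse-complemented reverse primer site, on both strands. It is an illustration under simplifying assumptions: the function names are hypothetical, inosine is treated as matching any base, and no mismatches are tolerated, whereas dedicated tools such as ecoPCR allow primer mismatches.

```python
import re

# IUPAC codes as regex character classes; inosine (I) is treated as matching any base
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]", "I": "[ACGT]",
}
# complement of each IUPAC code (e.g. R = A/G pairs with Y = C/T)
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVNI", "TGCAYRSWMKVHDBNI")


def revcomp(seq: str) -> str:
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_pattern(primer: str) -> re.Pattern:
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def in_silico_pcr(template: str, fwd: str, rev: str,
                  min_len: int = 1, max_len: int = 10_000) -> list[str]:
    """Return the inserts (primers excluded) amplified from both strands of template.

    Primer sites must match exactly (degenerate codes allowed, no mismatches).
    """
    fwd_site = primer_pattern(fwd)
    rev_site = primer_pattern(revcomp(rev))  # reverse primer binds the opposite strand
    inserts = []
    for strand in (template.upper(), revcomp(template)):
        for f in fwd_site.finditer(strand):
            r = rev_site.search(strand, f.end())  # nearest downstream reverse site
            if r:
                insert = strand[f.end():r.start()]
                if min_len <= len(insert) <= max_len:
                    inserts.append(insert)
    return inserts


template = "TTTACGTACGGGGCCCCTGCTAAAAA"  # toy sequence, not real data
print(in_silico_pcr(template, "ACRTAC", "TTAGCA"))  # ['GGGGCCCC']
```

The degenerate R in the toy forward primer lets it bind ACGTAC; the same primer written with Y (C/T) at that position finds no site, which is the universality versus specificity trade-off in miniature.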

Day 2: Tuesday 4th – Classes from 09:30 to 17:30

Development of universal metabarcoding primers
In this practical session, we will look at the development of primers for DNA metabarcoding. We will use sequences from NCBI and BOLD to generate sequence alignments with PrimerMiner, in order to verify the suitability of existing primer sets, as well as of your own primer sets, for the taxonomic groups targeted in a given ecosystem.
Core concepts introduced: reference sequence databases, PCR basics, annealing temperature, primer design guidelines, degeneracy, inosine, in silico evaluation, mock community evaluation, primer bias, blocking primers, in-line tagging, Illumina adapters.
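Degeneracy and annealing temperature are easy to estimate by hand for short primers. The sketch below counts how many distinct oligonucleotides a degenerate primer represents and gives Tm bounds with the Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough approximation for short oligos; nearest-neighbour models are more accurate. The function names and the toy primer are hypothetical, and scoring inosine like an A/T pair is an assumption.

```python
from math import prod

# Bases each IUPAC code stands for in a synthesized primer mix
CODE_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer: str) -> int:
    """Number of distinct oligonucleotides in the primer mix.

    Inosine (I) is a single modified base, so it adds no synthetic degeneracy.
    """
    return prod(1 if b == "I" else len(CODE_BASES[b]) for b in primer.upper())


def wallace_tm_range(primer: str) -> tuple[int, int]:
    """(min, max) Tm in deg C over all variants, using Tm = 2*(A+T) + 4*(G+C).

    Inosine is scored like an A/T pair (assumption).
    """
    low = high = 0
    for b in primer.upper():
        bases = "A" if b == "I" else CODE_BASES[b]
        scores = [4 if x in "GC" else 2 for x in bases]
        low += min(scores)
        high += max(scores)
    return low, high


toy = "GGWACNGGITGRAC"  # hypothetical primer, not a published one
print(degeneracy(toy), wallace_tm_range(toy))  # 16 (42, 46)
```

Replacing an N with I divides the degeneracy of a primer by four, which is one reason inosine is used at highly variable positions.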


Day 3: Wednesday 5th – Classes from 09:30 to 17:30

The USEARCH pipeline

In this session, we will work with the USEARCH and VSEARCH software suites, using a real sequence dataset as an example for testing our metabarcoding pipeline. We will outline the steps needed to start analysing raw data from high-throughput sequencers. The students will learn about key bioinformatics workflows and will perform quality control, sample demultiplexing, paired-end merging, sequence filtering, removal of chimeric sequences, format conversion, dereplication of unique sequences, sequence clustering and taxonomic assignment using reference databases. We will run most commands in an R environment using a user-friendly modular wrapper script, with a specific focus on when and why each module is necessary. We will also introduce mBRAVE, an easy-to-use cloud-based platform, as well as cutting-edge methods for obtaining exact sequence variants (ESVs) / haplotypes from metabarcoding data.
Core concepts introduced: FASTQ and FASTA formats, Phred quality score, paired-end alignment, demultiplexing, sequence filtering, chimeras, dereplication, unique sequences, reads, singleton sequences, abundance recalculation, OTU clustering, sequence repositories, identity assignment, BLAST, GenBank, Barcode of Life Data System (BOLD), mBRAVE, read denoising, exact sequence variants (ESVs), haplotyping.
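The link between Phred scores, expected errors, dereplication and singletons can be sketched in a few lines of Python. Each quality character encodes Q = ASCII code - 33 (Phred+33, assumed here), the error probability is 10^(-Q/10), and a read's expected number of errors is the sum over its bases; filtering on a maximum expected-error value is the idea behind the fastq_maxee option of USEARCH and VSEARCH. The function names and default thresholds below are hypothetical choices for illustration, and the sketch does not cover merging or chimera removal.

```python
from collections import Counter


def phred_error(qchar: str, offset: int = 33) -> float:
    """Error probability P = 10 ** (-Q / 10) for one Phred+33 quality character."""
    q = ord(qchar) - offset
    return 10 ** (-q / 10)


def expected_errors(qual: str) -> float:
    # sum of per-base error probabilities = expected number of wrong bases in the read
    return sum(phred_error(c) for c in qual)


def read_fastq(lines):
    """Yield (header, sequence, quality) from an iterable of 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # '+' separator line
        qual = next(it).strip()
        yield header.strip(), seq, qual


def filter_and_dereplicate(records, max_ee=1.0, min_size=2):
    """Keep reads with expected errors <= max_ee, collapse identical sequences,
    and drop uniques seen fewer than min_size times (min_size=2 removes singletons)."""
    counts = Counter(seq for _, seq, qual in records if expected_errors(qual) <= max_ee)
    return [(seq, n) for seq, n in counts.most_common() if n >= min_size]


fastq = """@r1
ACGT
+
IIII
@r2
ACGT
+
IIII
@r3
ACGA
+
!!!!
@r4
TTTT
+
IIII""".splitlines()

print(filter_and_dereplicate(read_fastq(fastq)))  # [('ACGT', 2)]
```

Here r3 is discarded because its quality characters (Q = 0) give four expected errors, and TTTT is dropped as a singleton; with min_size=1 it would be kept with its abundance of one.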

Day 4: Thursday 6th – Classes from 09:30 to 17:30

The OBITools pipeline I. Workflow, first steps and quality control. Clustering algorithms with variable thresholds.
In this session, we will work with the OBITools software suite, using the same dataset as in the USEARCH session, to test some alternative metabarcoding pipelines from a Linux terminal environment.
We will also introduce other programs that complement the OBITools pipeline and help optimize the different steps. Among them are algorithms for clustering sequences into MOTUs with flexible rather than fixed similarity thresholds (such as CROP and SWARM) and algorithms for the post-clustering collapse of erroneous MOTUs.
Core concepts introduced: reference clustering, de novo clustering, Bayesian clustering, step aggregation methods, hard identity threshold, flexible identity threshold, co-occurrence.
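The difference between a hard identity threshold and a flexible, local threshold can be illustrated with a simplified clustering in the spirit of SWARM: starting from the most abundant unique sequence, a cluster grows by repeatedly absorbing any unassigned sequence within d edits of a sequence already in it, so clusters can chain across more than d differences. This Python sketch is not the SWARM algorithm itself; it omits SWARM's optimized search and its refinements for splitting or grafting clusters, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where substitutions, insertions and deletions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution or match
        prev = cur
    return prev[-1]


def swarm_like(uniques: dict[str, int], d: int = 1) -> list[list[str]]:
    """Cluster dereplicated sequences (sequence -> read count) with a local threshold d."""
    order = sorted(uniques, key=lambda s: (-uniques[s], s))  # most abundant first
    unassigned = set(order)
    clusters = []
    for seed in order:
        if seed not in unassigned:
            continue  # already absorbed by an earlier cluster
        unassigned.discard(seed)
        cluster, frontier = [seed], [seed]
        while frontier:  # grow one generation of neighbours at a time
            neighbours = []
            for s in frontier:
                hits = sorted((t for t in unassigned if levenshtein(s, t) <= d),
                              key=lambda t: (-uniques[t], t))
                unassigned.difference_update(hits)
                neighbours.extend(hits)
            cluster.extend(neighbours)
            frontier = neighbours
        clusters.append(cluster)
    return clusters


print(swarm_like({"AAAA": 10, "AAAT": 5, "AATT": 2}, d=1))  # [['AAAA', 'AAAT', 'AATT']]
```

With d = 1, AATT joins the cluster through AAAT even though it differs from the seed AAAA at two positions; a fixed one-difference threshold measured against the seed would have kept it apart.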
The OBITools pipeline II. Taxonomic assignment using ecotag.
In this session we will continue with the OBITools pipeline. We will learn about phylogenetic algorithms for taxonomic assignment. The ecotag algorithm will be used to add taxonomic information to the MOTUs in our example dataset, and the results will be compared with those from other assignment software. The students will learn how to build local reference databases from the information available in public sequence repositories and how to add new custom sequences to these local reference databases. They will also learn how sequence databases interact with taxonomy databases to retrieve the phylogenetic information used by the assignment algorithms.
Core concepts introduced: local reference database, phylogenetic assignment, best match, assignment of higher taxa, ecoPCR and ecoPCR format, taxonomic database, taxonomic identifier (taxid).
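A simplified version of the best-match plus lowest-common-ancestor logic used in taxonomic assignment is sketched below: the query is compared with every reference sequence, all references tied at the best identity are kept, and the query receives the deepest lineage shared by all of them. This is an illustration only; ecotag's actual procedure for choosing which references contribute to the common ancestor is more elaborate, it works with taxids from a taxonomic database rather than plain rank lists, and the identity measure, threshold, toy references and function names here are assumptions.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance where substitutions, insertions and deletions all cost 1."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def identity(a: str, b: str) -> float:
    # simple stand-in for alignment identity: 1 - edit distance / longer length
    return 1 - levenshtein(a, b) / max(len(a), len(b), 1)


def lca(lineages):
    """Longest shared prefix of root-to-leaf lineages (lists of taxon names)."""
    common = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            common.append(ranks[0])
        else:
            break
    return common


def assign(query, references, min_identity=0.8):
    """references: list of (sequence, lineage) pairs.
    Returns (shared lineage of the best matches, best identity); empty lineage if too dissimilar."""
    scored = [(identity(query, seq), lineage) for seq, lineage in references]
    best = max(score for score, _ in scored)
    if best < min_identity:
        return [], best
    best_matches = [lineage for score, lineage in scored if abs(score - best) < 1e-12]
    return lca(best_matches), best


refs = [  # toy reference database
    ("ACGTACGTAA", ["Animalia", "Arthropoda", "Insecta", "Diptera", "Chironomidae"]),
    ("ACGTACGTAT", ["Animalia", "Arthropoda", "Insecta", "Diptera", "Culicidae"]),
    ("TTTTTTTTTT", ["Animalia", "Chordata"]),
]
print(assign("ACGTACGTAC", refs))
```

The toy query is equally similar to the two dipteran references, so it is assigned only to Diptera rather than to either family: ties among best matches push the assignment to a higher taxon.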

Day 5: Friday 7th – Classes from 09:30 to 17:30

Comparing the results from different pipelines. Refining the final datasets. Taxonomy collapsing, renormalization and blank correction. Visualization of results.
In this session, students will learn about procedures for refining and curating the final datasets obtained from the previous pipelines. They will learn about blank correction, renormalization procedures for removing false-positive results, and taxonomy collapsing of related MOTUs to obtain enhanced final datasets. We will compare the results from the different pipelines tested and discuss how to interpret them in order to obtain ecologically relevant information.
Core concepts introduced: renormalization, taxonomy collapsing, blank correction.
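Blank correction and abundance-based cleaning can be expressed on a simple sample-by-MOTU read table. The sketch below shows one common and deliberately simple strategy, subtracting for each MOTU the highest read count observed in any blank, followed by a per-sample relative-abundance filter against low-level false positives. Both the strategy and the threshold are assumptions for illustration, not necessarily the renormalization procedure taught in the course, and the function names and toy table are hypothetical.

```python
def blank_correct(table, blanks):
    """table maps sample -> {motu: reads}; blanks is a set of control sample names.
    Subtract, for each MOTU, the highest count seen in any blank from every real sample,
    drop counts that fall to zero or below, and leave the blanks out of the result."""
    max_blank = {}
    for blank in blanks:
        for motu, reads in table[blank].items():
            max_blank[motu] = max(max_blank.get(motu, 0), reads)
    corrected = {}
    for sample, counts in table.items():
        if sample in blanks:
            continue
        kept = {}
        for motu, reads in counts.items():
            remaining = reads - max_blank.get(motu, 0)
            if remaining > 0:
                kept[motu] = remaining
        corrected[sample] = kept
    return corrected


def relative_abundance_filter(table, min_rel=0.001):
    """Remove MOTU occurrences below min_rel of their sample's total reads."""
    filtered = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        filtered[sample] = {m: n for m, n in counts.items() if total and n / total >= min_rel}
    return filtered


table = {  # toy read table
    "S1": {"m1": 100, "m2": 3, "m3": 50},
    "S2": {"m1": 10, "m2": 200},
    "B1": {"m1": 2, "m2": 5},  # extraction or PCR blank
}
clean = blank_correct(table, blanks={"B1"})
print(clean)                                           # {'S1': {'m1': 98, 'm3': 50}, 'S2': {'m1': 8, 'm2': 195}}
print(relative_abundance_filter(clean, min_rel=0.05))  # {'S1': {'m1': 98, 'm3': 50}, 'S2': {'m2': 195}}
```

Subtracting the blank maximum removes m2 from S1 entirely, while the relative filter then removes m1 from S2, where it makes up under 5% of the reads.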
Presenting the final results. α- and β-diversity patterns.
In this session we will continue with the presentation of the final results. Students will learn how to plot taxonomic summaries of their datasets, including circular plots (graphical representations of the relative abundances of reads at different taxonomic levels and of the overlap of taxa among samples) and Venn diagrams. Resampling and rarefaction procedures will be introduced for calculating qualitative and quantitative indices that assess dissimilarity between samples. We will also introduce other statistical analyses of the samples, such as analysis of variance and generalized linear models.
Core concepts introduced: taxonomic summary, sunburst plots, α-diversity, β-diversity, rarefaction, MOTU richness, non-metric multidimensional scaling (NMDS), clustering, PERMANOVA.
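Rarefaction, MOTU richness and the difference between qualitative and quantitative dissimilarity indices can be illustrated with the Python standard library alone. The sketch below subsamples reads without replacement to a common depth, counts MOTUs, and computes Bray-Curtis (abundance-based) and Jaccard (presence/absence) dissimilarities. Function names and toy counts are hypothetical; for real analyses, R packages such as vegan provide these calculations along with NMDS and PERMANOVA.

```python
import random
from collections import Counter


def rarefy(counts, depth, seed=None):
    """Subsample `depth` reads without replacement from {motu: reads}."""
    reads = [motu for motu, n in counts.items() for _ in range(n)]  # one entry per read
    if depth > len(reads):
        raise ValueError("rarefaction depth exceeds the sample's read count")
    return Counter(random.Random(seed).sample(reads, depth))


def richness(counts):
    # number of MOTUs with at least one read
    return sum(1 for n in counts.values() if n > 0)


def bray_curtis(a, b):
    """Quantitative dissimilarity: 1 - 2 * sum(min(a_i, b_i)) / (sum(a) + sum(b))."""
    total = sum(a.values()) + sum(b.values())
    if total == 0:
        return 0.0
    shared = sum(min(a.get(m, 0), b.get(m, 0)) for m in set(a) | set(b))
    return 1 - 2 * shared / total


def jaccard(a, b):
    """Qualitative dissimilarity on presence/absence: 1 - |A & B| / |A | B|."""
    pa = {m for m, n in a.items() if n > 0}
    pb = {m for m, n in b.items() if n > 0}
    union = pa | pb
    return 1 - len(pa & pb) / len(union) if union else 0.0


a = {"m1": 10, "m2": 0, "m3": 5}  # toy samples
b = {"m1": 5, "m2": 5}
print(richness(a), round(bray_curtis(a, b), 3), round(jaccard(a, b), 3))  # 2 0.6 0.667
print(rarefy(a, depth=7, seed=1))
```

Because richness grows with sequencing depth, samples are typically rarefied to a common depth (often repeatedly, averaging the results) before their richness is compared.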