Session 1-Introduction to metabarcoding procedures. The metabarcoding pipeline.
In this session, students will be introduced to the key concepts of metabarcoding and to the next-generation sequencing platforms currently available for implementing this technology. The kinds of results that can be obtained from metabarcoding projects will be explained using real-life examples. I will outline the steps of a typical metabarcoding pipeline, which will be reviewed in more detail throughout the course, and I will explain the format of the course. We will also check that the computing infrastructure for the rest of the course is in place and that all the required software is installed. Core concepts introduced: next-generation sequencer, multiplexing, NGS library, metabarcoding pipeline, metabarcoding marker, clustering algorithms, molecular operational taxonomic unit (MOTU), taxonomic assignment.
Session 2-Metabarcoding markers. Primer design. PCR and library preparation protocols.
In this session, students will learn about the molecular markers that can be used for metabarcoding different kinds of samples and about the quality of the information that can be retrieved from them. They will become familiar with the most commonly used primer sets for each target taxonomic group and will learn to use the software available for designing their own custom metabarcoding primers. They will also learn about sample tags, library tags, adapter sequences, PCR protocols and library preparation procedures. Core concepts introduced: metabarcoding marker, universality, specificity, taxonomic range, taxonomic resolution, primer bias, amplification errors, sequencing errors, in silico PCR, sample tags, library tags, adapter sequences, PCR, library preparation kits, PCR-free methods.
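As an illustration of the in silico PCR concept, the following minimal Python sketch searches a template sequence for a forward primer and for the reverse complement of a reverse primer, and returns the barcode between them. It is only a toy example under stated assumptions: the function names are hypothetical, primers must match exactly (degenerate IUPAC bases are allowed, mismatches are not), and it is not the implementation used by any of the course software.

```python
import re

# IUPAC nucleotide codes expressed as regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    """Return the reverse complement of a (possibly degenerate) DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    """Translate a primer with IUPAC codes into a regular expression."""
    return "".join(IUPAC[base] for base in primer.upper())


def in_silico_pcr(template, forward, reverse, min_len=0, max_len=1000):
    """Return the barcode between the two primers (primers excluded), or None.

    Assumption: exact primer matches only; real tools tolerate mismatches.
    """
    pattern = re.compile(
        primer_regex(forward) + "(.*?)" + primer_regex(reverse_complement(reverse))
    )
    match = pattern.search(template.upper())
    if match and min_len <= len(match.group(1)) <= max_len:
        return match.group(1)
    return None
```

Calling `in_silico_pcr(template, forward, reverse)` on each sequence of a reference collection gives a rough idea of which taxa a primer pair would amplify and how long the amplified barcodes would be.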
Session 3-The OBITools pipeline. First steps and quality control.
In this session, we will start working with the OBITools software suite, using a real sequence dataset as an example for testing our metabarcoding pipeline. We will outline the steps needed to start analysing raw data from next-generation sequencers. The students will learn about the different data formats used by OBITools for working with sequences, and they will carry out protocols for quality control, paired-end alignment, sequence filtering, removal of chimeric sequences, sample demultiplexing, format conversion and dereplication of unique sequences. Core concepts introduced: fastq, fasta and extended fasta formats, Phred quality score, paired-end alignment, demultiplexing, sequence filtering, chimeras, dereplication, unique sequences, reads.
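Two of the concepts listed above, the Phred quality score and dereplication, can be sketched in a few lines of Python. This is a simplified illustration with hypothetical function names, not OBITools code; it assumes the common Phred+33 encoding of fastq quality lines.

```python
from collections import Counter


def phred_scores(quality_line, offset=33):
    """Decode a fastq quality string into Phred scores (Phred+33 assumed)."""
    return [ord(char) - offset for char in quality_line]


def mean_error_probability(quality_line, offset=33):
    """Average per-base error probability, using P = 10 ** (-Q / 10)."""
    scores = phred_scores(quality_line, offset)
    return sum(10 ** (-q / 10) for q in scores) / len(scores)


def dereplicate(reads):
    """Collapse identical reads into (unique sequence, count) pairs,
    sorted from the most to the least abundant."""
    return Counter(reads).most_common()
```

A Phred score of 20 corresponds to an error probability of 1 in 100 and a score of 40 to 1 in 10,000, which is why quality filtering is usually expressed as a minimum score. Dereplication keeps one copy of each distinct sequence together with its read count, so that later steps work on far fewer records.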
Session 4-Clustering algorithms. Fixed and variable identity thresholds.
In this session, we will introduce different algorithms available for clustering sequences into molecular operational taxonomic units (MOTUs). We will learn the differences between methods that use fixed and variable percent identity thresholds for delineating MOTUs. We will run some of these algorithms on our example dataset and analyse how the results differ between methods. Core concepts introduced: MOTU, reference clustering, de novo clustering, unsupervised-learning clustering, Bayesian clustering, multi-step aggregation methods, identity threshold, variable identity threshold, singleton sequences, abundance recalculation.
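To make the idea of a fixed identity threshold concrete, here is a minimal sketch of greedy de novo clustering in Python. It is a toy version under strong assumptions: the names are hypothetical, identity is computed position by position without alignment (so it is only meaningful for sequences of equal length), and it does not reproduce any specific clustering program.

```python
def identity(seq_a, seq_b):
    """Fraction of matching positions; assumes pre-aligned, equal-length
    sequences (no alignment is performed in this sketch)."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / max(len(seq_a), len(seq_b))


def greedy_cluster(unique_seqs, threshold=0.97):
    """Cluster (sequence, count) pairs around abundance-ranked centroids.

    Each sequence joins the first cluster whose centroid it matches at or
    above the threshold; otherwise it founds a new cluster.
    """
    clusters = []
    for seq, count in sorted(unique_seqs, key=lambda item: -item[1]):
        for cluster in clusters:
            if identity(seq, cluster["centroid"]) >= threshold:
                cluster["members"].append(seq)
                cluster["reads"] += count  # abundance recalculation
                break
        else:
            clusters.append({"centroid": seq, "members": [seq], "reads": count})
    return clusters
```

Lowering the threshold merges more sequences into each MOTU, and raising it splits them. Methods with a variable threshold avoid fixing this single value for all taxa.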
Session 5-Taxonomic assignment. The ecotag algorithm. Reference databases.
In this session, the students will learn about different algorithms for the taxonomic assignment of MOTUs. The ecotag algorithm will be used to add taxonomic information to the MOTUs in our example dataset, and the results will be compared with those from other assignment software. Core concepts introduced: reference database, identity assignment, BLAST, phylogenetic assignment, best match, assignment of higher taxa.
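The ideas of best match and assignment of higher taxa can be illustrated with a small Python sketch. It is not the ecotag algorithm: it is a simplified, hypothetical example that finds the best-matching reference sequences by unaligned identity and, when several references tie, reports only the part of the lineage they share.

```python
def identity(seq_a, seq_b):
    """Fraction of matching positions; assumes equal-length sequences."""
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return matches / max(len(seq_a), len(seq_b))


def shared_lineage(lineages):
    """Longest common prefix of several root-to-tip lineages."""
    common = []
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        common.append(ranks[0])
    return common


def assign(query, reference, min_identity=0.9):
    """Return (best identity, assigned lineage) for a query sequence.

    reference: non-empty list of (sequence, lineage) pairs, where lineage
    is a list of taxon names ordered from the root downwards.
    """
    scored = [(identity(query, seq), lineage) for seq, lineage in reference]
    best = max(score for score, _ in scored)
    if best < min_identity:
        return best, []  # no reliable match
    tied = [lineage for score, lineage in scored if score == best]
    return best, shared_lineage(tied)
```

When a query matches references from two different phyla equally well, this sketch assigns it only to the rank they have in common, which is the kind of result meant by assignment to a higher taxon.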
Session 6-Generating, improving and curating reference databases.
The quality of the reference database used for taxonomic assignment is crucial for the accuracy and applicability of the datasets resulting from any metabarcoding project. In this session, the students will learn how to build local reference databases from the information available in public sequence repositories and how to add custom sequences to existing reference databases. They will also learn how sequence reference databases interact with taxonomy databases to retrieve the phylogenetic information needed by the assignment algorithms. Core concepts introduced: ecoPCR and ecoPCR format, sequence reference database, taxonomic database, taxonomic identifier (taxid), GenBank, European Nucleotide Archive (ENA), Barcode of Life Data System (BOLD), SILVA database.
Session 7-Refining the final datasets.
In this session, students will learn about procedures for refining the final datasets obtained from the previous pipeline. They will learn about blank correction, renormalizing procedures for avoiding false positive results due to cross-sample contamination, taxonomy collapsing of related MOTUs and other algorithms for obtaining enhanced final datasets. Core concepts introduced: cross-sample contamination, renormalization, taxonomy collapsing, blank correction.
Session 8-Analysing the final dataset. α- and β-diversity patterns.
We will discuss how to analyse and interpret the final datasets resulting from our metabarcoding pipelines, so as to obtain ecologically interpretable information. Resampling and rarefying procedures that take into consideration the different total number of reads in each sample (sample sizes) are introduced. Measures of α-diversity, and qualitative and quantitative indices for assessing the dissimilarity between samples (β-diversity), are explained. We will also introduce the UniFrac distance between samples, an index that takes into account not only the abundances of the different MOTUs but also their taxonomic affinities. Core concepts introduced: α-diversity, β-diversity, rarefaction, MOTU richness, UniFrac distances, multidimensional scaling (MDS).
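The following Python sketch illustrates rarefaction, MOTU richness, and one qualitative and one quantitative dissimilarity index. The function names are hypothetical, and Jaccard and Bray-Curtis are chosen here only as familiar examples of the two kinds of index; the course may use other indices or packages.

```python
import random


def rarefy(counts, depth, seed=0):
    """Randomly subsample `depth` reads without replacement.

    counts: dict mapping MOTU name -> number of reads in one sample.
    """
    pool = [motu for motu, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rarefied = {}
    for motu in random.Random(seed).sample(pool, depth):
        rarefied[motu] = rarefied.get(motu, 0) + 1
    return rarefied


def richness(counts):
    """MOTU richness: number of MOTUs with at least one read."""
    return sum(1 for n in counts.values() if n > 0)


def jaccard(a, b):
    """Qualitative (presence/absence) dissimilarity between two samples."""
    present_a = {m for m, n in a.items() if n > 0}
    present_b = {m for m, n in b.items() if n > 0}
    return 1 - len(present_a & present_b) / len(present_a | present_b)


def bray_curtis(a, b):
    """Quantitative (abundance-based) dissimilarity between two samples."""
    shared = sum(min(a.get(m, 0), b.get(m, 0)) for m in set(a) | set(b))
    return 1 - 2 * shared / (sum(a.values()) + sum(b.values()))
```

Rarefying all samples to the same depth before computing richness or dissimilarities removes the effect of unequal sequencing effort, at the cost of discarding reads from the deeper samples.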
Session 9-Presenting the final results. Online resources and future developments.
In this session, we will continue with the presentation of final results. Students will learn how to plot taxonomic summaries from their datasets, including Krona plots, a type of graphic representation that shows the relative abundances of reads at different taxonomic levels. The rest of the session will be dedicated to introducing current research and possible future developments of metabarcoding and metagenomics techniques, and to providing a list of useful resources for further learning, continuous training and future research opportunities. Core concepts introduced: taxonomic summary, Krona plots.
Optional free afternoon to cover previous modules and discuss data.