Curriculum

Classes from 09:30 to 17:30

Session 1 - The 16S rRNA gene

The use of the 16S rRNA gene as a marker for prokaryote phylogenetics will be discussed to introduce students to the concept of conserved and hypervariable regions. Students will learn about the history of this molecular marker and why it is the marker of choice for prokaryote diversity studies. The primer combinations used to target the different hypervariable regions will be discussed, together with what is known about their advantages and disadvantages. The pros and cons of PCR-based 16S rRNA gene sequencing versus PCR-free shotgun metagenomics will also be covered.
This session will also include an overview of current sequencing technologies, contrasting the Illumina MiSeq platform with other technologies (Ion Torrent, MinION, PacBio and Moleculo).
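To make the idea of primers binding conserved regions more concrete, the sketch below scans a sequence for a degenerate primer written with IUPAC ambiguity codes. It is a minimal illustration under stated assumptions: the 515F-type primer string is used only as an example, the short target fragment is invented, and the helper names (`primer_matches_at`, `find_primer_sites`) are hypothetical rather than taken from any tool.

```python
# Minimal sketch: find where a degenerate PCR primer could bind a target sequence.
# Assumption: the primer and target in the usage example are illustrative only.

# IUPAC nucleotide ambiguity codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches_at(primer: str, target: str, pos: int, max_mismatches: int = 0) -> bool:
    """True if the primer aligns at target[pos:] with at most max_mismatches mismatches."""
    window = target[pos:pos + len(primer)]
    if len(window) < len(primer):
        return False
    mismatches = sum(1 for p, t in zip(primer, window) if t not in IUPAC[p])
    return mismatches <= max_mismatches


def find_primer_sites(primer: str, target: str, max_mismatches: int = 0) -> list:
    """Return all 0-based start positions where the degenerate primer matches the target."""
    primer, target = primer.upper(), target.upper()
    return [
        i
        for i in range(len(target) - len(primer) + 1)
        if primer_matches_at(primer, target, i, max_mismatches)
    ]


if __name__ == "__main__":
    primer_515f = "GTGYCAGCMGCCGCGGTAA"  # Y = C/T, M = A/C
    target = "TTACGTGCCAGCAGCCGCGGTAATAC"  # invented fragment
    print(find_primer_sites(primer_515f, target))  # [4]
```

Raising `max_mismatches` is a simple way to see how in-silico primer coverage changes with stringency.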

Session 2 - Sequencing experimental design and initial hands-on exercises

Focusing on the MiSeq platform, this session will discuss experimental design considerations, including sequencing depth, replication, contamination, and the use of appropriate controls and mock communities. Other topics include metadata collection, DNA extraction and cDNA sequencing from RNA. Demonstration sequence data will be used to check that the tools needed for the subsequent practical work are correctly installed, and students will examine sequence files to obtain basic characteristics of sequence datasets, such as the number of sequences, the range of sequence lengths and sequence quality. Students will learn how to check the quality of Illumina FASTQ files using FastQC, and will use the Linux command line to look inside FASTQ files, understand the information they contain and distinguish them from FASTA files.
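The basic dataset characteristics listed above (number of reads, length range, quality) can also be computed directly from a FASTQ file. The sketch below assumes uncompressed FASTQ with four lines per record and Phred+33 quality encoding; `read_fastq`, `guess_format` and `fastq_summary` are hypothetical helper names, and FastQC reports far more detail than this.

```python
# Minimal sketch: summarise an uncompressed FASTQ file (4 lines per record,
# Phred+33 quality encoding assumed).
from statistics import mean


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from an iterable of FASTQ lines."""
    it = (line.rstrip("\n") for line in lines)
    for header in it:
        if not header:  # skip blank lines between records
            continue
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"Truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"Malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"Sequence/quality length mismatch: {header!r}")
        yield header, seq, qual


def guess_format(first_line):
    """FASTQ records start with '@', FASTA records with '>'."""
    if first_line.startswith("@"):
        return "FASTQ"
    if first_line.startswith(">"):
        return "FASTA"
    return "unknown"


def fastq_summary(lines):
    """Number of reads, read-length range and mean of per-read mean Phred quality."""
    lengths, read_quals = [], []
    for _, seq, qual in read_fastq(lines):
        lengths.append(len(seq))
        read_quals.append(mean(ord(c) - 33 for c in qual))  # Phred+33 decoding
    if not lengths:
        return {"n_reads": 0}
    return {
        "n_reads": len(lengths),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "mean_quality": mean(read_quals),
    }
```

For example, `with open("sample_R1.fastq") as fh: print(fastq_summary(fh))` (the file name is hypothetical); gzipped files would need `gzip.open(path, "rt")` instead of `open`.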

Classes from 09:30 to 17:30

Session 3 - Library preparation for MiSeq sequencing

The choice of library preparation method can have a substantial effect on the quality and quantity of data obtained from the MiSeq, and also affects the cost and ease of the wet-lab procedures. An overview of library preparation methods will be given, contrasting two-step (Illumina Nextera) with one-step library kits. Students will learn how samples are barcoded, how PCR fragments are prepared for sequencing, and what this implies for sequence fragment sizes. Students will also learn how sequencing libraries are pooled and loaded onto the MiSeq; the consequences of loading too much or too little library DNA onto the flow cell will be used to introduce the concepts of over- and under-clustering. The choice of sequencing kit (e.g., V2 or V3 kits, 250 to 600 cycles) will also be discussed.
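Because over- and under-clustering follow from how much library is loaded, converting measured mass concentrations to molarity matters when pooling. The sketch below shows the common ng/µL to nM conversion, assuming roughly 660 g/mol per base pair of double-stranded DNA, and an equimolar pooling calculation. The function names and example concentrations are hypothetical, and the final loading concentration should come from the kit documentation rather than from this code.

```python
# Minimal sketch: convert library concentrations to molarity and compute
# volumes for an equimolar pool. Assumption: ~660 g/mol per bp of dsDNA and
# mean fragment sizes measured separately (e.g. on a fragment analyser).
AVG_MASS_PER_BP = 660.0  # g/mol per bp of dsDNA (standard approximation)


def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a mass concentration (ng/uL) to molarity (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_MASS_PER_BP * mean_fragment_bp)


def equimolar_pool_volumes(libraries: dict, fmol_per_library: float) -> dict:
    """Volume (uL) of each library so that all contribute the same number of molecules.

    `libraries` maps sample name -> (concentration in ng/uL, mean fragment size in bp).
    Relies on 1 nM == 1 fmol/uL.
    """
    volumes = {}
    for name, (conc, size) in libraries.items():
        molarity = library_molarity_nM(conc, size)
        if molarity <= 0:
            raise ValueError(f"{name}: concentration must be positive")
        volumes[name] = fmol_per_library / molarity
    return volumes


if __name__ == "__main__":
    libs = {"S1": (10.0, 500), "S2": (5.0, 500), "S3": (12.0, 600)}  # invented values
    for name, vol in equimolar_pool_volumes(libs, fmol_per_library=30.0).items():
        print(f"{name}: {vol:.2f} uL")
```

A library at half the molarity needs twice the volume, which is why small quantification errors propagate directly into uneven read numbers per sample.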

Session 4 - Practical session on sequence analysis pipelines

The three main sequence analysis tools, mothur, QIIME and DADA2, will be introduced. The rationale behind the different sequence analysis steps, such as trimming, merging, removal of low-quality sequences, chimera checking and sequence annotation, will be explained in detail, with particular focus on the different strategies for generating OTUs. Students will work through exercises on initial sequence quality control and pre-processing using the FASTX-Toolkit, FLASH and BBMerge.
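Read merging, as performed by tools such as FLASH and BBMerge, relies on the forward read and the reverse complement of the reverse read overlapping over part of the amplicon. The sketch below is a deliberately simplified, FLASH-like merger that ignores base qualities and assumes the amplicon is longer than each read; it is not the algorithm of either tool, and `merge_pair` with its parameters is a hypothetical name.

```python
# Simplified overlap-based merging of a read pair (quality scores ignored).
# Assumption: the amplicon is longer than each read, so the reads overlap
# only at the 3' end of the forward read.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(fwd: str, rev: str, min_overlap: int = 10, max_mismatch_ratio: float = 0.1):
    """Return the merged amplicon, or None if no acceptable overlap is found.

    Each overlap length is scored by its mismatch ratio; the lowest ratio wins,
    with ties broken in favour of the longer overlap. In the overlap the
    forward-read base is kept (real mergers use base qualities instead).
    """
    rc = reverse_complement(rev)
    best = None  # (mismatch_ratio, overlap_length)
    for ov in range(min_overlap, min(len(fwd), len(rc)) + 1):
        mismatches = sum(a != b for a, b in zip(fwd[-ov:], rc[:ov]))
        ratio = mismatches / ov
        if ratio > max_mismatch_ratio:
            continue
        if best is None or ratio < best[0] or (ratio == best[0] and ov > best[1]):
            best = (ratio, ov)
    if best is None:
        return None
    return fwd + rc[best[1]:]
```

The minimum overlap and mismatch ratio play the same role as the stringency settings exposed by real merging tools: loosening them merges more pairs but risks accepting spurious overlaps.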

Classes from 09:30 to 17:30

Session 5 - mothur and QIIME tutorials

In this session the mothur MiSeq standard operating procedure (SOP) will be followed using example data. The tutorial will take students through further sequence quality control, sequence noise reduction, sequence alignment, chimera detection and removal, removal of contaminants, and clustering to generate OTU tables, phylogenies and OTU classifications. The choice between curated 16S rRNA gene databases (SILVA, Greengenes and RDP) will be explained, and finally .biom files will be generated and their uses discussed. The same example data will then be run through the QIIME pipeline. QIIME is a widely used 16S rRNA gene sequence analysis tool, and here it will be used to generate the OTU table, phylogenetic trees, sequence classifications and the BIOM table, as done previously with mothur.
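The clustering algorithms in mothur and QIIME are more involved, but the underlying idea of grouping aligned sequences into OTUs at a similarity threshold such as 97% can be shown with a simple abundance-sorted greedy clustering. This is a swapped-in, simplified technique for illustration only; the helper names and the gap handling are assumptions.

```python
# Simplified greedy OTU clustering of aligned sequences (illustration only;
# not the clustering algorithm used by mothur or QIIME).
GAP_CHARS = "-."


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical columns between two aligned sequences of equal length.

    Columns where both sequences have a gap are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x in GAP_CHARS and y in GAP_CHARS:
            continue
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def greedy_otu_clustering(seq_counts: dict, threshold: float = 0.97) -> list:
    """Cluster unique aligned sequences into OTUs.

    Sequences are visited from most to least abundant; each joins the first
    existing OTU whose centroid is at least `threshold` identical, otherwise
    it founds a new OTU. Returns a list of (centroid, members) tuples.
    """
    otus = []
    for seq in sorted(seq_counts, key=seq_counts.get, reverse=True):
        for centroid, members in otus:
            if pairwise_identity(seq, centroid) >= threshold:
                members.append(seq)
                break
        else:
            otus.append((seq, [seq]))
    return otus


def otu_abundances(otus: list, seq_counts: dict) -> dict:
    """Total read count per OTU, keyed by centroid sequence."""
    return {centroid: sum(seq_counts[s] for s in members) for centroid, members in otus}
```

Visiting abundant sequences first means that common, likely genuine sequences become centroids, while rarer variants are absorbed into them when they fall within the threshold.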

Session 6 - DADA2

DADA2 is a relatively new R package that covers the amplicon sequence analysis workflow, from sequence quality filtering through sample inference to merging of paired-end reads. DADA2 is designed to be highly accurate and to resolve fine-scale sequence variation: unlike mothur and QIIME, it does not cluster sequences into OTUs, potentially allowing the detection of strain-level diversity in the form of ribosomal sequence variants (RSVs). This session will run through the DADA2 paired-end pipeline and compare its outputs with those of mothur and QIIME.
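A central question in DADA2-style inference is whether a less abundant unique sequence could plausibly be an error-derived copy of a more abundant one. The sketch below captures that idea with a Poisson abundance p-value under a deliberately crude error model (one uniform per-base substitution rate), whereas DADA2 learns its error rates from the data. The function names and the error model are assumptions for illustration, not DADA2's API.

```python
# Toy version of the abundance test behind exact sequence variant inference.
# Assumption: a uniform per-base substitution rate (DADA2 itself learns
# error rates from the data).
import math
from collections import Counter


def dereplicate(reads):
    """Collapse identical reads into (unique sequence, abundance), most abundant first."""
    return Counter(reads).most_common()


def hamming(a: str, b: str) -> int:
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def expected_error_copies(parent: str, child: str, parent_count: int, error_rate: float) -> float:
    """Expected number of parent reads that errors would turn into exactly `child`."""
    d = hamming(parent, child)
    # each differing base must change to one specific alternative base (1 of 3)
    per_read = (error_rate / 3) ** d * (1 - error_rate) ** (len(parent) - d)
    return parent_count * per_read


def poisson_tail(a: int, lam: float) -> float:
    """P(X >= a) for X ~ Poisson(lam), summed directly to avoid cancellation."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if a <= 0:
        return 1.0
    total, k = 0.0, a
    while True:
        term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += term
        if k > lam and term <= total * 1e-16:
            return total
        k += 1


def abundance_pvalue(child_count: int, lam: float) -> float:
    """P(count >= child_count | count >= 1): small values suggest a real variant."""
    return poisson_tail(child_count, lam) / (1 - math.exp(-lam))
```

With a parent seen 1000 times and a one-mismatch child, a child count of 2 is easily explained by errors, while a count of 50 is not, so only the latter would be kept as a separate sequence variant.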

Classes from 09:30 to 17:30

Session 7 - Using statistical tools provided in mothur and QIIME

This session will consider the importance, relevance and drawbacks of data normalisation, subsampling and rarefaction prior to statistical analysis. Students will then use the statistical tools available in mothur and QIIME to estimate sampling coverage, alpha and beta diversity, and community similarity across samples. AMOVA, HOMOVA and Metastats will be demonstrated in mothur, and in QIIME the UniFrac dissimilarity measure will be used to analyse patterns in community data across treatments.
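Subsampling to an even depth and the basic diversity summaries mentioned above are simple to compute, which makes their assumptions easy to inspect. The sketch below rarefies one sample's counts without replacement and computes observed richness, the Shannon index (natural log) and Good's coverage (1 minus the number of singleton OTUs divided by the total number of reads). The function names and the dict-based input are assumptions; mothur and QIIME provide their own implementations.

```python
# Minimal sketch: rarefaction and simple alpha-diversity summaries for one sample.
# Input: dict mapping OTU/RSV id -> read count (an assumed format).
import math
import random
from collections import Counter


def rarefy(counts: dict, depth: int, seed=None) -> dict:
    """Randomly subsample `depth` reads without replacement."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError(f"depth {depth} exceeds sample size {total}")
    reads = [otu for otu, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    return dict(Counter(rng.sample(reads, depth)))


def observed_richness(counts: dict) -> int:
    return sum(1 for n in counts.values() if n > 0)


def shannon(counts: dict) -> float:
    """Shannon index H' = -sum(p_i * ln p_i)."""
    total = sum(counts.values())
    return -sum((n / total) * math.log(n / total) for n in counts.values() if n > 0)


def goods_coverage(counts: dict) -> float:
    """Good's coverage: 1 - (number of singleton OTUs / total reads)."""
    total = sum(counts.values())
    singletons = sum(1 for n in counts.values() if n == 1)
    return 1 - singletons / total
```

Running `rarefy` with different seeds on the same sample shows directly how much richness estimates depend on which reads happen to be discarded, one of the drawbacks of rarefaction discussed in this session.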

Session 8 - Using the DADA2-phyloseq Bioconductor microbiome workflow

The R package phyloseq is used to import, store and analyse microbiome data. It integrates OTU/RSV tables, taxonomy tables, phylogenetic trees and sample metadata into a single experiment-level object, and works well with a variety of ecological and statistical tools available in R, such as vegan, ape and ggplot2, for the analysis and visualisation of data. In this session we will start exploring a Bioconductor workflow built on phyloseq data objects, which allows the generation of publication-ready plots. We will learn how to import and store data in phyloseq, and how to subset data to study specific taxonomic groups or treatments. We will also explore how to filter low-abundance taxa, how to agglomerate OTU/RSV abundances by taxonomic rank or by phylogenetic distance, and how to transform data and work with rank-transformed abundance data.
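The table operations described above (agglomerating by rank, filtering low-abundance taxa, transforming counts) are easy to picture on a plain table. The Python sketch below mirrors the ideas only; it is not the phyloseq API, the nested-dict layout is an assumption, and rank agglomeration here simply sums the counts of taxa that share the same name at the chosen rank.

```python
# Plain-Python illustration of common microbiome table operations.
# Assumed layout: otu_table[sample][taxon_id] = count,
#                 taxonomy[taxon_id][rank] = name.
from collections import defaultdict


def agglomerate_by_rank(otu_table, taxonomy, rank):
    """Sum counts of taxa sharing a name at `rank` (missing names -> 'Unclassified')."""
    merged = {}
    for sample, counts in otu_table.items():
        agg = defaultdict(int)
        for taxon, n in counts.items():
            agg[taxonomy.get(taxon, {}).get(rank, "Unclassified")] += n
        merged[sample] = dict(agg)
    return merged


def filter_low_abundance(otu_table, min_total):
    """Drop taxa whose total count across all samples is below `min_total`."""
    totals = defaultdict(int)
    for counts in otu_table.values():
        for taxon, n in counts.items():
            totals[taxon] += n
    keep = {t for t, n in totals.items() if n >= min_total}
    return {s: {t: n for t, n in c.items() if t in keep} for s, c in otu_table.items()}


def to_relative_abundance(otu_table):
    """Convert each sample's counts to proportions (samples with zero reads become empty)."""
    result = {}
    for sample, counts in otu_table.items():
        total = sum(counts.values())
        result[sample] = {t: n / total for t, n in counts.items()} if total else {}
    return result


def rank_transform(counts):
    """Replace one sample's counts with ranks (1 = least abundant), averaging ties."""
    ordered = sorted(counts.items(), key=lambda kv: kv[1])
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        i = j + 1
    return ranks
```

Rank transformation discards the magnitude of differences between taxa and keeps only their order, which reduces the influence of a few very abundant taxa on downstream analyses.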

Classes from 09:30 to 17:30

Session 9 - Multivariate statistics and correlation network analysis in R and Linux

In this session we will learn the multivariate statistical tools available in phyloseq/Bioconductor to analyse the OTU/RSV tables generated earlier in the workshop. We will learn how to test for differential abundance across treatments using DESeq2, and phyloseq and ggplot2 will be used to generate MDS, PCoA and other ordination plots. The choice of distance and similarity indices (Bray-Curtis, UniFrac, Jaccard, Gower) will be discussed. PICRUSt will then be used to predict possible functional roles of the sequenced microbial community (such as pathogenicity, environmental nutrient cycling and decomposition), and the drawbacks of predicting function from taxonomy will be discussed. Finally, SCNIC and MIC will be used to perform OTU correlation network analysis, allowing inference of potential interactions between the microbial groups present in the dataset. The resulting correlation networks will be visualised in Cytoscape.
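To show what the distance indices and correlation networks discussed here compute, the sketch below implements Bray-Curtis dissimilarity, presence/absence Jaccard distance and a plain Spearman-correlation edge list between taxa. It is illustrative only: the function names, the 0.8 default threshold and the table layout are assumptions, and simple Spearman correlation does not account for the compositional nature of sequence count data, which dedicated network tools try to address.

```python
# Illustrative community dissimilarities and a correlation-based edge list.
# Abundance vectors are lists of counts over the same ordered taxa (or samples).
import math
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum|x_i - y_i| / sum(x_i + y_i)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def jaccard_distance(x, y):
    """1 - |shared taxa| / |taxa present in either sample| (presence/absence only)."""
    a = {i for i, v in enumerate(x) if v > 0}
    b = {i for i, v in enumerate(y) if v > 0}
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0


def average_ranks(values):
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0  # undefined for constant input; 0 here


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def correlation_edges(table, threshold=0.8):
    """`table` maps taxon -> counts across samples; keep pairs with |rho| >= threshold."""
    edges = []
    for t1, t2 in combinations(table, 2):
        rho = spearman(table[t1], table[t2])
        if abs(rho) >= threshold:
            edges.append((t1, t2, rho))
    return edges
```

Bray-Curtis uses abundances while Jaccard uses only presence/absence, so the two can rank sample pairs quite differently; an edge list like this one is the kind of input that network viewers such as Cytoscape can import.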

Optional Session 10 - Wrap-up and questions

In this session, students can continue the statistical analyses started in Session 9, and any outstanding questions from previous sessions will be addressed. Students may also start analysing their own data with the tools taught in the workshop, with advice from the instructor.