Reproducibility in Bioinformatics

Dates

15-17 April 2024

To foster international participation, this course will be held online.

Course overview

This course aims to raise awareness of reproducibility issues and introduces strategies to improve reproducibility in bioinformatic analyses. Through a mixture of theoretical blocks and hands-on exercises, the instructors will guide participants in developing the skills to make bioinformatic analyses and workflows more reproducible using containers, versioning and virtual environments.

Target audience and assumed background

The target audience for this course is graduate students and researchers who work with large datasets. Basic working knowledge of the Linux command line (e.g. navigating the file system, creating files and folders, executing commands) is required, and experience working on remote systems (via ssh) is an advantage. Basic knowledge of a scripting language (e.g. Python or Perl) is also beneficial.

Learning outcomes

●    Basic concepts and techniques for modern reproducible bioinformatics data analyses
●    Data organization, documentation and software versioning
●    Setting up and working in virtual software environments
●    Software containerization strategies and caveats - how to use and build containers
●    Knowledge of how to use common workflow management systems

Program

Monday - Classes from 2-8 PM Berlin time - Introduction and basic concepts

  • 1.    Theory: Creating awareness of reproducibility issues in bioinformatic analyses
          a.    Examples from the literature
          b.    Initiatives to increase reproducibility

  • 2.    Exercise: Linux command line, refreshing skills and advanced concepts
          a.    Pipes for logging
          b.    Shell scripts

  • 3.    Theory: Version control tools, data organization and proper documentation
          a.    Linux and Git (collaborative work in software development)
          b.    A lab book for interactive bioinformatics (e.g. JupyterLab)
          c.    Organizing data properly: a suggestion

  • 4.    Exercise: Git, Markdown, YAML
          a.    Creating and working with Git repositories and Markdown documents

Tuesday - Classes from 2-8 PM Berlin time - Software encapsulation

  • 1.    Theory: Different ways to encapsulate software and their limits
          a.    Modifying environments in Linux (conda and module)
          b.    Introducing containers
          c.    Environment modification and containers on HPC systems

  • 2.    Exercise: Docker/Singularity basics
          a.    Running containers (e.g. JupyterLab)
          b.    Persistence of containers
          c.    Transferring files to containers

  • 3.    Theory: Creating your own containers
          a.    Applying Linux command line skills to build your own containers
          b.    Communication between containers

  • 4.    Exercise: Docker/Singularity advanced topics
          a.    Combining containers with conda

Wednesday - Classes from 2-8 PM Berlin time - Reproducible workflows

  • 1.    Theory: From scripts to pipelines with workflow management systems (WMS)
          a.    Reproducible workflows have long existed in software development (early examples)
          b.    Variants of workflow management systems for bioinformatics (Snakemake, Nextflow, etc.)
          c.    Connecting scripts to create analysis pipelines

  • 2.    Exercise: Snakemake, Nextflow, GNU Make
          a.    Creating a workflow in Snakemake and Nextflow

  • 3.    Theory: Workflows on different computers
          a.    Differences between standalone and HPC Linux systems
          b.    Enabling reproducibility across different computing environments

  • 4.    Exercise: Combining everything
          a.    Creating a self-contained pipeline that can be transferred to other systems

Cost overview

Package 1

380 €


Should you have any further questions, please send an email to info@physalia-courses.org.

Cancellation Policy:

> 30 days before the start date = 30% cancellation fee

< 30 days before the start date = no refund

Physalia-courses cannot be held responsible for any travel, accommodation or other expenses you incur as a result of a cancellation.