Reproducible data analysis with R

Dates

20-23 July 2026

To foster international participation, this course will be held online.

Course overview

Reproducibility is a fundamental principle of good science, yet many computational analyses are difficult to reproduce. Scripts depend on specific package versions, data files are missing, workflows are poorly documented, and even the original author may struggle to rerun the analysis months later.

This course provides practical solutions to these challenges by teaching participants how to build fully reproducible data analysis workflows in R. Through a combination of lectures, live coding, and hands-on exercises, participants will learn how to organise projects, document analyses, manage dependencies, and share code and results in a transparent and reliable way.

The course introduces modern tools widely used in the R ecosystem, including Quarto for dynamic and reproducible documents, Git and GitHub for version control and collaboration, and renv for managing software dependencies. Participants will also learn how to structure their analyses as research compendia, a widely recommended framework for organising code, data, and documentation in a reproducible format.

Finally, the course will cover how to encapsulate computational environments using Docker containers, ensuring that analyses can be reproduced across different machines and long after the original project was completed.

By the end of the course, participants will have the skills and practical workflow needed to create transparent, shareable, and long-lasting research projects, improving collaboration and increasing the reliability of their scientific results.

Target audience and assumed background

This course is intended for researchers, data scientists, and anyone who uses R to generate documents and analyses and who wants to collaborate with others—or with their future self—with minimal friction.

Basic prior experience with R is recommended. If you have ever imported data and produced a graph or table from it, you have the necessary background to participate.

Learning outcomes

By the end of this course, participants will be able to:

Organise analyses using RStudio projects with a clear folder structure and, optionally, an R package layout.
Create dynamic, reproducible documents with Quarto, including the use of templates and LaTeX formatting.
Track changes and collaborate effectively using Git and GitHub, including pull requests, forks, and repository documentation.
Structure analyses as research compendia, combining code, data, and documentation in a reproducible framework.
Manage R package dependencies with renv to ensure reproducibility across machines and over time.
Share data reliably, including using DOI-linked repositories and programmatic access to datasets.
Encapsulate computational environments using Docker, allowing analyses to be reproduced consistently across different systems.
Integrate all elements into a complete reproducible workflow, from project setup to version-controlled, containerised analyses ready for sharing and collaboration.

Program

Daily schedule

9:00 – 12:00 (Berlin time): live lectures, live coding, and hands-on exercises.

Participants will also receive asynchronous homework support via Slack.

Day 1: Classes from 9:00–12:00 (Berlin time)

Introduction to reproducibility
RStudio projects
- Folder structure
- R package structure
Quarto
- Syntax
- Using templates
- Using LaTeX templates

Day 2: Classes from 9:00–12:00 (Berlin time)

here package
Git and GitHub
- Setup and basic ideas
- Basic workflow (add, commit)
- Collaboration (forks, pull requests)
- Repository documentation (README, License, Code of Conduct)
Research compendia

Day 3: Classes from 9:00–12:00 (Berlin time)

Managing dependencies with renv
Sharing data
- Data repositories and persistent identifiers (DOIs)
- Accessing data directly from code

Day 4: Classes from 9:00–12:00 (Berlin time)

Introduction to containers
Docker
- Creating a container with a Dockerfile
- Docker + renv
- Publishing containers on Docker Hub
Putting it all together

Instructors

Paola Corrales: Paola has a PhD in Atmospheric Science from Universidad de Buenos Aires. During her PhD she applied data assimilation techniques to improve the representation of mesoscale convective systems and associated precipitation. She has experience working with Numerical Weather Prediction models on HPC systems and with programming languages such as R, bash, and Fortran. She is an active R user and developer and contributes to many communities of practice, such as R-Ladies and rOpenSci. Since 2021, Paola has held a professor position at Universidad Nacional Guillermo Brown, where she teaches Visualization of Information and Data Management. In 2023 Paola became a member of The Carpentries Board of Directors.

More information about Paola: https://paocorrales.github.io

Elio Campitelli: Elio has a PhD in Atmospheric Sciences from Universidad de Buenos Aires, where they studied the large-scale circulation of the Southern Hemisphere, and they now study tropical influences on Antarctic sea ice at Monash University. They also taught Introduction to Programming and Visualization of Information at Universidad Nacional Guillermo Brown and are a certified instructor with The Carpentries. They are an active member of the R community and maintain several open-source R packages (e.g., ggnewscale and metR).

More information about Elio: https://eliocamp.github.io

Cost overview

Package 1

450 €

What people say about this course (3rd edition)

"The course was very enjoyable and practical. I especially appreciated learning the basics of working in a reproducible environment with renv and git in R. The tips and resources provided will be very useful as I continue to expand and adapt my workflow."

"I found the sessions on automating documentation for tools and papers extremely helpful. The course gave me a solid foundation in reproducible research practices and provided concrete strategies I can implement in my own projects."

Related courses

1 - Dealing with messy data in R - ONLINE, 8-10 April

2 - Beyond Beginner R - ONLINE, 1-4 June

Should you have any further questions, please send an email to [email protected]

Cancellation Policy:

More than 30 days before the start date: 30% cancellation fee.

Less than 30 days before the start date: no refund.

Physalia-courses cannot be held responsible for any travel fees, accommodation, or other expenses incurred by you as a result of the cancellation.