Dealing with messy data in R: data cleaning and visualisation

Dates

13-15 July 2026

To foster international participation, this course will be held online

 

Course overview

The course focuses on practical techniques to load, manipulate, and visually explore messy tabular data with tidyverse and ggplot2. Participants will learn core principles behind the grammar of graphics and how to produce clear and report-ready visual representations of data. This course equips attendees to confidently work with real-world datasets.

 

Target Audience

This course is aimed at a mixed audience, including early-career researchers, practitioners, and analysts who have prior experience with R but may be new to the tidyverse or data visualisation in R. It is intended for those eager to deepen their data handling and graphing skills, especially for messy or complex datasets.
No prior knowledge of tidyverse or ggplot2 is required.

 

Learning outcomes

By the end of this course, participants will be able to:

  • Load, inspect, and clean messy datasets using tidyverse functions.
  • Combine and join multiple tables to prepare data for analysis.
  • Understand and apply the grammar of graphics via ggplot2 to create a variety of plots (points, lines, bars, boxes).
  • Identify and select the best type of plot for different data types and analytical questions.
  • Customise plot aesthetics such as colours, scales, and themes for effective storytelling, with attention to accessibility (e.g., choosing colourblind-friendly palettes).
  • Export and save visualisations suitable for presentations, reports, or publications.

Session content

Day 1 — 2-5 PM (Berlin time)
Session 1: Introduction to the tidyverse and Data Import

Introduction to the tidyverse collection of packages and the core concepts of tidy data. Participants will learn how to import different data formats, inspect datasets for inconsistencies, and begin cleaning workflows through practical examples.


Session 2: Data Transformation and Cleaning with dplyr and tidyr
Hands-on use of dplyr and tidyr to filter, manipulate, and transform data. Topics include handling missing values, recoding variables, converting data between wide and long formats, and preparing datasets for downstream analyses.
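
As an illustration of the kind of step this session covers, here is a minimal sketch of reshaping a wide table to long format and dropping missing values. The table `raw` and its column names are hypothetical and are not taken from the course material.

```r
library(dplyr)
library(tidyr)

# Hypothetical messy table: one row per site, one column per year
raw <- tibble(
  site   = c("A", "B", "C"),
  `2023` = c(10, NA, 7),
  `2024` = c(12, 5, NA)
)

tidy <- raw %>%
  # Wide to long: the year columns become a single "year" variable
  pivot_longer(cols = -site, names_to = "year", values_to = "count") %>%
  # Column names arrive as text, so convert them to integers
  mutate(year = as.integer(year)) %>%
  # Drop rows where the measurement is missing
  filter(!is.na(count))
```

Whether dropping missing rows is appropriate depends on the analysis; in other cases they might be kept and handled explicitly.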


Day 2 — 2-5 PM (Berlin time)
Session 3: Introduction to Data Visualisation with ggplot2
Understanding the grammar of ggplot2 and building essential plots such as scatterplots, line charts, bar graphs, and boxplots starting from raw data. Introduction to plot layering and basic customisation.


Session 4: Advanced Visualisation Techniques with ggplot2
In-depth work on enhancing visualisations using custom colour schemes, scales, themes, labels, and annotations. Guidance on producing accessible and publication-quality graphics, including exporting figures for reports and scientific publications.
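
A short sketch of what such customisation and export can look like, assuming the `mpg` example dataset that ships with ggplot2. The viridis scale, the theme, and the output file name are illustrative choices, not necessarily those used in the course.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, colour = class)) +
  geom_point() +
  # Viridis palettes are designed to remain readable under colour-vision deficiency
  scale_colour_viridis_d() +
  labs(
    x = "Engine displacement (L)",
    y = "Highway miles per gallon",
    colour = "Vehicle class"
  ) +
  theme_minimal()

# Hypothetical output path; a vector format such as PDF scales cleanly in print
out_file <- file.path(tempdir(), "figure.pdf")
ggsave(out_file, plot = p, width = 6, height = 4)
```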


Session 5: Merging and Integrating Multiple Datasets
Learning techniques to merge and join multiple datasets using functions such as left_join(), and preparing integrated datasets for further analyses.
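
A minimal sketch of a left join, using two hypothetical tables (`samples` and `sites`) invented for illustration. It also shows one way to check for rows that found no match.

```r
library(dplyr)

# Hypothetical tables sharing the key column "site"
samples <- tibble(
  sample_id = c("s1", "s2", "s3"),
  site      = c("A", "B", "D")
)
sites <- tibble(
  site   = c("A", "B", "C"),
  region = c("north", "south", "east")
)

# Keep every row of samples; attach region where the site matches
joined <- left_join(samples, sites, by = "site")

# Site "D" is absent from sites, so its region is NA in joined;
# anti_join() lists such unmatched rows explicitly
unmatched <- anti_join(samples, sites, by = "site")
```

Inspecting unmatched keys before continuing helps catch typos or inconsistent coding in messy data.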


Day 3 — 2-5 PM (Berlin time)
Session 6: Multi-Panel Figures and Workflow Case Studies
Arranging multiple plots into cohesive figures using packages such as patchwork and cowplot. Real-world case studies demonstrating complete workflows from messy data to polished visualisations.


Session 7: Integrating AI Tools into R — Final Q&A and Troubleshooting
Introduction to integrating AI-assisted tools into R workflows, followed by an open session for questions, troubleshooting, and discussion.


Cost overview

Package 1: 350 €

Related courses

Handling Missing Data in R - ONLINE, 22-24 April

Beyond Beginner R - ONLINE, 1-4 June

Introduction to R Shiny - ONLINE, 9-10 June

 

Cancellation Policy:

 

> 30 days before the start date = 30% cancellation fee

< 30 days before the start date = No Refund.

 

Physalia-courses cannot be held responsible for any travel fees, accommodation, or other expenses incurred by you as a result of the cancellation.