Handling Missing Data in R

Dates

22-24 April 2026

To foster international participation, this course will be held online.

Course overview

Real-world datasets are rarely complete: missing values are a pervasive challenge that can bias estimates, reduce statistical power, and invalidate results if handled improperly. This course provides a practical and principled framework for diagnosing and handling missing data using R.

Participants will begin by learning the theoretical mechanisms underlying missingness—Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR)—and how these mechanisms influence the choice of analytical strategy. These concepts are paired with descriptive and visual diagnostic tools to help participants understand the structure and patterns of missing data in real datasets.

The course then introduces traditional approaches such as listwise deletion and simple imputation, critically discussing their assumptions and limitations. In the final part, participants are guided through modern and advanced methods, including regression imputation, multiple imputation, and iterative algorithms for data completion such as Soft Impute and the Expectation–Maximization (EM) algorithm. Throughout the course, emphasis is placed on practical implementation, interpretation, and best practices.

Learning outcomes

By the end of the course, participants will be able to:

Identify missingness mechanisms: distinguish between MCAR, MAR, and MNAR mechanisms and understand their implications for statistical analysis.
Diagnose missing data patterns: use R to explore, summarize, and visualize missing data through descriptive statistics and graphical tools.
Apply traditional methods appropriately: implement standard techniques such as listwise deletion, pairwise deletion, and simple imputation, while understanding their limitations and potential biases.
Implement modern imputation strategies: apply regression imputation and multiple imputation methods (e.g. MICE) in R and correctly interpret results.
Understand advanced algorithms: grasp the logic and use cases of iterative data completion approaches, including Soft Impute and Expectation–Maximization algorithms.
Choose suitable methods in practice: select and justify appropriate missing data strategies based on data structure, missingness mechanism, and analytical goals.

Session content

Day 1 – Foundations and Missingness Mechanisms - 22 April | 3-6 PM Berlin time

1. Introduction

Definition of missing data in real-world datasets
Why missing data matter: statistical bias, loss of power, and validity
Overview of missing data mechanisms
- Missing Completely At Random (MCAR)
- Missing At Random (MAR)
- Missing Not At Random (MNAR)
Practical examples illustrating each mechanism
Discussion: consequences of choosing the wrong strategy
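
The three mechanisms listed above can be illustrated with a small simulation. The following is a minimal sketch and not course material: the variables `age` and `income`, the sample size, and the logistic missingness models are all assumptions chosen for illustration.

```r
set.seed(42)
n <- 1000

# Hypothetical data: income depends linearly on age plus noise
age    <- rnorm(n, mean = 50, sd = 10)
income <- 20 + 0.5 * age + rnorm(n, sd = 5)

# MCAR: every income value has the same 30% chance of being missing
miss_mcar <- rbinom(n, 1, 0.3) == 1

# MAR: the chance of missing income depends only on the observed age
miss_mar <- rbinom(n, 1, plogis((age - 50) / 5 - 1)) == 1

# MNAR: the chance of missing income depends on income itself
miss_mnar <- rbinom(n, 1, plogis((income - mean(income)) / 5 - 1)) == 1

# Mean income in the full data and among the observed cases only
means <- c(
  full = mean(income),
  mcar = mean(income[!miss_mcar]),
  mar  = mean(income[!miss_mar]),
  mnar = mean(income[!miss_mnar])
)
print(round(means, 2))
```

In this setup the observed-case mean under MCAR should stay close to the full-data mean, while under MAR and MNAR it is pulled downward because high values are more likely to be missing. The difference is that the MAR bias can in principle be corrected by conditioning on `age`, whereas the MNAR bias cannot be corrected from the observed data alone.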

Day 2 – Diagnostics, Visualization, and Traditional Methods - 23 April | 3-6 PM Berlin time

2. Diagnostics and Visualization

Descriptive statistics
- Counts and proportions of missing values
Visualization of missing data patterns
- Location and co-occurrence of missing values
The Shadow Matrix
- Binary indicators for missingness
- Exploring relationships between missing and observed data
Correlation of missingness

3. Traditional Methods

Deletion methods
- Listwise Deletion (Complete Case Analysis)
- Pairwise Deletion
Simple imputation methods
- Mean and median imputation
- Biases and limitations
When (and when not) to use traditional approaches
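
The deletion and simple imputation methods above can be tried out in a few lines of base R. This is a sketch on the built-in `airquality` dataset, chosen here as an assumption for illustration; it compares sample sizes, correlations, and the spread of a variable before and after mean imputation.

```r
data(airquality)

# Listwise deletion: keep only rows with no missing values at all
complete <- na.omit(airquality)
cat("Rows before:", nrow(airquality), "after listwise deletion:", nrow(complete), "\n")

# Listwise versus pairwise deletion for a correlation matrix
vars <- c("Ozone", "Solar.R", "Wind", "Temp")
cor_listwise <- cor(airquality[, vars], use = "complete.obs")
cor_pairwise <- cor(airquality[, vars], use = "pairwise.complete.obs")
print(round(cor_listwise - cor_pairwise, 3))

# Mean imputation: replace each missing Ozone value with the observed mean
ozone_mean     <- mean(airquality$Ozone, na.rm = TRUE)
ozone_mean_imp <- ifelse(is.na(airquality$Ozone), ozone_mean, airquality$Ozone)

# The mean is preserved, but the standard deviation shrinks
sd_observed <- sd(airquality$Ozone, na.rm = TRUE)
sd_imputed  <- sd(ozone_mean_imp)
cat("SD observed:", round(sd_observed, 2), "SD after mean imputation:", round(sd_imputed, 2), "\n")
```

The shrinking standard deviation is one of the biases of mean imputation: every imputed value sits exactly at the centre of the distribution, so variability and correlations with other variables are understated.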

Day 3 – Modern and Advanced Imputation Approaches - 24 April | 3-6 PM Berlin time

4. Introduction to Modern Approaches

Regression imputation
- Concept and implementation in R
- Strengths and weaknesses
Multiple Imputation
- Rationale and statistical intuition
- Multiple Imputation by Chained Equations (MICE)
- Interpreting and pooling results
Iterative and algorithmic approaches
- Soft Impute algorithm
- Expectation–Maximization (EM) algorithm
Choosing an appropriate method based on the missingness mechanism
Final discussion and best-practice recommendations
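
Regression imputation and multiple imputation, as outlined above, might look roughly as follows in R. This is a sketch under stated assumptions, not the course's own code: the `airquality` dataset, the model `Ozone ~ Wind + Temp`, and the settings `m = 5` and `method = "pmm"` are illustrative choices. The multiple imputation part runs only if the mice package is installed.

```r
data(airquality)
obs <- !is.na(airquality$Ozone)

# Regression imputation: predict missing Ozone from the fully observed Wind and Temp.
# lm() drops the rows with missing Ozone by default.
fit  <- lm(Ozone ~ Wind + Temp, data = airquality)
pred <- predict(fit, newdata = airquality[!obs, ])

# Deterministic version: imputed values lie exactly on the regression surface
ozone_det <- airquality$Ozone
ozone_det[!obs] <- pred

# Stochastic version: add residual noise so imputed values are not too regular
set.seed(1)
ozone_sto <- airquality$Ozone
ozone_sto[!obs] <- pred + rnorm(sum(!obs), sd = sigma(fit))

# Multiple imputation by chained equations, if mice is available
if (requireNamespace("mice", quietly = TRUE)) {
  # Create m = 5 completed datasets using predictive mean matching
  imp <- mice::mice(airquality, m = 5, method = "pmm", seed = 123, printFlag = FALSE)
  # Fit the same analysis model in each completed dataset
  fits <- with(imp, lm(Ozone ~ Wind + Temp))
  # Combine estimates and standard errors across datasets with Rubin's rules
  pooled <- mice::pool(fits)
  print(summary(pooled))
}
```

Both single-imputation versions treat the imputed values as if they had been observed, so standard errors from a later analysis tend to be too small. Multiple imputation addresses this by carrying the between-imputation variability into the pooled standard errors.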

Instructor

Dr. Luca Brusa - University of Milano-Bicocca (Italy)

Cost overview

Course

300 €

Related courses

1 - Introduction to Quarto - ONLINE, 5-6 February

2 - Dealing with messy data in R - ONLINE, 8-10 April

3 - Beyond Beginner R - ONLINE, 1-4 June

4 - Introduction to R Shiny - ONLINE, 9-10 June

Cancellation Policy:

> 30 days before the start date = 30% cancellation fee.

< 30 days before the start date = no refund.

Physalia-courses cannot be held responsible for any travel fees, accommodation costs, or other expenses you incur as a result of the cancellation.