Handling Missing Data in R

Dates

22-24 April 2026

To foster international participation, this course will be held online.

Course overview

Real-world datasets are rarely complete: missing values are a pervasive challenge that can bias estimates, reduce statistical power, and invalidate results if handled improperly. This course provides a practical and principled framework for diagnosing and handling missing data using R.

Participants will begin with the theory of missingness mechanisms: Missing Completely At Random (MCAR), Missing At Random (MAR), and Missing Not At Random (MNAR). They will learn how each mechanism influences the choice of analytical strategy. These concepts are paired with descriptive and visual diagnostic tools that help participants understand the structure and patterns of missing data in real datasets.

The course then introduces traditional approaches such as listwise deletion and simple imputation, critically discussing their assumptions and limitations. In the final part, participants are guided through modern and advanced methods, including regression imputation, multiple imputation, and iterative algorithms for data completion such as Soft Impute and the Expectation–Maximization (EM) algorithm. Throughout the course, emphasis is placed on practical implementation, interpretation, and best practices.

Learning outcomes

By the end of the course, participants will be able to:

  • Identify missingness mechanisms
    Distinguish between MCAR, MAR, and MNAR mechanisms and understand their implications for statistical analysis.

  • Diagnose missing data patterns
    Use R to explore, summarize, and visualize missing data through descriptive statistics and graphical tools.

  • Apply traditional methods appropriately
    Implement standard techniques such as listwise deletion, pairwise deletion, and simple imputation, while understanding their limitations and potential biases.

  • Implement modern imputation strategies
    Apply regression imputation and multiple imputation methods (e.g., MICE) in R and interpret the results correctly.

  • Understand advanced algorithms
    Grasp the logic and use cases of iterative data completion approaches, including the Soft Impute and Expectation–Maximization (EM) algorithms.

  • Choose suitable methods in practice
    Select and justify appropriate missing data strategies based on data structure, missingness mechanism, and analytical goals.

Session content

Day 1 – Foundations and Missingness Mechanisms – 22 April | 3-6 PM Berlin time

1. Introduction

  • Definition of missing data in real-world datasets

  • Why missing data matter: statistical bias, loss of power, and threats to validity

  • Overview of missing data mechanisms

    • Missing Completely At Random (MCAR)

    • Missing At Random (MAR)

    • Missing Not At Random (MNAR)

  • Practical examples illustrating each mechanism

  • Discussion: consequences of choosing the wrong strategy

Day 2 – Diagnostics, Visualization, and Traditional Methods – 23 April | 3-6 PM Berlin time

2. Diagnostics and Visualization

  • Descriptive statistics

    • Counts and proportions of missing values

  • Visualization of missing data patterns

    • Location and co-occurrence of missing values

  • The Shadow Matrix

    • Binary indicators for missingness

    • Exploring relationships between missing and observed data

  • Correlation of missingness

3. Traditional Methods

  • Deletion methods

    • Listwise Deletion (Complete Case Analysis)

    • Pairwise Deletion

  • Simple imputation methods

    • Mean and median imputation

    • Biases and limitations

  • When (and when not) to use traditional approaches

Day 3 – Modern and Advanced Imputation Approaches – 24 April | 3-6 PM Berlin time

4. Introduction to Modern Approaches

  • Regression imputation

    • Concept and implementation in R

    • Strengths and weaknesses

  • Multiple Imputation

    • Rationale and statistical intuition

    • Multiple Imputation by Chained Equations (MICE)

    • Interpreting and pooling results

  • Iterative and algorithmic approaches

    • Soft Impute algorithm

    • Expectation–Maximization (EM) algorithm

  • Choosing an appropriate method based on the missingness mechanism

  • Final discussion and best-practice recommendations

Instructor

Dr. Luca Brusa - University of Milano-Bicocca (Italy)

Cost overview

Course: 300 €

Related courses

1 - Introduction to Quarto - ONLINE, 5-6 February

2 - Dealing with messy data in R - ONLINE, 8-10 April

3 - Beyond Beginner R - ONLINE, 1-4 June

4 - Introduction to R Shiny - ONLINE, 9-10 June

Cancellation policy

More than 30 days before the start date: 30% cancellation fee.

Less than 30 days before the start date: no refund.

Physalia-courses cannot be held responsible for any travel fees, accommodation, or other expenses you incur as a result of the cancellation.