Session content

Monday – Classes from 09:30 to 17:30





After a brief introduction to R and the installation of everything we need, this first day will be dedicated to learning how to use RStudio efficiently. RStudio is the most popular interface for R. I will explain how to use RStudio to set the right options for R, install and update R packages, explore the R documentation, create projects to organise one’s work, write notebooks that keep track of the workflow, and more.





Tuesday and Wednesday – Classes from 09:30 to 17:30


DATA days



R is software dedicated to data analysis, so mastering data manipulation in R is essential. I will explain how to import different kinds of data into R, how to view the data, how to perform basic data editing (e.g. modifying values), and how to completely reshape datasets (subsetting, merging, aggregating, pivoting) to make them suitable for plotting and analysis. Knowing how to do this in R will save participants countless hours of error-prone fiddling in Excel, with the added benefit that every transformation done in R is documented and can be modified later. I will illustrate these procedures using R packages belonging to the tidyverse (readr, dplyr, tidyr, stringr, forcats). The tidyverse is a very popular ecosystem of modern R packages specifically designed to let users perform complex data manipulation efficiently, without requiring complex programming.





Thursday – Classes from 09:30 to 17:30





Plotting is a crucial part of any data exploration and analysis. It is important to visualise the data before an analysis (e.g. to check for potential errors and to get a sense of the distribution of the data), during the analysis (e.g. to check the distribution of the residuals of a linear model), and after the analysis (to communicate findings in the most efficient way). Knowing how to plot various kinds of data is therefore essential. I will show how to plot different types of data in R, with a particular emphasis on ggplot2, the plotting package of the tidyverse. I will explain how to fine-tune plots so that they meet the quality standards of the most demanding publishing platforms.



Friday – Classes from 09:30 to 17:30





R is first and foremost a tool for analysing data. Base R ships with many widely used statistical methods (chi-squared test, t-test, linear models...), and more advanced methods such as mixed models and machine learning algorithms are available through R packages. After briefly introducing some basic tests, I will explain how to analyse real data by means of linear models, the most widely used statistical toolbox. This session will give participants the opportunity to put all their new skills into practice by implementing a full workflow, starting with importing data and ending with the integration of the results of a statistical analysis into tables and plots ready for submission.