Introduction to Machine Learning with R

Dates

17-21 February 2025

To foster international participation, this course will be held online.

 

Course Overview

The use of modern quantitative technologies to characterise complex phenomena is now the standard approach in almost every research domain. Biology is no exception: multi-omics techniques (metabolomics, transcriptomics, genomics and proteomics) are pervasive in every facet of the life sciences. The resulting multivariate datasets are highly complex, and advanced data analysis approaches are needed to make the best use of the available information. For relatively large-scale studies, machine learning (ML) is a valid tool to complement classical multivariate statistical methods.
The objective of this course is to highlight the advantages and limitations of these data analysis approaches in the context of biological research, providing a broad hands-on introduction to the use of multivariate methods and machine learning algorithms for the analysis of ‘omics datasets.


Target Audience & Assumed Background

The syllabus is designed for people who need an intuitive introduction to the basics of theoretical and applied machine learning. A foundational understanding of statistics and the R programming language is preferred but not required.

 

Teaching Format

Each session consists of a lecture of one to two hours followed by one to two hours of practical exercises and demonstrations. There will also be plenty of time for students to discuss their own problems and data.

Program

Day 1 - 2-8 pm Berlin time


General Introduction
Data mining, -omics and machine learning
Hands-off introduction to ML / Omics meet ML
Introduction to advanced R data libraries
Introduction to tidymodels

Day 2 - 2-8 pm Berlin time


Multivariate data: things to always remember
Model and variable selection: the machine learning paradigm
Supervised learning: regression and classification
Machine learning for regression problems

Day 3 - 2-8 pm Berlin time


Overfitting and resampling techniques
Classification problems
Regression and classification with tidymodels
Lasso-penalised linear and logistic regression
Lasso and model tuning
KNN imputation [optional]

 


Day 4 - 2-8 pm Berlin time


Random Forest for regression and classification
Slow learning: the boosting approach
Unsupervised learning: PCA, UMAP, self-organizing maps (SOM)
PCA demo

 


Day 5 - 2-8 pm Berlin time


SVM demo
Unsupervised learning demos
UMAP demo
SOM demo
Final interactive exercise
Kahoot quiz: let’s test our machine learning skills!
Q&A

 

Instructors

 
Dr. Pietro Franceschi

 

 

Cost Overview

Package 1: 530 €


Cancellation Policy:

More than 30 days before the start date: 30% cancellation fee

Less than 30 days before the start date: no refund

 

 

 

Physalia-courses cannot be held responsible for any travel fees, accommodation or other expenses you incur as a result of the cancellation.