Posted on June 26th 2020 by Carlo Pecoraro
Biology and medicine are nowadays producing an incredible amount of data, which opens new opportunities but also poses new challenges. In this context, Deep Learning (DL), a set of techniques that enables machines to predict outputs from a layered set of inputs, is increasingly used across many fields.
Here we chat about this fast-growing field and its applications in biology with Dr. Filippo Biscarini and Dr. Nelson Nazzicari, instructors of the course “INTRODUCTION TO DEEP LEARNING FOR BIOLOGISTS” in September.
1) When did you start using DL in your work?
F: Machine and statistical learning have been a large part of my professional work for almost ten years now, mainly for applications of genomics and other ‘omics data to regression, classification and clustering problems. Deep Learning is a relatively recent addition to my toolbox, and I am currently applying it to the analysis of unstructured text data and to image recognition (e.g. on PDF files).
N: I started using Machine Learning in my professional work about ten years ago, solving clustering and classification problems in fields completely unrelated to biology :) I developed a specific interest in Deep Learning in the last three years, thanks to a project on the classification of field images acquired via drones to predict the growth of weeds.
2) Let’s talk about your course: can you please describe its structure and the main outcomes for participants?
We’ll start by showing the “black box”: a working deep learning model for image recognition. We will then analyse in detail how this works and what the building blocks of deep learning are. Once familiarised with the basics of deep learning, we will apply it to regression and classification problems with biological datasets, allowing participants to practice their skills in properly set-up hands-on sessions. Lastly, we will review some state-of-the-art applications of deep learning and introduce participants to some “tricks” that can inspire their own applications (e.g. using pre-trained deep learning models).
3) The course will focus on the application of Convolutional Neural Networks (CNNs) – could you explain simply what a CNN is?
A CNN is first of all a type of Neural Network, i.e. a mathematical model “inspired” by the biological structures present in animal brains. A Convolutional NN performs a specific operation – convolution – to extract pieces of spatial information from an image. It’s a multi-step process, and at each step the information becomes more refined. It starts from the raw image pixels, then moves to lines and corners, then squares and circles, then rough shapes, until it reaches a level of understanding that is almost human-like, being able to pinpoint the presence of things like cars, dogs, planes and people in the image.
The same learning procedure can be applied to biological images to locate the position of structures (e.g. tissues) or to classify images (e.g. for the presence/absence of tumors). Neural Networks are extremely flexible tools: it is also possible to feed them input data which are not images (e.g. -omics data) and still obtain accurate predictions.
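The convolution operation itself can be illustrated with a small NumPy sketch. This is not course material – just a minimal, assumed example in which a hand-written vertical-edge kernel is slid over a toy 5×5 “image” (deep learning frameworks actually compute cross-correlation, i.e. convolution without flipping the kernel, which is what we do here):

```python
import numpy as np

def convolve2d(image, kernel):
    """'Valid' 2-D convolution (cross-correlation, as in DL frameworks):
    slide the kernel over the image and sum element-wise products."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# a tiny "image": a bright vertical stripe on a dark background
image = np.zeros((5, 5))
image[:, 2] = 1.0

# a vertical-edge detector kernel (Sobel-like, chosen by hand;
# in a CNN these weights are learned from data)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])

feature_map = convolve2d(image, kernel)
print(feature_map)  # strong positive/negative responses mark the stripe's edges
```

The resulting feature map lights up exactly where the stripe’s left and right edges are – the first, most basic layer of “lines and corners” information described above.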
4) What are the main steps of a typical DL study?
A typical Deep Learning project includes several steps, the main ones being: i) data preprocessing (cleaning, filtering and transforming the input data); ii) finding the appropriate data representation (e.g. number of pixels and channels for image data, encoding of categorical variables via one-hot encoding or feature hashing); iii) model building (choosing the neural network architecture, the number of layers, the activation functions); iv) configuring parameters (e.g. number of epochs, batch size); v) training the DL model; vi) evaluating the trained DL model (measuring its performance).
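The steps above can be sketched end-to-end in pure NumPy. This is a deliberately minimal, assumed example (toy data, a single-neuron “network”, and arbitrary hyperparameter values – not the course’s code), but it walks through the same i)–vi) sequence:

```python
import numpy as np

rng = np.random.default_rng(42)

# i) data preprocessing: toy dataset, standardised feature-wise
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)  # toy binary labels
X = (X - X.mean(axis=0)) / X.std(axis=0)

# ii) data representation: features here are already numeric;
#     a categorical variable would be one-hot encoded instead

# iii) model building: a single-neuron "network" (logistic regression)
w, b = np.zeros(4), 0.0
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# iv) configuring parameters (illustrative values)
epochs, batch_size, lr = 50, 32, 0.1

# v) training with mini-batch gradient descent
for _ in range(epochs):
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        p = sigmoid(X[batch] @ w + b)
        grad = p - y[batch]                  # gradient of the log-loss
        w -= lr * X[batch].T @ grad / len(batch)
        b -= lr * grad.mean()

# vi) evaluating the trained model (here, on the training set for brevity;
#     a real project would use held-out data)
accuracy = ((sigmoid(X @ w + b) > 0.5) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

A real deep learning model swaps the single neuron for stacked layers and the hand-written update for a framework optimiser, but the project workflow is the same.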
5) Is there a specific reason why we will use Python in this course? Or could we do the same analyses using R?
We chose Python based on our previous experience, but there are also excellent deep learning implementations in R. For instance, participants can have a look at Keras for R (https://keras.rstudio.com/). Keras is a very popular framework for deep learning built on top of TensorFlow, a core deep learning library; Keras implementations exist in both Python and R.
6) Can you briefly explain to us how the hands-on sessions of this course will be structured and which infrastructure we will be using?
A typical hands-on session starts with a brief explanation of the exercise we are going to study. The students will then open a Python notebook, a document that alternates Python code with regular text. We’ll execute the code together and comment on the results. As the course progresses, the students will be encouraged to tinker with the code and, depending on their level of familiarity with Python, write code on their own.
We are currently considering two online infrastructures, namely Google Colab (https://colab.research.google.com/) and Paperspace Gradient (https://gradient.paperspace.com/), which will allow students to execute deep learning algorithms via an online interface and spare them the hassle of installing all the required software.