Foundational Aspects of Machine Learning using Python

Abstract

Machine Learning is a subset of artificial intelligence that involves training algorithms to learn patterns from data and make predictions or decisions without being explicitly programmed. It has revolutionized various fields, from healthcare and finance to autonomous vehicles and natural language processing.

About This Material

This is a hands-on tutorial from the Galaxy Training Network (GTN), usable either for individual self-study or as teaching material in a classroom.

Questions this will address

  • How can we use Machine Learning to make more generalizable models?
  • What are the key components of a supervised learning problem, and how do they influence model performance?
  • How do classification and regression tasks differ in supervised learning, and what types of models are suitable for each?
  • What strategies can we employ to ensure our Machine Learning models generalize well to unseen data?
  • How can we use Machine Learning to make more generalizable models that perform well on diverse datasets?
  • What are some practical steps for applying Machine Learning to real-world datasets, such as the transcriptomics dataset for predicting potato coloration?

Learning Objectives

  • Understand and apply the general syntax and functions of the scikit-learn library to implement basic Machine Learning models in Python.
  • Identify and explain the concepts of overfitting and underfitting in Machine Learning models, and discuss their implications on model performance.
  • Analyze the need for regularization techniques and justify their importance in preventing overfitting and improving model generalization.
  • Evaluate the effectiveness of cross-validation and test sets in assessing model performance and implement these techniques using scikit-learn.
  • Compare different evaluation metrics and select appropriate metrics for imbalanced datasets, ensuring accurate and meaningful model assessment.
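
To make the last objective concrete, here is a minimal sketch of why plain accuracy can mislead on an imbalanced dataset. It is written in pure Python so it runs without any dependency; the label lists and the "always predict the majority class" model are hypothetical and are not taken from the tutorial. scikit-learn provides comparable functions (`accuracy_score` and `balanced_accuracy_score` in `sklearn.metrics`), which are what one would normally use in practice.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recall, so every class counts equally."""
    recalls = []
    for label in sorted(set(y_true)):
        # Indices of the samples that truly belong to this class
        idx = [i for i, t in enumerate(y_true) if t == label]
        hits = sum(y_pred[i] == label for i in idx)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


# Hypothetical imbalanced labels: 95 negatives (0) and 5 positives (1)
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class
y_pred = [0] * 100

acc = accuracy(y_true, y_pred)
bal_acc = balanced_accuracy(y_true, y_pred)
print(f"accuracy = {acc:.2f}, balanced accuracy = {bal_acc:.2f}")
# prints: accuracy = 0.95, balanced accuracy = 0.50
```

The trivial model never detects a positive sample, yet its accuracy looks high because negatives dominate. Balanced accuracy averages the recall of each class and so exposes the failure.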

Licence: Creative Commons Attribution 4.0 International

Keywords: Statistics and machine learning, ai-ml, elixir, jupyter-notebook

Target audience: Students

Resource type: e-learning

Version: 3

Status: Active

Prerequisites:

  • Introduction to Python
  • Python - Warm-up for statistics and machine learning

Date modified: 2025-05-19

Date published: 2025-03-11

Authors: Wandrille Duchemin

Contributors: Anup Kumar, Bérénice Batut, Saskia Hiltemann, Wandrille Duchemin

Scientific topics: Statistics and probability
