
Optimizing DNA Sequences for Biological Functions using a DNA LLM

Abstract


About This Material

This is a hands-on tutorial from the Galaxy Training Network (GTN) that can be used either for individual self-study or as teaching material in a classroom.

Questions this will address

  • to do

Learning Objectives

  • Pretrain an LLM on DNA sequences
  • Fine-tune an LLM
  • Predict the impact of DNA variants with zero-shot learning
  • Generate synthetic DNA sequences
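As a rough illustration of the zero-shot variant-prediction objective: one common approach scores a variant by comparing the log-likelihood a sequence model assigns to the mutated sequence with the log-likelihood of the reference. The sketch below is not the method used in this material; it swaps the DNA LLM for a toy k-mer Markov model so that it runs with the Python standard library only. The names `train_kmer_model`, `log_likelihood` and `variant_score` are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def train_kmer_model(sequences, k=3):
    """Fit a toy order-(k-1) Markov model (hypothetical stand-in for a DNA LLM)."""
    kmer_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            kmer_counts[kmer] += 1
            context_counts[kmer[:-1]] += 1  # count the (k-1)-base context
    return kmer_counts, context_counts, k


def log_likelihood(seq, model):
    """Sum of log P(base | previous k-1 bases) over every full-context position."""
    kmer_counts, context_counts, k = model
    total = 0.0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Add-one (Laplace) smoothing over the 4 possible next bases
        p = (kmer_counts[kmer] + 1) / (context_counts[kmer[:-1]] + 4)
        total += math.log(p)
    return total


def variant_score(ref_seq, pos, alt_base, model):
    """Zero-shot score: log P(alt) - log P(ref); negative means the variant is less likely."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    return log_likelihood(alt_seq, model) - log_likelihood(ref_seq, model)


if __name__ == "__main__":
    model = train_kmer_model(["ACGTACGTACGTACGT"], k=3)
    # Substitute the A at position 4 with a T and score the change
    print(variant_score("ACGTACGTACGT", 4, "T", model))
```

With a real DNA LLM, `log_likelihood` would be replaced by the model's sequence log-probability; the log-likelihood-ratio comparison itself stays the same.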

Licence: Creative Commons Attribution 4.0 International

Keywords: AI & ML, ELIXIR, Large Language Model, Statistics and machine learning, jupyter-notebook, work-in-progress

Target audience: Students

Resource type: e-learning

Version: 1

Status: Draft

Prerequisites:

  • Deep Learning (without Generative Artificial Intelligence) using Python
  • Fine-tuning a LLM for DNA Sequence Classification
  • Foundational Aspects of Machine Learning using Python
  • Generating Artificial Yeast DNA Sequences using a DNA LLM
  • Introduction to Python
  • Neural networks using Python
  • Predicting Mutation Impact with Zero-shot Learning using a pretrained DNA LLM
  • Pretraining a Large Language Model (LLM) from Scratch on DNA Sequences
  • Python - Warm-up for statistics and machine learning

Date modified: 2025-04-17

Date published: 2025-04-17

Authors: Bérénice Batut, Raphael Mourad

Contributors: Anup Kumar, Björn Grüning, Bérénice Batut, olisand

Scientific topics: Statistics and probability