Date: 8 January 2026 @ 13:00 - 16:00

Timezone: Brussels

Duration: 3 hours

Language of instruction: English

This workshop builds on the half-day workshop "Building Scalable and Maintainable Data Pipelines with Omnipy (Part 1 - Beginner level)", which we are holding before lunch.

In this second workshop, participants will learn how to develop various types of data flows in Omnipy, including integration with web services. They will use the powerful, industry-developed Prefect orchestration engine to scale up and deploy high-throughput ETL flows on external compute resources.

The workshop is divided into three parts:

  1. The first part will introduce the slogan "parse, don't validate" and show how this principle is implemented in Omnipy (see the first sketch after this list). Against this background, we will introduce the three types of data flows supported by Omnipy: linear, DAG, and function flows. Through hands-on examples, we will also show how to use various job modifiers to power up and customise predefined tasks and flows into more complex data flows.
  2. The second part will focus on integrating data flows with web services through REST APIs. We will mainly focus on extracting data from data sources, but will also touch upon loading results into data sinks. Hands-on examples will introduce tasks and flows that flatten JSON data into relational tabular form for mapping and then restructure the results back to JSON (see the second sketch below).
  3. The last part will introduce Omnipy's integration with S3-based cloud storage and the Prefect ETL orchestration library (a minimal Prefect flow is sketched below). As a hands-on exercise, participants will scale up the data flow developed in the second part of the workshop by deploying it on external compute infrastructure, potentially the Kubernetes-based NIRD Toolkit from SIGMA2 (if the Prefect integration in NIRD is finalised in time for the workshop).
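
To give a flavour of the first part, the sketch below illustrates the "parse, don't validate" idea with plain pydantic, which Omnipy builds on. It is an illustration only: the model and field names are invented, and Omnipy's own models, datasets and tasks (covered in the workshop) look somewhat different.

```python
# Minimal "parse, don't validate" sketch using plain pydantic
# (illustrative only; model and field names are made up).
from pydantic import BaseModel, PositiveInt


class SampleRecord(BaseModel):
    sample_id: str
    read_count: PositiveInt  # parsing guarantees a positive integer


def parse_records(raw_rows: list[dict]) -> list[SampleRecord]:
    # Instead of sprinkling checks everywhere ("validate"), we construct
    # typed objects once, up front ("parse"); downstream code can then
    # rely on the types without re-checking.
    return [SampleRecord(**row) for row in raw_rows]


raw = [{"sample_id": "S1", "read_count": "42"}]  # the string "42" is coerced
records = parse_records(raw)
print(records[0].read_count + 1)                 # safe: guaranteed int -> 43
```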
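
For the second part, the following sketch shows the general flatten-and-restore pattern using pandas.json_normalize on made-up records; the workshop itself uses Omnipy tasks and flows for this step.

```python
# Generic flatten-and-restore sketch with pandas.json_normalize
# (made-up records; the workshop uses Omnipy tasks for this step).
import pandas as pd

nested = [
    {"id": "rec1", "meta": {"organism": "human", "tissue": "liver"}},
    {"id": "rec2", "meta": {"organism": "mouse", "tissue": "brain"}},
]

# Flatten: nested keys become dotted column names
table = pd.json_normalize(nested)
print(list(table.columns))  # ['id', 'meta.organism', 'meta.tissue']

# ... map/clean the tabular data here ...

# Restructure each row back into nested JSON
restored = [
    {"id": row["id"],
     "meta": {"organism": row["meta.organism"], "tissue": row["meta.tissue"]}}
    for _, row in table.iterrows()
]
print(restored[0])
```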
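
For the third part, here is a minimal ETL flow written directly against Prefect's @task and @flow decorators, just to show the orchestration pattern; in the workshop the flows are built with Omnipy, which integrates with Prefect, and the example data is hypothetical.

```python
# Minimal ETL flow using Prefect's own @task/@flow decorators (Prefect 2.x);
# the workshop builds flows with Omnipy, which integrates Prefect internally.
from prefect import flow, task


@task
def extract() -> list[dict]:
    # Stand-in for a REST API call to a data source
    return [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]


@task
def transform(rows: list[dict]) -> list[dict]:
    return [{**row, "value": row["value"].upper()} for row in rows]


@task
def load(rows: list[dict]) -> None:
    # Stand-in for writing to a data sink (e.g. S3)
    print(f"Loading {len(rows)} rows")


@flow
def etl_flow() -> None:
    load(transform(extract()))


if __name__ == "__main__":
    etl_flow()
```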

Contact: digitalscholarship@ub.uio.no

Venue: Moltke Moes vei

City: Oslo

Region: Oslo kommune

Country: Norway

Postcode: 0851

Prerequisites:

  • Participants should have some experience with Python programming/scripting. We will not spend time explaining basic syntax and concepts, except those related to type hints. Experience with type hints in Python is useful, but not required.
  • Laptop.
  • No software installation is required other than a modern browser.
  • We will make use of JupyterLab for the hands-on exercise. An online JupyterLab service will be made available, but participants can also install JupyterLab locally on their laptops if they prefer.

Learning objectives:

  • Introduction to Python type hints and pydantic models
  • How to use type hints to define models, datasets and tasks in Omnipy
  • How to wrangle a rough dataset into the shape required by a metadata schema
  • How to set up an executable mapping of data from one metadata schema to another
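
As a rough preview of the last two objectives, the sketch below expresses a mapping from one small, invented metadata schema to another as plain pydantic models and an ordinary function; in the workshop this is done with Omnipy's models, datasets and tasks.

```python
# Hypothetical schema-to-schema mapping expressed as plain pydantic models
# and an ordinary function; names are invented for illustration.
from pydantic import BaseModel


class SourceRecord(BaseModel):   # rough, source-side schema
    sample: str
    org: str


class TargetRecord(BaseModel):   # target metadata schema
    sample_id: str
    organism_name: str


def map_record(src: SourceRecord) -> TargetRecord:
    # The mapping is executable code, so it can be tested and rerun on
    # every new batch of data.
    return TargetRecord(sample_id=src.sample, organism_name=src.org)


print(map_record(SourceRecord(sample="S1", org="Homo sapiens")))
```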

Organizer: The workshop is provided by the Oslo node of ELIXIR Norway as part of an extended event organised by Digital Scholarship Center (DSC), Carpentry@UiO, IFI (UiO), dScience, NAIC, Humit, PSI, ISS, USIT, NORRN, CodeRefinery, ELIXIR Oslo, University of Oslo Library and BærUt!

Host institutions: University of Oslo

Target audience: PhD, Postdoctoral Fellows, Researchers, Engineers

Capacity: 20

Event types:

  • Workshops and courses

Cost basis: Free to all

Sponsors: Digital Scholarship Centre (University of Oslo)

Scientific topics: Data curation and archival, Data identity and mapping, Data quality management, Data governance, Workflows

Operations: Data handling
