Using Omnipy for data wrangling and metadata mapping
Date: 14 December 2023 @ 09:00 - 12:00
Duration: 3 hours
Researchers in the life sciences often spend a significant amount of time on data-wrangling tasks, such as reformatting, cleaning, and integrating data from different sources. Despite the availability of software tools, they often end up with difficult-to-reuse workflows that require manual steps. Omnipy is a new Python library that offers a systematic and scalable approach to research data and metadata wrangling. It allows researchers to import data in various formats and continuously reshape it through typed transformations. For large datasets, Omnipy seamlessly scales up local test jobs and provides persistent access to the data state at every step. This workshop will provide down-to-earth tutorials and examples to help data scientists in the life sciences make use of Omnipy to wrangle real-world datasets into shape. The workshop is divided into three parts:
The first part will introduce the concepts of models, datasets, tasks and flows in Omnipy through small examples. We will also touch upon Python type hints and Pydantic models as needed, as these are important building blocks for Omnipy.
In the second part, the participants will be provided with a rough example dataset that requires cleaning. As a hands-on exercise, they will parse and shape the data step by step until it complies with a specified metadata schema.
In the last part, the participants will be introduced to the metadata mapping functionalities in Omnipy and led through another hands-on exercise to set up a transformation that maps the data from one metadata schema to another.
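To give a flavour of the second part, the sketch below shows step-wise parsing and shaping of rough records into a typed model. It is an illustration only: it uses standard-library dataclasses and type hints instead of Omnipy or Pydantic, and the `SampleRecord` schema, the field names, and the example rows are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SampleRecord:
    """Hypothetical target schema: one row of sample metadata."""
    sample_id: str
    organism: str
    age_years: Optional[int]


def parse_age(raw: str) -> Optional[int]:
    """Turn a free-text age such as ' 42 ' or 'n/a' into an int or None."""
    text = raw.strip().lower()
    if text in {"", "n/a", "na", "unknown"}:
        return None
    return int(text)


def clean_record(raw: dict[str, str]) -> SampleRecord:
    """Parse and shape one rough record so that it fits SampleRecord."""
    return SampleRecord(
        sample_id=raw["Sample ID"].strip().upper(),
        organism=raw["organism"].strip().capitalize(),
        age_years=parse_age(raw["age"]),
    )


# Hypothetical rough input, as it might arrive from a spreadsheet export.
rough_rows = [
    {"Sample ID": " s001 ", "organism": "homo sapiens", "age": " 42 "},
    {"Sample ID": "s002", "organism": "MUS MUSCULUS", "age": "n/a"},
]
cleaned = [clean_record(row) for row in rough_rows]
```

Each cleaning step is a small typed function, so a record that cannot be parsed (for example an age of "forty") fails loudly with a `ValueError` rather than slipping through unnoticed.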
Venue: Ole-Johan Dahls hus, 23B Gaustadalléen
Region: Oslo kommune
Prerequisites: The participants should have some experience with Python programming/scripting. We will not spend time explaining basic syntax and concepts, other than what is related to type hints. Experience with type hints in Python is useful, but not required.
- Introduction to Python type hints and Pydantic models
- How to use type hints to define models, datasets, tasks and flows in Omnipy
- How to wrangle a rough dataset into the shape required by a metadata schema
- How to set up an executable mapping of data from one metadata schema to another
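As a rough idea of what an executable mapping between two metadata schemas can look like, here is a minimal sketch in plain Python. It does not use Omnipy's mapping functionality; the two schemas (`SourceSample`, `TargetSample`), their fields, and the `sample:` identifier prefix are assumptions made up for this example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class SourceSample:
    """Hypothetical source schema."""
    sample_id: str
    species: str
    collected: str  # ISO 8601 date string, e.g. "2023-12-14"


@dataclass(frozen=True)
class TargetSample:
    """Hypothetical target schema with renamed and derived fields."""
    identifier: str
    organism: str
    collection_year: int


def map_sample(source: SourceSample) -> TargetSample:
    """Map one record from the source schema to the target schema."""
    return TargetSample(
        identifier=f"sample:{source.sample_id}",  # prefixed identifier
        organism=source.species,                  # plain field rename
        collection_year=date.fromisoformat(source.collected).year,  # derived
    )


mapped = map_sample(SourceSample("S001", "Homo sapiens", "2023-12-14"))
```

Because both schemas are declared with type hints, the mapping function documents which target field comes from which source field, and it can be rerun unchanged whenever the source data is updated.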
Organizer: The workshop is provided by ELIXIR Oslo as part of an extended event organised by the Student Committee of the Centre for Bioinformatics at the University of Oslo in collaboration with the ISCB Regional Student group in Norway
Host institutions: University of Oslo
Target audience: PhD, Postdoctoral Fellows, Researchers, Engineers
- Workshops and courses
Cost basis: Free to all
Scientific topics: Data submission, annotation, and curation, Data identity and mapping, Data quality management, Data governance, Workflows
Operations: Data handling