FAIRification of an RNAseq dataset

RNA sequencing (RNAseq) is used here as an example of how to FAIRify data from a popular Life Sciences assay. RNAseq data can be shared and curated in dedicated public repositories, using established ontologies and controlled vocabularies to describe protocols and biological material (metadata).

Two international repositories are commonly used to locate and download RNAseq (meta)data: ArrayExpress and GEO. Other repositories for raw sequence data exist (e.g. SRA, ENA, DDBJ), but ArrayExpress and GEO specifically house and index expression data, including rich metadata describing samples, data processing and final results files such as gene expression matrices.

Once data are submitted to a public repository, they become openly accessible, searchable and annotated with rich metadata provided by the submitter and the curation team. Note that both repositories are listed in the FAIRsharing registry, which can help you find public repositories for all types of Life Science data.

This lesson takes you through a publicly available RNAseq dataset in ArrayExpress and shows how it meets the FAIR principles, using the checklist published by Wilkinson et al. (2016).

DOI: https://gxy.io/GTN:T00435

Licence: Creative Commons Attribution 4.0 International

Keywords: FAIR, clinical data

Resource type: hands-on tutorial

Version: 7

Status: Active


FAIR and its Origins
Data Registration
Persistent Identifiers

Learning objectives:

To be able to map each of the FAIR principles to a dataset in the public domain

Date created: 2024-05-26

Date modified: 2024-05-26

Date published: 2024-05-27

Authors: Robert Andrews, Andrew Mason, Sara Morsy, Philippe Rocca-Serra, Xenia Perez Sitja, Branka Franicevic, Katarzyna Kamieniecka, Khaled Jum'ah, Krzysztof Poterlowicz
