e-learning
Generating a single cell matrix using Alevin
Abstract
This tutorial will take you from raw FASTQ files to a cell x gene data matrix in AnnData format. What's a data matrix, and what's AnnData format? Well, you'll find out! Importantly, this is the first step in processing single cell data so that you can start analysing it. Currently you have a bunch of strings of ATGGGCTT etc. in your sequencing files, and what you need to know is how many cells you have and what genes appear in those cells. These steps are the most computationally heavy in the single cell world, as you're starting with hundreds of millions of reads, each with 4 lines of text. Later on in analysis, this data becomes simple gene counts such as 'Cell A has 4 GAPDHs', which is a lot easier to store! Because of this data overload, we have downsampled the FASTQ files to speed up the analysis a bit. That said, you're still having to map loads of reads to the massive murine genome, so get yourself a cup of coffee and prepare to analyse!
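To make the two data shapes mentioned above concrete, here is a minimal Python sketch. It is only an illustration and not how Alevin works internally: the toy reads, the `read_fastq` helper, and the read-to-cell/gene `assignments` are all made up for this example, and the mapping step that would normally produce those assignments is skipped entirely.

```python
from collections import Counter
from io import StringIO

# Toy FASTQ text (hypothetical reads): every read occupies exactly four lines:
# header, sequence, '+' separator, and a per-base quality string.
TOY_FASTQ = """@read1
ATGGGCTT
+
IIIIIIII
@read2
GGCTTAAC
+
IIIIIIII
"""


def read_fastq(handle):
    """Yield (read_id, sequence, quality) tuples from an open FASTQ handle."""
    while True:
        header = handle.readline().strip()
        if not header:  # end of file
            break
        sequence = handle.readline().strip()
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().strip()
        yield header[1:], sequence, quality  # drop the leading '@'


reads = list(read_fastq(StringIO(TOY_FASTQ)))

# Hypothetical result of mapping: one (cell, gene) pair per assigned read.
# In a real analysis this comes from the mapping and barcode-handling steps.
assignments = [
    ("Cell A", "GAPDH"),
    ("Cell A", "GAPDH"),
    ("Cell A", "GAPDH"),
    ("Cell A", "GAPDH"),
    ("Cell A", "ACTB"),
    ("Cell B", "GAPDH"),
]

# Collapse the assignments into a cell x gene count matrix
# (rows are cells, columns are genes).
counts = Counter(assignments)
cells = sorted({cell for cell, _ in assignments})
genes = sorted({gene for _, gene in assignments})
matrix = [[counts[(cell, gene)] for gene in genes] for cell in cells]

print("genes:", genes)
for cell, row in zip(cells, matrix):
    print(cell, row)
```

The point of the sketch is the size difference: each read costs four lines of text, whereas the matrix stores a single number per cell and gene, here 'Cell A has 4 GAPDHs'.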
About This Material
This is a Hands-on Tutorial from the Galaxy Training Network (GTN), usable either for individual self-study or as teaching material in a classroom.
Questions this will address
- I have some single cell FASTQ files I want to analyse. Where do I start?
Learning Objectives
- Generate a cellxgene matrix for droplet-based single cell sequencing data
- Interpret quality control (QC) plots to make informed decisions on cell thresholds
- Find the information in GTF files that is relevant to your study, and include it in the data matrix metadata
Licence: Creative Commons Attribution 4.0 International
Keywords: 10x, MIGHTS, Single Cell, paper-replication
Target audience: Students
Resource type: e-learning
Version: 19
Status: Active
Prerequisites:
- An introduction to scRNA-seq data analysis
- Introduction to Galaxy Analyses
- Understanding Barcodes
Date modified: 2024-10-28
Date published: 2021-03-03
Contributors: Beatriz Serrano-Solano, Björn Grüning, David López, Helena Rasche, Jonathan Manning, Julia Jakiela, Marisa Loach, Mehmet Tekman, Pavankumar Videm, Saskia Hiltemann, Teresa Müller, Wendi Bacon
Scientific topics: Transcriptomics