Organizer: de.NBI

Start: Sunday, 21 July 2019 @ 09:00

End: Sunday, 21 July 2019 @ 17:00

Venue: Basel

City: Basel

County: Basel-Stadt

Country: Switzerland

Description:

Educators:
Bjoern Grüning (RBC), Johannes Köster, Devon Ryan

Date:
21.07.2019

Location:
ISMB/ECCB Basel

Content:
The typical data analyst must simultaneously juggle multiple projects, each having its own duration and software requirements. As few analysts have any formal training in structuring or even writing the code necessary to perform an analysis, it is unsurprising that the iterative analytic process can produce a wide assortment of almost identically named files (e.g., “final_results.txt”, “final_results.version2.txt”, “final_results.really_final.txt”), all with unclear origins and produced with a hodge-podge of similarly poorly named scripts. The near impossibility of tracing a results file to the exact process that produced it creates untold difficulties both when it comes time to publish results and when planning subsequent experiments months or years later (after all, which of the “final_results” files was really the “right one”?). These issues are further compounded by software paths and other similar assumptions being hard-coded into scripts, preventing easy replication of the analysis elsewhere. Performing analyses in a reproducible and traceable manner is clearly needed to combat such problems.

Schedule Overview

2:00 - 2:10 pm Installing Conda and Snakemake
2:10 - 2:30 pm Intro to Conda and Bioconda (slides)
2:30 - 3:30 pm Hands-on Session: creating Conda environments and installing packages from the Bioconda repository

This practical requires installing hisat, samtools and deepTools via Bioconda.

3:30 - 4:00 pm Hands-on Session: writing conda recipes

Topics in BioVis (including examples)
Visualization of sequences, macromolecules, omics data, biological networks

4:00 - 4:15 pm Coffee Break
4:15 - 4:35 pm Intro to Snakemake

Specific tools for visualizing large-scale biological data

4:35 - 6:00 pm Hands-on Session: Writing a Snakemake workflow wrapper for mapping, indexing and creating coverage files
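
The final hands-on session chains mapping, indexing and coverage generation. As a rough illustration of the idea behind such a workflow, the plain-Python sketch below works out the order of steps from each step's declared inputs and outputs. It is not Snakemake syntax and does not use the Snakemake API; the `Rule` class, the `plan` function and the file names (`sample.fastq`, `sample.bam`, `sample.bam.bai`, `sample.bw`) are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One workflow step, described only by the files it reads and writes."""
    name: str
    inputs: tuple
    outputs: tuple


# Hypothetical three-step workflow: mapping, indexing, coverage.
RULES = [
    Rule("map", ("sample.fastq",), ("sample.bam",)),
    Rule("index", ("sample.bam",), ("sample.bam.bai",)),
    Rule("coverage", ("sample.bam", "sample.bam.bai"), ("sample.bw",)),
]


def plan(target, rules, done=None):
    """Return the rule names needed to build `target`, in execution order.

    Files that no rule produces are treated as existing source files.
    """
    done = [] if done is None else done
    producer = next((r for r in rules if target in r.outputs), None)
    if producer is None:
        return done  # nothing to run for a source file
    for needed in producer.inputs:
        plan(needed, rules, done)  # schedule upstream steps first
    if producer.name not in done:
        done.append(producer.name)
    return done


steps = plan("sample.bw", RULES)
print(steps)
```

Asking for the coverage file is enough to schedule all three steps; asking for a source file such as `sample.fastq` schedules nothing. Working backwards from the requested output is what makes each result traceable to the steps that produced it.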

Learning goals:
In this hands-on tutorial, we demonstrate how Conda can be used to deploy specific software versions easily, reproducibly, and without administrator credentials. Moreover, we demonstrate how Conda’s ability to create isolated software environments helps to avoid side effects between different analyses or different steps of the same analysis. Attendees will also learn how to create Conda recipes themselves, so they can contribute new packages to projects such as Bioconda. We further demonstrate how Snakemake can be used in combination with Conda and containers to create reproducible analysis workflows and execute them on any platform, from workstations to clusters and the cloud. Finally, using snakePipes as an example, we demonstrate how Conda and Snakemake can be used to define reproducible and flexible workflows for complex genomics analysis.
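
As a minimal sketch of the isolation idea described above, the Python helper below assembles `conda create` commands for two separate environments, one per analysis step. The helper name `conda_create_command`, the environment names, the channel order and the pinned version `1.9` are illustrative assumptions. The package names follow the schedule text and may not match the Bioconda package names exactly (the aligner, for instance, may be packaged as `hisat2`).

```python
import shlex


def conda_create_command(env_name, packages, channels=("conda-forge", "bioconda")):
    """Build the argument list for creating one isolated Conda environment.

    `packages` maps a package name to a version string, or to None when
    no version is pinned.
    """
    cmd = ["conda", "create", "--yes", "--name", env_name]
    for channel in channels:
        cmd += ["--channel", channel]
    for name, version in packages.items():
        # "name=version" pins a specific version; a bare name leaves it open.
        cmd.append(f"{name}={version}" if version else name)
    return cmd


# One environment per step keeps the tools of each step from interfering.
# The version pin below is only an example value.
mapping_env = conda_create_command("mapping", {"hisat": None, "samtools": "1.9"})
coverage_env = conda_create_command("coverage", {"deeptools": None})

print(shlex.join(mapping_env))
print(shlex.join(coverage_env))
```

The commands are only printed here. On a machine with Conda installed, each list could be passed to `subprocess.run(cmd, check=True)` to create the environment.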

Prerequisites:
- Laptops with Linux or macOS
- Pre-installed Miniconda (installer available at https://conda.io/miniconda.html)
- Attendees should have basic familiarity with Python, Git and the command line.

Keywords:
Conda, Bioconda, Snakemake, Bioconductor, reproducible research

Tools:
Conda, Bioconda, Snakemake

Event type:
  • Meetings and conferences
Tools for reproducible research - ISMB/ECCB 2019
https://tess.elixir-europe.org/events/tools-for-reproducible-research-0ed30b48-9c67-48f5-8616-6ca21c78c509