Snakemake working group

2024-10-15

Nadia GOUE, Sébastien RAVEL, Chloé QUIGNOT

Source: adapted from the IFB FAIRbioinfo 2021 training material and the BIOI2 Snakemake introduction tutorial

Material under CC-BY-SA licence

How this working group is organised

“Today’s timetable”
  • Divided into 2 sessions of 1h30 each
  • A single continuous but progressive exercise over these 2 sessions
  • Use-case presentation by Pauline at 2 pm (~30 min)

What the exercise is about

  • You’ll be building a Snakemake pipeline from scratch
  • The exercise will be divided into several parts and you’ll be guided along the way
    • Part A: create a first simple pipeline with FastQC & MultiQC tools
      • Inputs: NGS data (*.fastq.gz)
      • Processing: individual QC analyses with FastQC and aggregation of results with MultiQC
      • Output: Quality Control report
    • Part B: improve the previous pipeline
    • Part C: adapt the previous pipeline to the HPC environment

You don’t need to understand the QC analysis itself: the focus is on building the pipeline

Snakemake’s FAQ: https://snakemake.readthedocs.io/en/latest/project_info/faq.html

It’s time to get your hands dirty!

Go to the Moodle web page and open the instructions for exercise A.

https://moodle.france-bioinformatique.fr/course/view.php?id=29

A few reminders

  • Snakemake workflow = set of rules
  • Snakefile = where snakemake code is written
  • Rules are defined by a name and contain directives, including input and output, which specify the rule’s input and output files:
rule myRuleName:
    input: "myInputFile"
    output: "myOutputFile"
    shell: "echo {input} > {output}"
  • Snakemake executes only the target rule (by default, the first rule in the Snakefile) and the rules needed to generate its input files
  • Rules can be generalised using wildcards written within braces (e.g. {wildcardname})
  • A Snakefile is run with the snakemake --cores 1 command (other options are available)
  • Debugging options: --dag, --rulegraph, --filegraph and --dry-run; add -p to print the shell commands
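
To make the wildcard and target-rule reminders more concrete, here is a toy model in plain Python of how a requested file could be matched against a rule's output pattern, and how the wildcard value is then used to work out the input file. This is only a simplified sketch, not Snakemake's actual implementation. The rule name, the directory names (FastQC/, Data/) and the sample name are hypothetical, and the sketch assumes each wildcard appears only once per pattern.

```python
import re

# Toy rule table: names and file patterns are hypothetical, for illustration only.
RULES = {
    "fastqc": {
        "output": "FastQC/{sample}_fastqc.html",
        "input": "Data/{sample}.fastq.gz",
    },
}


def pattern_to_regex(pattern):
    """Turn 'dir/{sample}.txt' into a regex with one named group per wildcard.

    Assumes each wildcard name occurs only once in the pattern.
    """
    # Splitting on a capturing group alternates: literal, wildcard name, literal, ...
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = ""
    for index, part in enumerate(parts):
        if index % 2 == 0:
            regex += re.escape(part)      # literal text must match exactly
        else:
            regex += f"(?P<{part}>.+)"    # a wildcard matches one or more characters
    return re.compile(regex)


def match_wildcards(pattern, filename):
    """Return the wildcard values if filename fits the pattern, else None."""
    match = pattern_to_regex(pattern).fullmatch(filename)
    return match.groupdict() if match else None


def resolve(target):
    """Find a rule able to produce target and the input file it would need."""
    for name, rule in RULES.items():
        wildcards = match_wildcards(rule["output"], target)
        if wildcards is not None:
            return name, rule["input"].format(**wildcards)
    return None  # no rule can produce this file


if __name__ == "__main__":
    print(resolve("FastQC/sampleA_fastqc.html"))
```

Requesting FastQC/sampleA_fastqc.html selects the toy fastqc rule with sample set to sampleA, so the required input becomes Data/sampleA.fastq.gz. Working backwards like this from the requested file is why only the rules that contribute to the target end up being run.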