Snakemake working group
2024-10-15
Nadia GOUE, Sébastien RAVEL, Chloé QUIGNOT
Source: adapted from FAIRbioinfo 2021 training material of the IFB
and Snakemake introduction tutorial from BIOI2
Material under CC-BY-SA licence
How this working group is organised
- Divided into 2 sessions of 1h30 each
- A single continuous but progressive exercise over these 2 sessions
- Presentation of a use-case by Pauline @2pm (~30min)
What the exercise is about
- You’ll be building a Snakemake pipeline from scratch
- The exercise will be divided into several parts and you’ll be guided along the way
- Part A: create a first simple pipeline with FastQC & MultiQC tools (a minimal sketch is shown after this list)
  - Inputs: NGS data (*.fastq.gz)
  - Processing: individual QC analyses with FastQC and aggregation of results with MultiQC
  - Output: Quality Control report
- Part B: improve the previous pipeline
- Part C: adapt the previous pipeline to the HPC environment
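As a first taste of Part A, a single FastQC rule could look roughly like the sketch below (file and directory names are placeholders only; you’ll define your own during the exercise). A later rule would then call MultiQC on the FastQC outputs (e.g. multiqc results) to produce the aggregated report.

rule fastqc:
    # FastQC writes sample1_fastqc.html and sample1_fastqc.zip into the output directory
    input: "data/sample1.fastq.gz"
    output:
        "results/sample1_fastqc.html",
        "results/sample1_fastqc.zip"
    shell: "fastqc --outdir results {input}"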
Don’t worry about understanding what QC analysis does
Snakemake’s FAQ: https://snakemake.readthedocs.io/en/latest/project_info/faq.html
A few reminders
- Snakemake workflow = set of rules
- Snakefile = where snakemake code is written
- Rules are defined by their name and contain directives (such as input and output, which specify the input and output files):
rule myRuleName:
    input: "myInputFile"
    output: "myOutputFile"
    shell: "echo {input} > {output}"
- Snakemake only executes the target rule and only the rules needed to generate its input files
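For instance, with the two small (hypothetical) rules below, asking Snakemake for B.txt runs rule b, and runs rule a only because b needs A.txt as input:

rule a:
    output: "A.txt"
    shell: "echo step A > {output}"

rule b:
    input: "A.txt"
    output: "B.txt"
    shell: "cat {input} > {output}"

# run with: snakemake --cores 1 B.txt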
- Rules can be generalised using wildcards written within braces (e.g. {wildcardname})
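For instance, a generalised rule could look like this sketch (paths are placeholders); requesting results/A.txt.gz would set {sample} to A and make data/A.txt the input:

rule compress:
    # {sample} matches any sample name appearing in the requested output file
    input: "data/{sample}.txt"
    output: "results/{sample}.txt.gz"
    shell: "gzip -c {input} > {output}"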
- A Snakefile is run with the snakemake --cores 1 command (+ other options available)
- Debugging options: --dag, --rulegraph, --filegraph and --dry-run, plus -p to print shell commands
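Typical calls might look like this (adapt them to your own Snakefile; the graph options print DOT text, so one common way to get an image is to pipe them through Graphviz’s dot):

snakemake --cores 1 --dry-run -p                              # list the planned jobs and print their shell commands
snakemake --cores 1 --dag | dot -Tpng > dag.png               # graph of all jobs
snakemake --cores 1 --rulegraph | dot -Tpng > rulegraph.png   # one node per rule
snakemake --cores 1 --filegraph | dot -Tpng > filegraph.png   # rules together with their input/output files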