Introduction to Snakemake - Exercises


A. Setup

1. connect to the IFB cluster

Through ssh:

ssh -X -o ServerAliveInterval=60 -l mylogin core.cluster.france-bioinformatique.fr

For more information: https://ifb-elixirfr.gitlab.io/cluster/doc/logging-in/

Or through OpenOnDemand:

For more information: https://ifb-elixirfr.gitlab.io/cluster/doc/software/openondemand/

2. prepare your working environment

cd /shared/projects/2417_wf4bioinfo/
mkdir -p $USER/day2-introsmk
cp snakemake-workshop/day2-introsmk/Snakefile $USER/day2-introsmk
cd $USER/day2-introsmk


The following sections are divided into questions (Q) and tasks (T). Explanations and answers can be found in the corresponding sections by expanding the dedicated links.

B. Understanding the pipeline

A colleague gave me a Snakemake pipeline, now what?!

Q1: plot the rule graph to visualise the pipeline & understand what it does

Hints:

Click to see an example solution

Make sure you’re in the right folder:

cd /shared/projects/2417_wf4bioinfo/$USER/day2-introsmk

Load the right modules:

module load graphviz/2.40.1
module load snakemake/8.9.0

Run snakemake:

snakemake --rulegraph | dot -Tpng > rule.png
Click here for explanations

We can see from the rule graph that there are 4 rules:

  • loadData
  • fusionFasta
  • mafft
  • all

Given the linear look of the graph, all rules seem to be applied sequentially (i.e. the output of one rule is the input of the next, and so on).
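
Side note: --rulegraph draws one node per rule. If you want one node per job instead (useful when a rule runs several times, e.g. once per downloaded sequence), Snakemake can also plot the full job DAG:

snakemake --dag | dot -Tpng > dag.png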

Q2: find the Snakemake options to list rules and possible target rules

Hints:

Click to see the answer

The options are --list-rules and --list-target-rules:

snakemake --list-rules
snakemake --list-target-rules
Click here for explanations

Rule list:

all
fusionFasta
loadData
mafft

The list of rules output by the command corresponds to the ones we see in the rule graph.

Target rule list:

all

There seems to be only one official target rule in the Snakefile: all

T3: let’s have a look at the Snakefile

Open up the Snakefile with your favorite editor (e.g. vi, nano, emacs, …) or viewer (e.g. less, more, cat, …). Look at the rules in the same order as in the rule graph & try to understand what they do (pay specific attention to the shell directives).

Explanation:

In a nutshell:

This pipeline doesn’t need any input files: it downloads sequences directly from the web given their UniProt identifiers, fuses them into a single file, and then aligns them with mafft.
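
To make the pipeline structure more concrete, here is a minimal sketch of what such a Snakefile could look like, based only on the rule names, file paths and wildcards that appear in the dry-run output shown below. It is not the actual workshop Snakefile: in particular, the UniProt download URL and the exact shell commands are assumptions.

rule all:
    input:
        "results/alignment/P10415_P01308_aligned.fasta"

rule loadData:
    # download one sequence from UniProt given its identifier (URL is an assumption)
    output:
        "results/data/{sample}.fasta"
    shell:
        "wget -O {output} https://rest.uniprot.org/uniprotkb/{wildcards.sample}.fasta"

rule fusionFasta:
    # concatenate the two downloaded sequences into a single FASTA file;
    # the constraints stop this rule from also matching the *_aligned.fasta output of mafft
    wildcard_constraints:
        uniprotid1="[A-Z0-9]+",
        uniprotid2="[A-Z0-9]+"
    input:
        "results/data/{uniprotid1}.fasta",
        "results/data/{uniprotid2}.fasta"
    output:
        "results/alignment/{uniprotid1}_{uniprotid2}.fasta"
    shell:
        "cat {input} > {output}"

rule mafft:
    # align the fused FASTA file with mafft
    input:
        "results/alignment/{prefix}.fasta"
    output:
        "results/alignment/{prefix}_aligned.fasta"
    shell:
        "mafft {input} > {output}"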

Side note: Python within Snakemake

If you’re familiar with Python, you’ll have recognised plenty of Python-like bits in this Snakefile. Although you don’t have to know Python to understand Snakemake, it’s worth keeping it in mind while you’re using it. If you have a look at more advanced code, you’ll see that you can transfer quite a bit of your Python skills into Snakemake (you can import packages, for example, or define functions). Also keep in mind that, like Python, Snakemake is sensitive to indentation.
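
For instance, a Snakefile can contain plain Python at the top level. The sketch below is purely illustrative (the names SAMPLES, fasta_path and all_fastas are made up and do not come from the workshop Snakefile):

# plain Python mixed with Snakemake syntax (hypothetical example)
import os                                   # a standard Python import

SAMPLES = ["P10415", "P01308"]              # an ordinary Python list

def fasta_path(sample):
    # an ordinary Python function, used below to build output paths
    return os.path.join("results", "data", sample + ".fasta")

rule all_fastas:
    input:
        [fasta_path(s) for s in SAMPLES]    # a Python list comprehension inside a directive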

Q4: dry-run the code

Dry-running the code is a good way of doing a final check to make sure everything will work once you run the pipeline on the cluster’s compute nodes.

NB: A dry-run doesn’t actually use any major resources, so running it directly on the cluster’s login node is fine.

Hints:

Click to see an example solution

Make sure you’re in the right folder:

cd /shared/projects/2417_wf4bioinfo/$USER/day2-introsmk

Load the right modules:

module load snakemake/8.9.0

Run snakemake:

snakemake --dry-run
Click here for explanations

You should get an output on your screen similar to the following:

Building DAG of jobs...
Job stats:
job            count
-----------  -------
all                1
fusionFasta        1
loadData           2
mafft              1
total              5

Execute 2 jobs...

[Mon Aug 19 08:32:28 2024]
localrule loadData:
    output: results/data/P01308.fasta
    jobid: 4
    reason: Missing output files: results/data/P01308.fasta
    wildcards: sample=P01308
    resources: tmpdir=<TBD>


[Mon Aug 19 08:32:28 2024]
localrule loadData:
    output: results/data/P10415.fasta
    jobid: 3
    reason: Missing output files: results/data/P10415.fasta
    wildcards: sample=P10415
    resources: tmpdir=<TBD>

Execute 1 jobs...

[Mon Aug 19 08:32:28 2024]
localrule fusionFasta:
    input: results/data/P10415.fasta, results/data/P01308.fasta
    output: results/alignment/P10415_P01308.fasta
    jobid: 2
    reason: Missing output files: results/alignment/P10415_P01308.fasta; Input files updated by another job: results/data/P10415.fasta, results/data/P01308.fasta
    wildcards: uniprotid1=P10415, uniprotid2=P01308
    resources: tmpdir=<TBD>

Execute 1 jobs...

[Mon Aug 19 08:32:28 2024]
localrule mafft:
    input: results/alignment/P10415_P01308.fasta
    output: results/alignment/P10415_P01308_aligned.fasta
    jobid: 1
    reason: Missing output files: results/alignment/P10415_P01308_aligned.fasta; Input files updated by another job: results/alignment/P10415_P01308.fasta
    wildcards: prefix=P10415_P01308
    resources: tmpdir=<TBD>

Execute 1 jobs...

[Mon Aug 19 08:32:28 2024]
localrule all:
    input: results/alignment/P10415_P01308_aligned.fasta
    jobid: 0
    reason: Input files updated by another job: results/alignment/P10415_P01308_aligned.fasta
    resources: tmpdir=<TBD>

Job stats:
job            count
-----------  -------
all                1
fusionFasta        1
loadData           2
mafft              1
total              5

Reasons:
    (check individual jobs above for details)
    input files updated by another job:
        all, fusionFasta, mafft
    output files have to be generated:
        fusionFasta, loadData, mafft

This was a dry-run (flag -n). The order of jobs does not reflect the order of execution.

The Job stats section (lines 2-9, and again lines 61-68) tells you that 5 jobs will be run in total, and the Reasons section (lines 70-75) explains why. In this case, output files are missing, and depending on which ones are missing, some or all of the rules have to be rerun.

The log also contains one section per job that would be run by Snakemake (e.g. loadData for P01308, lines 13-19). Each section shows the input and output file paths, the reason why the job is run, and information specific to that job, i.e. the resources used, its jobid, the wildcard values…
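
As the last line of the log suggests, --dry-run has the short form -n. It can be combined with -p (--printshellcmds) to also print the shell command each job would execute, which makes the final check even more informative:

snakemake -n -p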


C. Running the pipeline

Let’s run the pipeline on the IFB cluster. We’ll go through a SLURM submission script to run the Snakefile within a SLURM job. SLURM quick-start guide: https://ifb-elixirfr.gitlab.io/cluster/doc/quick-start/

Q5: Create a SLURM submission script

Hints:

Click to see an example solution

Example run_script.sh:

#!/bin/bash

#SBATCH --job-name=day2-introsmk
#SBATCH --mem=100Mb
#SBATCH --account=2417_wf4bioinfo
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

# load necessary modules
module load snakemake/8.9.0
module load mafft/7.515

# move to WD
cd /shared/projects/2417_wf4bioinfo/$USER/day2-introsmk

# run snakemake with minimum options
snakemake -c 1
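
If you first want to check that SLURM accepts the script without actually running anything, sbatch has a --test-only option that validates the submission and prints an estimated start time:

sbatch --test-only run_script.sh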

Q6: Run the pipeline (i.e. submit the script) & follow your job

Click to see an example solution

Assuming you have all SLURM options within your run script:

sbatch run_script.sh

Then follow your submitted jobs with:

squeue -u $USER

or

squeue --me
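
squeue only shows pending and running jobs. Once the job has finished, you can still inspect it with sacct, for example (41257618 is the example job ID used in section D, replace it with your own):

sacct -j 41257618 --format=JobID,JobName,State,Elapsed,MaxRSS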

D. Checking the output

T8: Have a look at your working directory

For example with ls or with tree. You see a results folder (the output of the pipeline), your script, the Snakefile and the SLURM output log. At first glance, all outputs seem to have been generated.

Output of tree command:

.
├── results
│   ├── alignment
│   │   ├── P10415_P01308_aligned.fasta
│   │   └── P10415_P01308.fasta
│   └── data
│       ├── P01308.fasta
│       └── P10415.fasta
├── run_script.sh
├── slurm-41257618.out
└── Snakefile

3 directories, 7 files

T9: Have a closer look at SLURM’s output log:

For example with cat slurm-41257618.out. The output should look a lot like the one you’ve generated during the dry-run. We can also see from the log that the pipeline ended without an error, cf. 5 of 5 steps (100%) done and no mention of errors:

[Fri Aug 16 18:35:03 2024]
Finished job 0.
5 of 5 steps (100%) done
Complete log: .snakemake/log/2024-08-16T183501.992097.snakemake.log

As you can see, Snakemake also saves the log printed to the screen in a <date-time>.snakemake.log file within the .snakemake/log folder.

T10: Have a quick glance at the .snakemake folder:

.snakemake is a hidden folder that Snakemake creates in the directory from which it is run (i.e. its working directory, which you can change with the --directory option). This folder contains log files, metadata (such as run statistics) and environments that Snakemake collects and generates during a run. It can therefore get quite big (in volume and in number of files) and is not deleted automatically.

.snakemake/
├── auxiliary
├── conda
├── conda-archive
├── incomplete
├── locks
├── log
│   └── 2024-08-16T183501.992097.snakemake.log
├── metadata
│   ├── cmVzdWx0cy9hbGlnbm1lbnQvUDEwNDE1X1AwMTMwOC5mYXN0YQ==
│   ├── cmVzdWx0cy9hbGlnbm1lbnQvUDEwNDE1X1AwMTMwOF9hbGlnbmVkLmZhc3Rh
│   ├── cmVzdWx0cy9kYXRhL1AwMTMwOC5mYXN0YQ==
│   └── cmVzdWx0cy9kYXRhL1AxMDQxNS5mYXN0YQ==
├── shadow
└── singularity

9 directories, 5 files
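
Side note on the locks subfolder: Snakemake locks the working directory while it runs. If a run is interrupted abruptly, a stale lock can remain and the next run will refuse to start. In that case (and only when no other Snakemake instance is using the directory), you can remove the lock with:

snakemake --unlock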

E. Conclusion

Accomplished objectives:

  • visualise and understand a pipeline someone else wrote (rule graph, rule lists, reading the Snakefile)
  • dry-run the pipeline to check it before launching it
  • run the pipeline on the IFB cluster through a SLURM submission script
  • check the outputs, the SLURM log and the .snakemake folder

Command summary for Snakemake:

snakemake --rulegraph | dot -Tpng > rule.png   # plot the rule graph
snakemake --list-rules                         # list all rules
snakemake --list-target-rules                  # list possible target rules
snakemake --dry-run                            # dry-run (short form: -n)
snakemake -c 1                                 # run the pipeline with 1 core

T11: The last task of this exercise: clean up your current directory

rm -rf .snakemake/

You can also delete the results and SLURM’s output file if you wish.
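
For example (the SLURM log file name will contain your own job ID):

rm -rf results/
rm -f slurm-*.out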