Date: 19/03/2024
Trainers: Olivier Kirsh, Julien Rey, Magali Hennion, Elouan Bethuel
The slides of the presentation can be downloaded here.
ssh -o PubkeyAuthentication=no username@ipop-up.rpbs.univ-paris-diderot.fr
Where are you on the cluster?
pwd
Then explore the /shared folder
tree -L 1 /shared
The /shared/banks folder contains commonly used data and resources. Explore it by yourself with commands like ls or cd.
Can you see the first 10 lines of the mm10.fa file? (mm10.fa = mouse genomic sequence version 10)
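For example, head can print the first lines of a file. The exact location of mm10.fa under /shared/banks is an assumption here, so locate it first with ls:

```bash
ls /shared/banks                       # find the folder containing the mouse genome
head -n 10 /shared/banks/mm10/mm10.fa  # path is an assumption: adapt it to what you find
```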
There is a training project accessible to you, navigate to this folder and list what is inside.
Then go to one of your projects and create a folder named 240319_training. This is where you will do all the exercises. If you don't have a project, you can create a folder named YourName in the training folder and work there.
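A hedged sketch of the corresponding commands (the project path is a placeholder; replace it with your own project, or with the training folder):

```bash
cd /shared/projects/<your_project>   # placeholder: your own project (or the training folder)
mkdir 240319_training
cd 240319_training
```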
Using the file manager from GNOME, you can navigate easily on the iPOP-UP file server.
Open Files (Fichiers), go to Other Locations (Autres emplacements) in the side bar, then in Connect to Server (Connexion à un serveur) type sftp://ipop-up.rpbs.univ-paris-diderot.fr/ and press the Enter key. This way, you can modify your files directly using any local text editor.
To make work on the cluster easier, a JupyterHub is available. This way, you can access the cluster, modify your files, run your scripts, see your results, etc., in a simple web browser.
Click Start. The Launcher allows you to start a Terminal that can be used for the rest of this course.
sinfo
sbatch allows you to send an executable file to be run on a computation node.
Starting from 01_02_flatter.sh, make a script named flatter.sh printing "What a nice training!"
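A minimal sketch of what flatter.sh could look like (the partition name is taken from the squeue examples further down this page; adapt it to your account):

```bash
#!/bin/bash
#SBATCH --partition=ipop-up   # the partition is mandatory (see the options table below)
#SBATCH --job-name=flatter

echo "What a nice training!"
```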
Then run the script:
sbatch flatter.sh
The output that would normally appear on your screen has been redirected to slurm-xxxxx.out, but this name can be changed using SBATCH options.

Modify flatter.sh to add this line:
#SBATCH -o flatter.out
Then run it. Anything different?
Using sbatch, run the hostname command in such a way that the sbatch output file is called hostname.out.
What is the output? How does it differ from typing hostname directly in the terminal, and why?
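A hedged sketch of one possible solution (partition and script name are assumptions):

```bash
#!/bin/bash
#SBATCH --partition=ipop-up
#SBATCH -o hostname.out   # send the job output to hostname.out instead of slurm-xxxxx.out

hostname                  # prints the name of the compute node that ran the job
```

Typed directly in the terminal, hostname returns the login node you are connected to, whereas inside the job it returns the compute node allocated by Slurm.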
| Options | Flag | Function |
|---|---|---|
| --partition | -p | partition to run the job (mandatory) |
| --job-name | -J | give a job a name |
| --output | -o | output file name |
| --error | -e | error file name |
| --chdir | -D | set the working directory before running |
| --time | -t | limit the total run time (default: no limit) |
| --mem | | memory that your job will have access to (per node) |
To find out more, see the Slurm manual (man sbatch) or https://slurm.schedmd.com/sbatch.html.
A lot of tools are installed on the cluster. To list them, use one of the following commands.
module available
module avail
module av
You can limit the search for a specific tool, for example look for the different versions of multiqc on the cluster using module av multiqc.

module load tool/1.3            # load a specific version of a tool
module load tool1 tool2 tool3   # load several tools at once
module list                     # list the currently loaded modules
module purge                    # unload all loaded modules
The sleep command: do nothing (delay) for the set number of seconds.
Restart from 03_04_hostname_sleep.sh and launch a simple job that runs sleep 600.
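A hedged sketch of such a script (the sleep.sh name matches the sacct exercise below; the partition is an assumption):

```bash
#!/bin/bash
#SBATCH --partition=ipop-up
#SBATCH --job-name=sleep

hostname    # name of the compute node, as in the previous exercise
sleep 600   # do nothing for 600 seconds, long enough to watch the job with squeue
```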
On your terminal, type
squeue

The ST column shows the status of the job:
R = Running
PD = Pending
To see only iPOP-UP jobs
squeue -p ipop-up
To see only the jobs of a given user (here, untel)
squeue -u untel
To see only your jobs
squeue --me
To cancel a job that you started, use the scancel command followed by the jobID (the number given by Slurm, visible in squeue):
scancel jobID
You can stop the previous sleep job with this command.
Re-run sleep.sh and type
sacct

You can pass the --format option to list the information that you want to display, including memory usage, running time, etc.
For instance
sacct --format=JobID,JobName,Start,Elapsed,CPUTime,NCPUS,NodeList,MaxRSS,ReqMem,State
To see all the available fields, run sacct --helpformat
After the run, the seff command allows you to access information about the efficiency of a job.
seff <jobid>

Run an alignment using STAR version 2.7.5a starting from 05_06_star.sh.
The FASTQ files to align are in /shared/projects/training/test_fastq. Check the resources that were used with seff.
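A hedged sketch of such a script. The module name, memory request, index directory and FASTQ file name are assumptions: check the module with module av star, pick an index from /shared/banks and list the files in /shared/projects/training/test_fastq.

```bash
#!/bin/bash
#SBATCH --partition=ipop-up   # assumption: adapt to your account
#SBATCH --mem=40G             # assumption: STAR needs enough memory to load the genome index
#SBATCH -o star.out

module load star/2.7.5a       # module name is an assumption; check it with: module av star

# Index directory and FASTQ file name below are placeholders.
STAR --runThreadN 1 \
     --genomeDir /path/to/STAR_index \
     --readFilesIn /shared/projects/training/test_fastq/sample_R1.fastq.gz \
     --readFilesCommand zcat \
     --outFileNamePrefix star_test_ \
     --outSAMtype BAM SortedByCoordinate
```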
| Options | Default | Function |
|---|---|---|
| --nodes | 1 | Number of nodes required (or min-max) |
| --nodelist | | Select one or several nodes |
| --ntasks-per-node | 1 | Number of tasks invoked on each node |
| --mem | 2GB | Memory required per node |
| --cpus-per-task | 1 | Number of CPUs allocated to each task |
| --mem-per-cpu | 2GB | Memory required per allocated CPU |
| --array | | Submit multiple jobs to be executed with identical parameters |
Some tools allow multi-threading, i.e. the use of several CPUs to accelerate one task. This is the case for STAR with the --runThreadN option.
Modify the previous sbatch file to use 4 threads to align the FASTQ files on the reference. Run it and check the time and memory usage.
The Slurm controller will set some variables in the environment of the batch script. They can be very useful. For instance, you can improve the previous script using $SLURM_CPUS_PER_TASK.
The full list of variables is visible here.
Some useful ones include $SLURM_JOB_ID, $SLURM_JOB_NAME, $SLURM_CPUS_PER_TASK, $SLURM_JOB_NODELIST and $SLURM_ARRAY_TASK_ID.
Of note, Bash shell variables can also be used in the sbatch script.
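A minimal sketch combining both (the partition and job name are assumptions; the Slurm variable names are standard ones set by the controller):

```bash
#!/bin/bash
#SBATCH --partition=ipop-up
#SBATCH --job-name=variables
#SBATCH --cpus-per-task=4
#SBATCH -o variables.out

SAMPLE=test_fastq                    # an ordinary Bash variable

# Variables set by the Slurm controller for this job:
echo "Job ID:        $SLURM_JOB_ID"
echo "Job name:      $SLURM_JOB_NAME"
echo "CPUs per task: $SLURM_CPUS_PER_TASK"
echo "Node list:     $SLURM_JOB_NODELIST"
echo "Sample to process: $SAMPLE"
```

For instance, passing $SLURM_CPUS_PER_TASK to --runThreadN in the STAR script above keeps the number of threads in sync with the CPUs actually reserved.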
Job arrays allow you to start the same job many times (same executable, same resources), for example on different files. If you add the following line to your script, the job will be launched 6 times (at the same time), with the variable $SLURM_ARRAY_TASK_ID taking the values 0 to 5.
#SBATCH --array=0-5
Starting from 07_08_array_example.sh, make a simple script launching 6 jobs in parallel.
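A hedged sketch of such a script (job name and output file pattern are assumptions):

```bash
#!/bin/bash
#SBATCH --partition=ipop-up
#SBATCH --job-name=array_example
#SBATCH --array=0-5          # 6 tasks; $SLURM_ARRAY_TASK_ID goes from 0 to 5
#SBATCH -o array_%A_%a.out   # %A = array job ID, %a = array task ID

echo "I am task $SLURM_ARRAY_TASK_ID of array job $SLURM_ARRAY_JOB_ID"
sleep 60                     # keep the tasks alive long enough to see them with squeue
```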
It is possible to limit the number of jobs running at the same time using %max_running_jobs in the #SBATCH --array option (see the example after this exercise).
Modify your script to run only 2 jobs at a time.
You will see with the squeue command that some of the tasks are pending until the others are over.
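For example, this variant of the previous directive allows at most 2 of the 6 tasks to run simultaneously:

```bash
#SBATCH --array=0-5%2   # 6 tasks in total, at most 2 running at the same time
```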

Example
#SBATCH --array=0-7 # if 8 files to process
FQ=(*fastq.gz) # Create a Bash array
echo ${FQ[@]} # Echo the array contents
INPUT=$(basename -s .fastq.gz "${FQ[$SLURM_ARRAY_TASK_ID]}") # Each element of the array is indexed (from 0 to n-1) for Slurm
echo $INPUT # Echo the simplified name of the fastq file
If for any reason you can't use a Bash array, you can alternatively use ls or find to identify the files to process and get the nth one with sed (or awk).
#SBATCH --array=1-4 # If 4 files, as sed index start at 1
INPUT=$(ls $PATH2/*.fq.gz | sed -n ${SLURM_ARRAY_TASK_ID}p)
echo $INPUT
Note that #SBATCH --mem=25G applies separately to each task of the array.
Use workflow managers such as Snakemake or Nextflow.
nf-core workflows can be used directly on the cluster.
Starting from 09_nf-core.sh, write a script running the ChIP-seq workflow on nf-core test data.
Some help can be found here. Please also see the full documentation.
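A hedged sketch of such a script, assuming Nextflow is provided as a module (check the exact name with module av nextflow) and that Singularity is the available container engine on the cluster:

```bash
#!/bin/bash
#SBATCH --partition=ipop-up
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G                # assumption: adjust to the pipeline requirements
#SBATCH -o nfcore_chipseq.out

module load nextflow             # module name is an assumption

# -profile test uses the small nf-core test dataset;
# the container profile depends on what is available on the cluster.
nextflow run nf-core/chipseq -profile test,singularity --outdir chipseq_results
```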
Correction
[Of note, even with the test dataset, it takes a lot of time and resources!]
To find out more, read the Slurm manual: man sbatch or https://slurm.schedmd.com/sbatch.html
Ask for help or report problems on the cluster: https://discourse.rpbs.univ-paris-diderot.fr/
iPOP-UP cluster documentation: https://ipop-up.docs.rpbs.univ-paris-diderot.fr/documentation/
BiBs practical guide: https://parisepigenetics.github.io/bibs/cluster/ipopup
IFB community support: https://community.france-bioinformatique.fr/



