Security warning
Never leave your computer unattended with your session open and the iPOP-UP server connected.
Warm-up
Where are you on the cluster?
pwd
Then explore the /shared folder
tree -L 1 /shared
The /shared/banks folder contains commonly used data and resources. Explore it by yourself with commands like ls or cd.
Can you see the first 10 lines of the mm10.fa file? (mm10.fa = mouse genomic sequence version 10)
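One possible solution uses head (the exact path below is an assumption; adapt it to what you find while exploring /shared/banks):

```bash
# Print the first 10 lines of the mouse genome FASTA (path is a guess)
head -n 10 /shared/banks/mm10/mm10.fa
```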
There is a training project accessible to you, navigate to this folder and list what is inside.
Then go to one of your projects and create a folder named 240319_training. This is where you will do all the exercises. If you don't have a project, you can create a folder named YourName in the training folder and work there.
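For instance (the project path below is hypothetical; replace it with your own):

```bash
cd /shared/projects/my_project   # hypothetical path; adapt to your project
mkdir 240319_training
cd 240319_training
```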
Optional: use a file explorer
Using the GNOME file manager, you can easily browse the iPOP-UP file server.
Open the file manager Fichiers (Files).
Click on Autres emplacements (Other Locations) in the sidebar.
In the Connexion à un serveur (Connect to Server) bar, type sftp://ipop-up.rpbs.univ-paris-diderot.fr/ and press the Enter key.
Enter your login and password.
This way, you can modify your files directly using any local text editor.
Be careful Never use a word processor (like Microsoft Word or LibreOffice Writer) to edit your code, and never copy/paste code to or from such software. Use only plain-text editors and UTF-8 encoding.
Tip For other systems, please see the instructions for Windows, Mac or Linux.
Optional: use the JupyterHub interface
To make working on the cluster easier, a JupyterHub is available. Through it, you can access the cluster, edit your files, run your scripts, view your results, etc., in a simple web browser.
Select your project, the resources you need (default resources are sufficient unless you want to run calculations within Jupyter Notebooks or RStudio), and press Start.
The launcher allows you to start a Terminal that can be used for the rest of this course.
Get information about the cluster
sinfo
Slurm sbatch command
sbatch allows you to submit an executable file to be run on a computation node.
Exercise 1: my first sbatch script
Starting from 01_02_flatter.sh, make a script named flatter.sh that prints "What a nice training!"
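A minimal sketch of what the script could look like (the job name is illustrative, and 01_02_flatter.sh may already provide part of this):

```bash
#!/bin/bash
#SBATCH --job-name=flatter   # illustrative job name

echo "What a nice training!"
```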
Then run the script:
sbatch flatter.sh
The output that would normally have appeared on your screen has been redirected to slurm-xxxxx.out, but this name can be changed using SBATCH options.
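For instance, adding this directive to the script names the log file after the job ID (%j is the standard Slurm placeholder for the job ID):

```bash
#SBATCH --output=flatter_%j.out   # replaces the default slurm-xxxxx.out
```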
Submit multiple jobs to be executed with identical parameters
Multi-threading
Some tools support multi-threading, i.e. the use of several CPUs to speed up a single task. This is the case for STAR, via its --runThreadN option.
Exercise 6: Parallel alignment
Modify the previous sbatch file to use 4 threads to align the FASTQ files to the reference. Run it and check the time and memory usage.
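One possible shape for the script (all paths and the memory request are placeholders, not the course's actual files):

```bash
#!/bin/bash
#SBATCH --cpus-per-task=4    # request 4 CPUs for the task
#SBATCH --mem=30G            # illustrative memory request; adjust to your genome

# Paths below are hypothetical; reuse the index and FASTQ files from the previous exercise
STAR --runThreadN 4 \
     --genomeDir /path/to/star_index \
     --readFilesIn sample.fastq.gz \
     --readFilesCommand zcat \
     --outFileNamePrefix sample_
```

After the job finishes, seff <jobid> (if available on the cluster) summarizes its runtime and memory usage.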
Use Slurm variables
The Slurm controller sets some variables in the environment of the batch script. They can be very useful. For instance, you can improve the previous script using $SLURM_CPUS_PER_TASK instead of hard-coding the thread count (see the sketch after the list below).
Of note, Bash shell variables can also be used in the sbatch script:
$USER
$HOME
$HOSTNAME
$PWD
$PATH
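As announced above, a sketch of how $SLURM_CPUS_PER_TASK removes the hard-coded thread count (other STAR options as in the previous exercise):

```bash
#SBATCH --cpus-per-task=4

# The thread count now automatically follows the --cpus-per-task request
STAR --runThreadN $SLURM_CPUS_PER_TASK \
     --genomeDir /path/to/star_index   # hypothetical path; other options unchanged
```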
Job arrays
Job arrays allow you to start the same job many times (same executable, same resources), for example on different files. If you add the following line to your script, the job will be launched 6 times (at the same time), with the variable $SLURM_ARRAY_TASK_ID taking the values 0 to 5.
#SBATCH --array=0-5
Exercise 7: Job array
Starting from 07_08_array_example.sh, make a simple script launching 6 jobs in parallel.
```bash
#SBATCH --array=0-7   # if 8 files to process

FQ=(*fastq.gz)        # Create a bash array
echo ${FQ[@]}         # Echo the array contents
INPUT=$(basename -s .fastq.gz "${FQ[$SLURM_ARRAY_TASK_ID]}")   # Array elements are indexed from 0 to n-1 by Slurm
echo $INPUT           # Echo the simplified name of the fastq file
```
List or find files to process
If for any reason you can't use a bash array, you can alternatively use ls or find to identify the files to process and extract the nth one with sed (or awk).
```bash
#SBATCH --array=1-4   # If 4 files, as sed line numbering starts at 1

INPUT=$(ls $PATH2/*.fq.gz | sed -n ${SLURM_ARRAY_TASK_ID}p)
echo $INPUT
```
Common job array mistakes
Bash array indices start at 0
Don’t forget to have different output files for each task of the array
The same goes for your log names (%a or %J in the name will do the trick)
Do not overload the cluster! Please append %50 (for example) to your index range to limit the number of tasks running at the same time (here to 50). The 51st will start as soon as one finishes! See the sketch after this list.
The RAM defined using #SBATCH --mem=25G is for each task
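Putting these points together, a job-array header might look like this (the index range and limit are examples):

```bash
#SBATCH --array=0-199%50            # 200 tasks, at most 50 running at the same time
#SBATCH --output=array_%A_%a.out    # %A = array job ID, %a = task index: one log per task
```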
Complex workflows
Use workflow managers such as Snakemake or Nextflow.
nf-core workflows can be used directly on the cluster.
Exercise 9: nf-core workflows
Starting from 09_nf-core.sh, write a script running the ChIP-seq workflow on nf-core test data.
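As a hint, the heart of such a script is a single nextflow run command. A minimal sketch, assuming Nextflow and Singularity are available on the cluster (the module name is an assumption; check module avail):

```bash
#!/bin/bash
#SBATCH --cpus-per-task=4   # illustrative resources
#SBATCH --mem=16G

module load nextflow        # assumed module name

# -profile test makes the pipeline fetch its own bundled test data
nextflow run nf-core/chipseq -profile test,singularity --outdir results
```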