The presentation, exercises and corrections of the 2nd session (June 2023) are available here.
The presentation, exercises and corrections of the 3rd session (March 2024) are available here.
You are now connected to a compute node (here named cpu-node130), with 6 GB of RAM and 10 CPUs reserved just for you! You can also choose a specific node with --nodelist, for example --nodelist cpu-node131. This is useful to check whether one of your jobs is running properly on that node.
[username @ ipop-up 17:05]$ ~ : srun -p ipop-up --nodelist cpu-node131 --pty bash
[username @ cpu-node131 17:05]$ ~ : top
top - 17:05:08 up 73 days, 6:26, 0 users, load average: 0.00, 0.01, 0.05
Tasks: 1411 total, 1 running, 1410 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 26361312+total, 24462500+free, 10587068 used, 8401040 buff/cache
KiB Swap: 4194300 total, 4039428 free, 154872 used. 25210102+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
115414 hennion 20 0 174156 3804 1628 R 0.7 0.0 0:00.30 top
...
Press Ctrl+D or type exit to go back to the login node.
You can check the jobs that are running using squeue.
Only your jobs:
[username@ipop-up]$ squeue --me
Only the jobs of username:
[username@ipop-up]$ squeue -u username
All ipop-up jobs:
[username@ipop-up]$ squeue -p ipop-up
Cancelling a job
To cancel a job, use scancel followed by the job number:
[username@ipop-up]$ scancel 8016984
Information about past jobs
sacct
The sacct command gives you information about past and running jobs. The documentation is here. You can get different information with the --format option. For instance:
The output gives the job ID and name, its start time, its elapsed running time, the maximum RAM used (MaxRSS), the memory you requested, and the job status (failed, completed, running). The requested memory has to be higher than MaxRSS, otherwise the job fails, but not much higher, so that other users can use the remaining resources.
Tip: use an alias instead of typing this long command! Instructions here.
Add -S MMDD to include older jobs (by default, only today's jobs are shown).
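As a minimal sketch of the alias tip, the long sacct command can be wrapped in a small shell function placed in ~/.bashrc. The name jobhist is hypothetical, and the field list (JobID, JobName, Start, Elapsed, MaxRSS, ReqMem, State) is an assumption chosen to match the information described above, not necessarily the exact command used in the training.

```shell
# Hypothetical helper: the name "jobhist" and the field list are assumptions.
# The fields are standard sacct format fields matching the description above.
SACCT_FIELDS="JobID,JobName,Start,Elapsed,MaxRSS,ReqMem,State"

jobhist() {
    # Extra arguments (for example -S MMDD) are passed through to sacct.
    sacct --format="$SACCT_FIELDS" "$@"
}
```

With this in place, jobhist would list today's jobs, and jobhist -S 0601 would also include jobs started since June 1st.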
The seff command gives you information about past jobs. It summarizes the cores and memory you asked for, and what the job actually used. It is very useful to check that the resources you request match the real need.
[hennion@ipop-up]$ bi4edc : seff 239790
Job ID: 239790
Cluster: production
User/Group: hennion/umr7216
State: COMPLETED (exit code 0)
Nodes: 1
Cores per node: 4
CPU Utilized: 00:12:50
CPU Efficiency: 85.18% of 00:15:04 core-walltime
Job Wall-clock time: 00:03:46
Memory Utilized: 25.24 GB
Memory Efficiency: 84.12% of 30.00 GB
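To make the CPU Efficiency line easier to read, the sketch below recomputes it from the other values in this report. It assumes that core-walltime is the number of cores multiplied by the wall-clock time, which is consistent with the figures shown (4 cores x 00:03:46 = 00:15:04). The helper name to_seconds is hypothetical.

```shell
# Convert a HH:MM:SS duration to seconds (hypothetical helper).
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

cores=4                            # "Cores per node" in the report
wall=$(to_seconds 00:03:46)        # "Job Wall-clock time"
cpu_used=$(to_seconds 00:12:50)    # "CPU Utilized"

# Assumption: core-walltime = cores x wall-clock time.
core_walltime=$((cores * wall))

# Efficiency = CPU time actually used / CPU time reserved.
cpu_eff=$(awk -v u="$cpu_used" -v t="$core_walltime" \
    'BEGIN { printf "%.2f", 100 * u / t }')

echo "core-walltime: ${core_walltime} s"
echo "CPU efficiency: ${cpu_eff}%"
```

Memory Efficiency is the same kind of ratio: memory utilized divided by memory requested. Values well below 100% suggest that fewer cores or less memory could be requested for similar jobs.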