BwForCluster JUSTUS 2 Slurm HOWTO
The bwForCluster JUSTUS 2 is a state-wide high-performance compute resource dedicated to Computational Chemistry and Quantum Sciences in Baden-Württemberg, Germany.
Slurm Howto
Preface
This is a collection of howtos and convenient commands that I initially wrote for internal use at Ulm only. Scripts and commands have been tested within our Slurm test environment at JUSTUS (running Slurm 19.05 at the moment).
You may find this collection useful, but use it at your own risk. Things may behave differently with different Slurm versions and configurations.
GENERAL
How to find Slurm FAQ?
https://slurm.schedmd.com/faq.html
How to find a Slurm cheat sheet?
https://slurm.schedmd.com/pdfs/summary.pdf
How to get more information?
(Almost) every Slurm command has a man page. Use it.
Online versions: https://slurm.schedmd.com/man_index.html
JOB SUBMISSION
How to submit an interactive job?
Use srun command, e.g.:
$ srun --nodes=1 --ntasks-per-node=8 --pty bash
How to enable X11 forwarding for an interactive job?
Use --x11 flag, e.g.
$ srun --nodes=1 --ntasks-per-node=8 --pty --x11 bash     # run shell with X11 forwarding enabled
$ srun --nodes=1 --ntasks-per-node=8 --pty --x11 xterm    # directly launch terminal window on node
Note:
- For X11 forwarding to work, you must also enable X11 forwarding for your ssh login from your local computer to the cluster, i.e.:
local> ssh -X <username>@justus2.uni-ulm.de
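A quick way to verify the note above before requesting an interactive job is to check whether `$DISPLAY` is set in your login shell on the cluster. The following is a minimal sketch; the function name `x11_ready` is made up for illustration and is not part of Slurm.

```shell
#!/bin/bash
# Hypothetical helper: succeed if an X11 display is known to this shell.
# ssh -X normally sets DISPLAY on the remote side when forwarding works.
x11_ready() {
    [ -n "${DISPLAY:-}" ]
}

if x11_ready; then
    echo "DISPLAY is set to ${DISPLAY}; srun --x11 should be able to forward windows."
else
    echo "DISPLAY is empty; log in again with 'ssh -X' before using srun --x11." >&2
fi
```

A set `$DISPLAY` is a necessary condition only; whether forwarding actually works also depends on the X server running on your local computer.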
How to submit a batch job?
Use sbatch command:
$ sbatch <job-script>
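If you want to use the job ID in later commands, sbatch can print it in a machine-readable form with the --parsable flag. A minimal sketch, assuming a wrapper function named `submit_job` (a name chosen here for illustration only):

```shell
#!/bin/bash
# Hypothetical wrapper: submit a job script and print only the job ID.
# With --parsable, sbatch prints "<jobid>" or "<jobid>;<cluster>"
# instead of the usual "Submitted batch job <jobid>" sentence.
submit_job() {
    local out
    out=$(sbatch --parsable "$@") || return 1
    # Strip an optional ";<cluster>" suffix.
    echo "${out%%;*}"
}
```

Typical use would be `jobid=$(submit_job <job-script>)` followed by `squeue -j "$jobid"`.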
How to convert Moab batch job scripts to Slurm?
Replace Moab/Torque job specification flags and environment variables in your job scripts with their corresponding Slurm counterparts.
Commonly used Moab job specification flags and their Slurm equivalents
Option | Moab (msub) | Slurm (sbatch) |
---|---|---|
Script directive | #MSUB | #SBATCH |
Job name | -N <name> | --job-name=<name> (-J <name>) |
Account | -A <account> | --account=<account> (-A <account>) |
Queue | -q <queue> | --partition=<partition> (-p <partition>) |
Wall time limit | -l walltime=<hh:mm:ss> | --time=<hh:mm:ss> (-t <hh:mm:ss>) |
Node count | -l nodes=<count> | --nodes=<count> (-N <count>) |
Core count | -l procs=<count> | --ntasks=<count> (-n <count>) |
Process count per node | -l ppn=<count> | --ntasks-per-node=<count> |
Core count per process | - | --cpus-per-task=<count> (-c <count>) |
Memory limit per node | -l mem=<limit> | --mem=<limit> |
Memory limit per process | -l pmem=<limit> | --mem-per-cpu=<limit> |
Job array | -t <array indices> | --array=<indices> (-a <indices>) |
Node exclusive job | -l naccesspolicy=singlejob | --exclusive |
Initial working directory | -d <directory> (default: $HOME) | --chdir=<directory> (-D <directory>) (default: submission directory) |
Standard output file | -o <file path> | --output=<file> (-o <file>) |
Standard error file | -e <file path> | --error=<file> (-e <file>) |
Combine stdout/stderr to stdout | -j oe | --output=<combined stdout/stderr file> |
Mail notification events | -m <event> | --mail-type=<events> (valid types include: NONE, BEGIN, END, FAIL, ALL) |
Export environment to job | -V | --export=ALL (default) |
Don't export environment to job | (default) | --export=NONE |
Export environment variables to job | -v <var[=value][,var2=value2[, ...]]> | --export=<var[=value][,var2=value2[,...]]> |
Notes:
- Default initial job working directory is $HOME for Moab. For Slurm the default working directory is where you submit your job from.
- By default Moab does not export any environment variables to the job's runtime environment. With Slurm most of the login environment variables are exported to your job's runtime environment. This includes environment variables from software modules that were loaded at job submission time (and also $HOSTNAME variable).
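For scripts that use only a handful of directives, part of the conversion can be automated. The following is a rough sketch, not a complete converter: the function name `moab_to_slurm` is hypothetical, it covers only four rows of the table above, and it assumes one flag per `#MSUB` line. Always review the result by hand.

```shell
#!/bin/bash
# Hypothetical helper: rewrite a few common #MSUB directives as #SBATCH.
# Reads a job script on stdin and writes the converted script to stdout.
# Directives it does not recognize are left untouched, so they are easy
# to spot afterwards (sbatch treats them as ordinary comments).
moab_to_slurm() {
    sed -E \
        -e 's/^#MSUB[[:space:]]+-N[[:space:]]+(.+)$/#SBATCH --job-name=\1/' \
        -e 's/^#MSUB[[:space:]]+-q[[:space:]]+(.+)$/#SBATCH --partition=\1/' \
        -e 's/^#MSUB[[:space:]]+-l[[:space:]]+walltime=(.+)$/#SBATCH --time=\1/' \
        -e 's/^#MSUB[[:space:]]+-l[[:space:]]+nodes=([0-9]+)$/#SBATCH --nodes=\1/'
}
```

It could be used as `moab_to_slurm < old_job.moab > new_job.slurm`, where both file names are placeholders. Keep in mind the differences in default working directory and environment export described in the notes above; a converted script may need an explicit `cd` or `--export` setting to behave as before.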
Commonly used Moab/Torque script environment variables and their Slurm equivalents
Information | Moab | Torque | Slurm |
---|---|---|---|
Job name | $MOAB_JOBNAME | $PBS_JOBNAME | $SLURM_JOB_NAME |
Job ID | $MOAB_JOBID | $PBS_JOBID | $SLURM_JOB_ID |
Submit directory | $MOAB_SUBMITDIR | $PBS_O_WORKDIR | $SLURM_SUBMIT_DIR |
Number of nodes allocated | $MOAB_NODECOUNT | $PBS_NUM_NODES | $SLURM_JOB_NUM_NODES (and: $SLURM_NNODES) |
Node list | $MOAB_NODELIST | cat $PBS_NODEFILE | $SLURM_JOB_NODELIST |
Number of processes | $MOAB_PROCCOUNT | $PBS_TASKNUM | $SLURM_NTASKS |
Requested tasks per node | - | $PBS_NUM_PPN | $SLURM_NTASKS_PER_NODE |
Requested CPUs per task | --- | --- | $SLURM_CPUS_PER_TASK |
Job array index | $MOAB_JOBARRAYINDEX | $PBS_ARRAY_INDEX | $SLURM_ARRAY_TASK_ID |
Job array range | $MOAB_JOBARRAYRANGE | - | $SLURM_ARRAY_TASK_COUNT |
Queue name | $MOAB_CLASS | $PBS_QUEUE | $SLURM_JOB_PARTITION |
QOS name | $MOAB_QOS | --- | $SLURM_JOB_QOS |
Tasks per node | --- | $PBS_NUM_PPN | $SLURM_TASKS_PER_NODE |
Job user | $MOAB_USER | $PBS_O_LOGNAME | $SLURM_JOB_USER |
Hostname | $MOAB_MACHINE | $PBS_O_HOST | $SLURMD_NODENAME |
Note:
- See sbatch man page for a complete list of flags and environment variables.
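As an illustration of how these variables might be used inside a job script, the sketch below prints a one-line job summary. The function name `job_summary` and the fallback values are assumptions made for this example; the fallbacks only serve to keep the snippet usable outside of a Slurm job.

```shell
#!/bin/bash
# Hypothetical helper: print a short summary from Slurm's job environment.
# The ":-" defaults apply only when a variable is unset or empty,
# e.g. when the script is run directly on a login node.
job_summary() {
    echo "job=${SLURM_JOB_NAME:-interactive} id=${SLURM_JOB_ID:-none} nodes=${SLURM_JOB_NUM_NODES:-1} tasks=${SLURM_NTASKS:-1}"
}

job_summary
```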
How to view information about submitted jobs?
Use squeue command, e.g.:
squeue                  # all users (admins only)
squeue -u <username>    # jobs of specific user
squeue -t PENDING       # pending jobs only
Note: The output format of squeue (and most other Slurm commands) is highly configurable to your needs. Look for the --format or --Format options.
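For example, a custom --format string could be wrapped in a small shell function. This is a sketch only: the function name `myqueue`, the variable name `SQUEUE_FMT` and the chosen column widths are arbitrary, while the field specifiers themselves (%i job ID, %P partition, %j job name, %T job state, %M time used, %D node count, %R reason or node list) are documented in the squeue man page.

```shell
#!/bin/bash
# Column layout: "%.<width><field>" right-justifies a field in <width> characters.
SQUEUE_FMT='%.10i %.9P %.20j %.8T %.10M %.6D %R'

# Hypothetical wrapper: list your own jobs using the layout above.
# Extra arguments are passed through to squeue.
myqueue() {
    squeue -u "${USER:-$(id -un)}" --format="$SQUEUE_FMT" "$@"
}
```

Calling `myqueue -t PENDING` would then list only your pending jobs in this layout.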
How to cancel jobs?
Use scancel command, e.g.
scancel <jobid>            # cancel specific job
scancel <jobid>_<index>    # cancel indexed job in a job array
scancel -u <username>      # cancel all jobs of specific user
scancel -t PENDING         # cancel pending jobs
How to submit a serial batch job?
Sample job script template for serial job:
#!/bin/bash
# Allocate one node
#SBATCH --nodes=1
# Number of program instances to be executed
#SBATCH --ntasks-per-node=1
# 8 GB memory required per node
#SBATCH --mem=8G
# Maximum run time of job
#SBATCH --time=1:00:00
# Give job a reasonable name
#SBATCH --job-name=serial_job
# File name for standard output (%j will be replaced by job id)
#SBATCH --output=serial_job-%j.out
# File name for error output
#SBATCH --error=serial_job-%j.err

# Load software modules as needed, e.g.
# module load foo/bar

# Run serial program
./my_serial_program
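When adapting the template, keep all #SBATCH lines above the first executable command: according to the sbatch man page, directives that appear after the first non-comment, non-blank line are not processed. The sketch below shows one possible way to check a script for this mistake before submitting; the function name `check_sbatch_directives` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: warn about #SBATCH lines that sbatch would ignore
# because they appear after the first executable command in the script.
# Exit status is 0 if all directives are in place, 1 otherwise.
check_sbatch_directives() {
    awk '
        /^#SBATCH/ {
            if (seen_cmd) {
                printf "line %d is ignored by sbatch: %s\n", NR, $0
                bad = 1
            }
            next
        }
        /^[ \t]*#/ { next }   # other comments, including the #! line
        /^[ \t]*$/ { next }   # blank lines
        { seen_cmd = 1 }      # first real command reached
        END { exit bad + 0 }
    ' "$1"
}
```

It would be called as `check_sbatch_directives <job-script>` and prints nothing when the script is fine.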