BwForCluster JUSTUS 2 Slurm HOWTO
Revision as of 15:18, 17 April 2020
The bwForCluster JUSTUS 2 is a state-wide high-performance compute resource dedicated to Computational Chemistry and Quantum Sciences in Baden-Württemberg, Germany.
Slurm Howto
Preface
This is a collection of howtos and convenient commands that I initially wrote for internal use at Ulm only. Scripts and commands have been tested within our Slurm test environment at JUSTUS (running Slurm 19.05 at the moment).
You may find this collection useful, but use it at your own risk. Things may behave differently with different Slurm versions and configurations.
GENERAL
How to find Slurm FAQ?
https://slurm.schedmd.com/faq.html
How to find a Slurm cheat sheet?
https://slurm.schedmd.com/pdfs/summary.pdf
How to get more information?
(Almost) every Slurm command has a man page. Use it.
Online versions: https://slurm.schedmd.com/man_index.html
JOB SUBMISSION
How to submit an interactive job?
Use srun command, e.g.:
$ srun --nodes=1 --ntasks-per-node=8 --pty bash
How to enable X11 forwarding for an interactive job?
Use the --x11 flag, e.g.:
$ srun --nodes=1 --ntasks-per-node=8 --pty --x11 bash    # run shell with X11 forwarding enabled
$ srun --nodes=1 --ntasks-per-node=8 --pty --x11 xterm   # directly launch terminal window on node
Note:
- For X11 forwarding to work, you must also enable X11 forwarding for your ssh login from your local computer to the cluster, i.e.:
local> ssh -X <username>@justus2.uni-ulm.de
How to submit a batch job?
Use sbatch command:
$ sbatch <job-script>
How to convert Moab batch job scripts to Slurm?
Replace the Moab/Torque job specification flags and environment variables in your job scripts with their corresponding Slurm counterparts.
Commonly used Moab job specification flags and their Slurm equivalents
Option | Moab (msub) | Slurm (sbatch) |
---|---|---|
Script directive | #MSUB | #SBATCH |
Job name | -N <name> | --job-name=<name> (-J <name>) |
Account | -A <account> | --account=<account> (-A <account>) |
Queue | -q <queue> | --partition=<partition> (-p <partition>) |
Wall time limit | -l walltime=<hh:mm:ss> | --time=<hh:mm:ss> (-t <hh:mm:ss>) |
Node count | -l nodes=<count> | --nodes=<count> (-N <count>) |
Core count | -l procs=<count> | --ntasks=<count> (-n <count>) |
Process count per node | -l ppn=<count> | --ntasks-per-node=<count> |
Core count per process | --- | --cpus-per-task=<count> |
Memory limit per node | -l mem=<limit> | --mem=<limit> |
Memory limit per process | -l pmem=<limit> | --mem-per-cpu=<limit> |
Job array | -t <array indices> | --array=<indices> (-a <indices>) |
Node exclusive job | -l naccesspolicy=singlejob | --exclusive |
Initial working directory | -d <directory> (default: $HOME) | --chdir=<directory> (-D <directory>) (default: submission directory) |
Standard output file | -o <file path> | --output=<file> (-o <file>) |
Standard error file | -e <file path> | --error=<file> (-e <file>) |
Combine stdout/stderr to stdout | -j oe | --output=<combined stdout/stderr file> |
Mail notification events | -m <event> | --mail-type=<events> (valid types include: NONE, BEGIN, END, FAIL, ALL) |
Export environment to job | -V | --export=ALL (default) |
Don't export environment to job | (default) | --export=NONE |
Export environment variables to job | -v <var[=value][,var2=value2[, ...]]> | --export=<var[=value][,var2=value2[,...]]> |
Notes:
- Default initial job working directory is $HOME for Moab. For Slurm the default working directory is where you submit your job from.
- By default Moab does not export any environment variables to the job's runtime environment. With Slurm most of the login environment variables are exported to your job's runtime environment. This includes environment variables from software modules that were loaded at job submission time (and also $HOSTNAME variable).
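As a worked example of the mapping above, here is a minimal Moab script and a possible Slurm translation. This is a sketch: the job name and resource numbers are invented for illustration, and my_program is a placeholder, not part of the original.

```shell
#!/bin/bash
# Moab original (shown as comments for comparison):
#   #MSUB -N my_job
#   #MSUB -l nodes=2:ppn=8
#   #MSUB -l walltime=02:00:00
#   #MSUB -l mem=16gb
#
# Slurm translation using the table above:
#SBATCH --job-name=my_job
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --time=02:00:00
#SBATCH --mem=16gb

./my_program   # placeholder for your actual workload
```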
Commonly used Moab/Torque script environment variables and their Slurm equivalents
Information | Moab | Torque | Slurm |
---|---|---|---|
Job name | $MOAB_JOBNAME | $PBS_JOBNAME | $SLURM_JOB_NAME |
Job ID | $MOAB_JOBID | $PBS_JOBID | $SLURM_JOB_ID |
Submit directory | $MOAB_SUBMITDIR | $PBS_O_WORKDIR | $SLURM_SUBMIT_DIR |
Number of nodes allocated | $MOAB_NODECOUNT | $PBS_NUM_NODES | $SLURM_JOB_NUM_NODES (and: $SLURM_NNODES) |
Node list | $MOAB_NODELIST | cat $PBS_NODEFILE | $SLURM_JOB_NODELIST |
Number of processes | $MOAB_PROCCOUNT | $PBS_TASKNUM | $SLURM_NTASKS |
Requested tasks per node | - | $PBS_NUM_PPN | $SLURM_NTASKS_PER_NODE |
Requested CPUs per task | --- | --- | $SLURM_CPUS_PER_TASK |
Job array index | $MOAB_JOBARRAYINDEX | $PBS_ARRAY_INDEX | $SLURM_ARRAY_TASK_ID |
Job array range | $MOAB_JOBARRAYRANGE | - | $SLURM_ARRAY_TASK_COUNT |
Queue name | $MOAB_CLASS | $PBS_QUEUE | $SLURM_JOB_PARTITION |
QOS name | $MOAB_QOS | --- | $SLURM_JOB_QOS |
--- | --- | $PBS_NUM_PPN | $SLURM_TASKS_PER_NODE |
Job user | $MOAB_USER | $PBS_O_LOGNAME | $SLURM_JOB_USER |
Hostname | $MOAB_MACHINE | $PBS_O_HOST | $SLURMD_NODENAME |
Note:
- See sbatch man page for a complete list of flags and environment variables.
How to view information about submitted jobs?
Use squeue command, e.g.:
$ squeue                 # all users (admins only)
$ squeue -u <username>   # jobs of specific user
$ squeue -t PENDING      # pending jobs only
Note: The output format of squeue (and most other Slurm commands) is highly configurable to your needs. Look for the --format or --Format options.
How to cancel jobs?
Use scancel command, e.g.
$ scancel <jobid>           # cancel specific job
$ scancel <jobid>_<index>   # cancel indexed job in a job array
$ scancel -u <username>     # cancel all jobs of specific user
$ scancel -t PENDING        # cancel pending jobs
How to submit a serial batch job?
Sample job script template for serial job:
#!/bin/bash
# Allocate one node
#SBATCH --nodes=1
# Number of program instances to be executed
#SBATCH --tasks-per-node=1
# 8 GB memory required per node
#SBATCH --mem=8G
# Maximum run time of job
#SBATCH --time=1:00:00
# Give job a reasonable name
#SBATCH --job-name=serial_job
# File name for standard output (%j will be replaced by job id)
#SBATCH --output=serial_job-%j.out
# File name for error output
#SBATCH --error=serial_job-%j.err

# Load software modules as needed, e.g.
# module load foo/bar

# Run serial program
./my_serial_program
Sample code for serial program: "hello_serial.c":https://projects.uni-konstanz.de/attachments/download/16815/hello_serial.c
Notes:
- --nodes=1 and --tasks-per-node=1 may be replaced by --ntasks=1.
- If not specified, stdout and stderr are both written to slurm-%j.out.
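Following the first note, the template's resource request can be written in the compact single-task form. A sketch with the same made-up limits as the template above:

```shell
#!/bin/bash
# Equivalent compact form: --ntasks=1 replaces --nodes=1 plus --tasks-per-node=1.
#SBATCH --ntasks=1
#SBATCH --mem=8G
#SBATCH --time=1:00:00
#SBATCH --job-name=serial_job

./my_serial_program   # placeholder program name from the template above
```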
How to emulate Moab output file names?
Use the following directives:
#SBATCH --output="%x.o%j"
#SBATCH --error="%x.e%j"
How to pass command line arguments to the job script?
Run
$ sbatch <job-script> arg1 arg2 ...
Inside the job script the arguments can be accessed as $1, $2, ...
E.g.:
[...]
infile="$1"
outfile="$2"
./my_serial_program < "$infile" > "$outfile" 2>&1
[...]
Notes:
- Do not use $1, $2, ... in "#SBATCH" lines. These parameters can be used only within the regular shell script.
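A small sketch (not from the original) of a job script body that picks up its arguments and falls back to defaults when an argument is omitted; the file names are hypothetical:

```shell
#!/bin/bash
# Hypothetical job script body: read sbatch command line arguments.
# ${1:-...} substitutes a default when the argument was not supplied.
infile="${1:-input.dat}"
outfile="${2:-output.dat}"
echo "reading $infile, writing $outfile"
```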
How to request local scratch (SSD/NVMe) at job submission?
Use '--gres=scratch:nnn' option to allocate nnn GB of local (i.e. node-local) scratch space for the entire job.
Example: '--gres=scratch:100' will allocate 100 GB scratch space on a locally attached NVMe device.
Notes:
- Do not add any unit (such as --gres=scratch:100G). This would be treated as a request for 10^9 * 100 GB of scratch space.
- Multinode jobs get nnn GB of local scratch space on every node of the job.
- Environment variable $SCRATCH will point to
- /scratch/<user>.<jobid> when local scratch has been requested
- /tmp/<user>.<jobid> when no local scratch has been requested
- Environment variable $TMPDIR always points to /tmp/<user>.<jobid>
- For backward compatibility environment variable $RAMDISK always points to /tmp/<user>.<jobid>
- Scratch space allocation in /scratch will be enforced by quota limits
- Data written to $TMPDIR will always count against allocated memory.
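A staging pattern that fits the notes above, as a sketch: input.dat and my_program are invented placeholders, and the scratch size is an arbitrary example.

```shell
#!/bin/bash
#SBATCH --gres=scratch:100   # 100 GB node-local scratch (note: no unit suffix)
#SBATCH --ntasks=1
#SBATCH --time=1:00:00

# $SCRATCH points to /scratch/<user>.<jobid> because scratch was requested.
# Stage input onto the fast local device, compute there, copy results back.
cp input.dat "$SCRATCH/"
cd "$SCRATCH"
"$SLURM_SUBMIT_DIR"/my_program < input.dat > output.dat
cp output.dat "$SLURM_SUBMIT_DIR"/
```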
How to submit a multithreaded batch job?
Sample job script template for a job running one multithreaded program instance:
#!/bin/bash
# Allocate one node
#SBATCH --nodes=1
# Number of program instances to be executed
#SBATCH --tasks-per-node=1
# Number of cores per program instance
#SBATCH --cpus-per-task=8
# 8 GB memory required per node
#SBATCH --mem=8G
# Maximum run time of job
#SBATCH --time=1:00:00
# Give job a reasonable name
#SBATCH --job-name=multithreaded_job
# File name for standard output (%j will be replaced by job id)
#SBATCH --output=multithreaded_job-%j.out
# File name for error output
#SBATCH --error=multithreaded_job-%j.err

# Load software modules as needed, e.g.
# module load foo/bar

export OMP_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}
export MKL_NUM_THREADS=${SLURM_JOB_CPUS_PER_NODE}

# Run multithreaded program
./my_multithreaded_program
Sample code for multithreaded program: "hello_openmp.c":https://projects.uni-konstanz.de/attachments/download/16814/hello_openmp.c
Notes:
- In our configuration each physical core is considered a "CPU".
- Required memory can also be specified per allocated CPU with the '--mem-per-cpu' option.
- The '--mem' and '--mem-per-cpu' options are mutually exclusive.
- In terms of core allocation '--tasks-per-node=1' or '--ntasks=1' together with '--cpus-per-task=8' is almost equivalent to '--tasks-per-node=8' or '--ntasks=8' and omitting '--cpus-per-task=8'. However, there are subtle differences when multiple tasks are spawned within one job by means of srun command.
How to submit an array job?
Use '-a' (or '--array') option, e.g.
$ sbatch -a 1-16%8 ...
This will submit 16 tasks to be executed, each one indexed by SLURM_ARRAY_TASK_ID ranging from 1 to 16, but will limit the number of simultaneously running tasks from this job array to 8.
Sample job script template for an array job:
#!/bin/bash
# Number of cores per individual array task
#SBATCH --ntasks=1
#SBATCH --array=1-16%8
#SBATCH --mem=4G
#SBATCH --time=01:00:00
#SBATCH --job-name=array_job
#SBATCH --output=array_job-%A_%a.out
#SBATCH --error=array_job-%A_%a.err

# Load software modules as needed, e.g.
# module load foo/bar

# Print the task id.
echo "My SLURM_ARRAY_TASK_ID: " $SLURM_ARRAY_TASK_ID

# Add lines here to run your computations, e.g.
# ./my_program <input.$SLURM_ARRAY_TASK_ID
Notes:
- Placeholder %A will be replaced by the master job id, %a will be replaced by the array task id.
- Every sub job in an array job will have its own unique environment variable $SLURM_JOB_ID. Environment variable $SLURM_ARRAY_JOB_ID will be set to the job ID of the first array task and is the same for all tasks.
- More information: https://slurm.schedmd.com/job_array.html
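A common pattern for array jobs, sketched here as an assumption rather than taken from the original: line N of a list file serves as the input for array task N. The file name files.txt is hypothetical.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --array=1-16%8

# Hypothetical mapping: line N of files.txt (one input path per line)
# is the input for array task N.
infile=$(sed -n "${SLURM_ARRAY_TASK_ID}p" files.txt)
echo "task ${SLURM_ARRAY_TASK_ID} processes ${infile}"
```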
How to submit an MPI batch job?
Suggested reading: https://slurm.schedmd.com/mpi_guide.html
Sample job script template for an MPI job:
#!/bin/bash
# Allocate two nodes
#SBATCH --nodes=2
# Number of program instances to be executed
#SBATCH --tasks-per-node=8
# Allocate 32 GB memory per node
#SBATCH --mem=32gb
# Maximum run time of job
#SBATCH --time=1:00:00
# Give job a reasonable name
#SBATCH --job-name=mpi_job
# File name for standard output (%j will be replaced by job id)
#SBATCH --output=mpi_job-%j.out
# File name for error output
#SBATCH --error=mpi_job-%j.err

# Add lines here to run your computations, e.g.
#
# Option 1: Launch MPI tasks by using mpirun
#
# for OpenMPI and GNU compiler:
#
# module load compiler/gnu
# module load mpi/openmpi
# mpirun ./my_mpi_program
#
# for Intel MPI and Intel compiler:
#
# module load compiler/intel
# module load mpi/impi
# mpirun ./my_mpi_program
#
# Option 2: Launch MPI tasks by using srun
#
# for OpenMPI and GNU compiler:
#
# module load compiler/gnu
# module load mpi/openmpi
# srun ./my_mpi_program
#
# for Intel MPI and Intel compiler:
#
# module load compiler/intel
# module load mpi/impi
# srun ./my_mpi_program
Sample code for MPI program: "hello_mpi.c":https://projects.uni-konstanz.de/attachments/download/16813/hello_mpi.c
Notes
- SchedMD recommends using srun, and many (most?) sites do so as well. The rationale is that srun is more tightly integrated with the scheduler and provides more consistent and reliable resource tracking and accounting for individual jobs and job steps. mpirun may behave differently across MPI implementations and versions, and there are reports of "strange behavior", especially when using task affinity and core binding. Using srun is supposed to resolve these issues and is therefore highly recommended.
How to submit a hybrid MPI/OpenMP job?
Sample job script template for a hybrid job:
#!/bin/bash
# Number of nodes to allocate
#SBATCH --nodes=4
# Number of MPI instances (ranks) to be executed per node
#SBATCH --tasks-per-node=2
# Number of threads per MPI instance
#SBATCH --cpus-per-task=4
# Allocate 8 GB memory per node
#SBATCH --mem=8gb
# Maximum run time of job
#SBATCH --time=1:00:00
# Give job a reasonable name
#SBATCH --job-name=hybrid_job
# File name for standard output (%j will be replaced by job id)
#SBATCH --output=hybrid_job-%j.out
# File name for error output
#SBATCH --error=hybrid_job-%j.err

export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
export MKL_NUM_THREADS=${SLURM_CPUS_PER_TASK}

module load compiler/intel
module load mpi/impi

srun ./my_hybrid_program
# or:
# mpirun ./my_hybrid_program
Sample code for hybrid program: "hello_hybrid.c":https://projects.uni-konstanz.de/attachments/download/16812/hello_hybrid.c
Notes:
- $SLURM_CPUS_PER_TASK is only set if the '--cpus-per-task' option is specified.
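Because of this, a defensive variant of the export lines in the template can be useful. This guard is an assumption on my part, not part of the original template:

```shell
#!/bin/bash
# Fall back to one thread when --cpus-per-task was not given
# and $SLURM_CPUS_PER_TASK is therefore unset.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
export MKL_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "$OMP_NUM_THREADS"
```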
How to request specific node(s) at job submission?
Use '-w' (or '--nodelist') option, e.g.:
$ sbatch -w <node1>,<node2> ...
Also see '-F' (or '--nodefile') option.
How to exclude specific nodes from job?
Use '-x' (or '--exclude') option, e.g.:
$ sbatch -x <node1>,<node2> ...