Difference between revisions of "Helix/Slurm"

Revision as of 22:25, 12 July 2022

1 General information about Slurm

The bwForCluster Helix uses Slurm as its batch system.

Slurm documentation: https://slurm.schedmd.com/documentation.html
Slurm cheat sheet: https://slurm.schedmd.com/pdfs/summary.pdf
Slurm tutorials: https://slurm.schedmd.com/tutorials.html

2 Slurm Command Overview

Slurm commands	Brief explanation
sbatch	Submits a job and queues it in an input queue
squeue	Displays information about active, eligible, blocked, and/or recently completed jobs
scontrol	Displays detailed job state information
scancel	Cancels a job

3 Job Submission

Batch jobs are submitted with the command:

$ sbatch <job-script>

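A job script contains options for Slurm in lines beginning with #SBATCH, followed by the commands you want to execute on the compute nodes. The #SBATCH lines must come before the first command, because Slurm stops reading #SBATCH directives once it reaches the first executable line. For example: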

#!/bin/bash
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --time=00:20:00
#SBATCH --mem=1gb
#SBATCH --export=NONE
echo 'Hello world'

This job requests one core (--ntasks=1) and 1 GB of memory (--mem=1gb) for 20 minutes (--time=00:20:00) on nodes of the partition 'single'.

For better reproducibility, it is recommended to use the option --export=NONE, which prevents environment variables of the submit session from being propagated into the job environment, and to load the required software modules in the job script itself.

4 Partitions

On bwForCluster Helix Production a partition should be requested with '--partition=<partition_name>' on job submission; if no partition is requested, the job is placed in the default partition devel. Within a partition, job allocations are routed automatically to the most suitable compute node(s) for the requested resources (e.g. number of nodes and cores, memory, number of GPUs).

The partitions devel and single are operated in shared mode, i.e. jobs from different users can run on the same node. Jobs in these partitions can get exclusive access to compute nodes with the "--exclusive" option. The partition multi is operated in exclusive mode: jobs in this partition automatically get exclusive access to the requested compute nodes.
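A minimal sketch of a job that reserves a whole node in the shared partition single with "--exclusive". The time limit is an illustrative assumption, not a recommendation. Inside the job, Slurm sets SLURM_CPUS_ON_NODE to the number of CPUs allocated on the node; the fallback value "unknown" only applies when the script is run outside of Slurm.

```bash
#!/bin/bash
#SBATCH --partition=single
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --time=01:00:00
#SBATCH --export=NONE

# With --exclusive no other job shares this node.
# SLURM_CPUS_ON_NODE is set by Slurm inside the job; "unknown" is only a
# fallback for running the script outside of Slurm (assumption for testing).
cpus=${SLURM_CPUS_ON_NODE:-unknown}
echo "CPUs allocated on this node: ${cpus}"
```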

GPUs are requested with the option "--gres=gpu:<number-of-gpus>".

Partition	Node Access Policy	Node Types	Default	Limits
devel	shared	cpu, gpu	ntasks=1, time=00:10:00, mem-per-cpu=1gb	nodes=1, time=00:30:00
single	shared	cpu, gpu	ntasks=1, time=00:30:00, mem-per-cpu=2gb	nodes=1, time=120:00:00
multi	job exclusive	cpu, gpu	nodes=2, time=00:30:00	nodes=16, time=48:00:00
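Combining the partition table with the "--gres" option described above, a single-GPU job in the partition single could look like the sketch below. The time and memory values are assumptions chosen for illustration. When GPUs are allocated, Slurm typically restricts the job to them via CUDA_VISIBLE_DEVICES; the fallback "none" is only used when the script runs outside a GPU job.

```bash
#!/bin/bash
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --gres=gpu:1
#SBATCH --time=02:00:00
#SBATCH --mem=16gb
#SBATCH --export=NONE

# Slurm usually exports the IDs of the allocated GPUs in CUDA_VISIBLE_DEVICES.
# "none" is a fallback for running the script outside of a GPU job.
gpu_list=${CUDA_VISIBLE_DEVICES:-none}
echo "GPUs visible to this job: ${gpu_list}"
```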

5 Examples

Here you can find some examples of resource requests in batch jobs.

5.1 Serial Programs

#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --time=120:00:00
#SBATCH --mem=4gb

Notes:

Jobs with "--mem" below 240gb can run on all node types associated with the single partition.

5.2 Multi-threaded Programs

#SBATCH --partition=single
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=01:30:00
#SBATCH --mem=50gb

Notes:

Jobs with "--ntasks-per-node" up to 64 and "--mem" below 240gb can run on all node types associated with the single partition.
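For an OpenMP program, the thread count can be tied to the number of requested cores so that the two stay in sync. This is a sketch assuming an OpenMP-parallelised executable; the program name ./my_threaded_program is hypothetical. Slurm sets SLURM_NTASKS_PER_NODE when "--ntasks-per-node" is given; the fallback of 1 only matters when the script runs outside of a job.

```bash
#!/bin/bash
#SBATCH --partition=single
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=01:30:00
#SBATCH --mem=50gb
#SBATCH --export=NONE

# Use one OpenMP thread per requested core (falls back to 1 outside of Slurm).
export OMP_NUM_THREADS=${SLURM_NTASKS_PER_NODE:-1}
echo "Starting with OMP_NUM_THREADS=${OMP_NUM_THREADS}"

# Start the program here, e.g. (hypothetical executable name):
# ./my_threaded_program
```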

5.3 MPI Programs

#SBATCH --partition=multi
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64
#SBATCH --time=12:00:00
#SBATCH --mem=50gb

Notes:

"--mem" requests the memory per node. Jobs with "--mem" below 240gb can run on all node types associated with the multi partition.

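Because "--mem" is a per-node value, the total memory of a multi-node job grows with "--nodes". A quick shell calculation for the MPI request above:

```bash
# Values taken from the MPI example request.
nodes=2
ntasks_per_node=64
mem_per_node_gb=50   # value of --mem, applied to each node

total_tasks=$(( nodes * ntasks_per_node ))
total_mem_gb=$(( nodes * mem_per_node_gb ))
echo "${total_tasks} MPI tasks, ${total_mem_gb} GB memory in total"
# prints: 128 MPI tasks, 100 GB memory in total
```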
