Helix/Slurm

1 General information about Slurm

The bwForCluster Helix uses Slurm as its batch system.

  • Slurm documentation: https://slurm.schedmd.com/documentation.html
  • Slurm tutorials: https://slurm.schedmd.com/tutorials.html

2 Slurm Command Overview

Slurm command | Brief explanation
sbatch        | Submits a job and queues it in an input queue
squeue        | Displays information about active, eligible, blocked, and/or recently completed jobs
scontrol      | Displays detailed job state information
scancel       | Cancels a job
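
A minimal usage sketch of these commands (the job ID 12345 and the script name job.sh are placeholders):

$ sbatch job.sh            # submit the batch script job.sh
$ squeue -u $USER          # list only your own jobs
$ scontrol show job 12345  # show detailed state of job 12345
$ scancel 12345            # cancel job 12345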

3 Job Submission

Batch jobs are submitted with the command:

$ sbatch <job-script>

A job script contains options for Slurm in lines beginning with #SBATCH as well as your commands which you want to execute on the compute nodes. For example:

#!/bin/bash
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --time=00:20:00
#SBATCH --mem=1gb
#SBATCH --export=NONE
echo 'Hello world'

This job requests one core (--ntasks=1) and 1 GB of memory (--mem=1gb) for 20 minutes (--time=00:20:00) on nodes provided by the partition 'single'.

For the sake of better reproducibility of jobs, it is recommended to use the option --export=NONE to prevent the propagation of environment variables from the submit session into the job environment, and to load required software modules in the job script.
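
A minimal sketch of this pattern (the module name devel/python is hypothetical; run 'module avail' on Helix to see the modules that are actually installed):

#!/bin/bash
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --mem=1gb
#SBATCH --export=NONE
# With --export=NONE nothing is inherited from the submit session,
# so required modules are loaded here in the job script.
module load devel/python
python --version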

3.1 Partitions

On the bwForCluster Helix production system it is necessary to request a partition with '--partition=<partition_name>' on job submission. Within a partition, job allocations are automatically routed to the most suitable compute node(s) for the requested resources (e.g. number of nodes and cores, memory, number of GPUs). If no partition is requested, the devel partition is used by default.

The partitions devel and single are operated in shared mode, i.e. jobs from different users can run on the same node. Jobs can get exclusive access to compute nodes in these partitions with the "--exclusive" option. The multi partition is operated in exclusive mode; jobs in this partition automatically get exclusive access to the requested compute nodes.
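
For example, a job in the shared partition single can still claim a whole node for itself (a sketch; job.sh is a placeholder script name):

$ sbatch --partition=single --exclusive job.sh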

GPUs are requested with the option "--gres=gpu:<number-of-gpus>".

Partition | Node Access Policy | Node Types | Default                                  | Limits
devel     | shared             | cpu, gpu   | ntasks=1, time=00:10:00, mem-per-cpu=1gb | nodes=1, time=00:30:00
single    | shared             | cpu, gpu   | ntasks=1, time=00:30:00, mem-per-cpu=2gb | nodes=1, time=120:00:00
multi     | job exclusive      | cpu, gpu   | nodes=2, time=00:30:00                   | nodes=16, time=48:00:00


3.2 Examples

Here are some examples of resource requests for batch jobs.

3.2.1 Serial Programs

#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --time=120:00:00
#SBATCH --mem=4gb

Notes:

  • Jobs with "--mem" below 240gb can run on all node types associated with the single partition.

3.2.2 Multi-threaded Programs

#SBATCH --partition=single
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=01:30:00
#SBATCH --mem=50gb

Notes:

  • Jobs with "--ntasks-per-node" up to 64 and "--mem" below 240gb can run on all node types associated with the single partition.
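
The job script body can then pass the allocated core count on to a multi-threaded (e.g. OpenMP) program; a minimal sketch, assuming a placeholder binary ./my_threaded_program (SLURM_NTASKS_PER_NODE is set by Slurm when --ntasks-per-node is requested):

# Tell the OpenMP runtime to use as many threads as cores were allocated.
export OMP_NUM_THREADS=$SLURM_NTASKS_PER_NODE
./my_threaded_program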

3.2.3 MPI Programs

#SBATCH --partition=multi
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64
#SBATCH --time=12:00:00
#SBATCH --mem=50gb

Notes:

  • "--mem" requests the memory per node. Jobs with "--mem" below 240gb can run on all node types associated with the multi partition.

3.2.4 GPU Programs

#SBATCH --partition=single
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --gres=gpu:4
#SBATCH --time=12:00:00
#SBATCH --mem=200gb

Notes:

  • The number of GPUs per node is requested with the option "--gres=gpu:<number-of-gpus>"
  • It is possible to request a certain GPU type with the option "--gres=gpu:<gpu-type>:<number-of-gpus>", as sketched below. For <gpu-type> put the 'GPU Type' listed in the last line of the GPU Nodes Hardware table.
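
A sketch of such a request (the type name 'A40' is used for illustration only; take the actual names from the GPU Nodes Hardware table):

#SBATCH --gres=gpu:A40:2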


4 Interactive Jobs

Interactive jobs must NOT run on the login nodes; instead, resources for interactive jobs can be requested with salloc. The following example requests an interactive session on 1 core for 2 hours:

$ salloc --partition=single --ntasks=1 --time=2:00:00

After executing this command, wait until the queueing system has granted you the requested resources. Once granted, you are automatically logged in on the allocated compute node.

If you use applications or tools which provide a GUI, enable X11 forwarding for your interactive session with:

$ salloc --partition=single --ntasks=1 --time=2:00:00 --x11

Once the walltime limit has been reached you will be automatically logged out from the compute node.

5 Job Monitoring

5.1 Information about submitted jobs

For an overview of your submitted jobs use the command:

$ squeue
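
To see Slurm's estimate of when a pending job will start, the standard --start option can be added (<jobid> stays a placeholder):

$ squeue --start -j <jobid>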

To get detailed information about a specific job use the command:

$ scontrol show job <jobid>

5.2 Interactive access to running jobs

If you would like to see what is happening on the compute node(s), you can access the allocated resources of a running job with:

$ srun --jobid=[jobid] --pty /bin/bash

Commands like 'top' show you the busiest processes on the node. To exit 'top', type 'q'.

To monitor your GPU processes use the command 'nvidia-smi'.
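
To keep that view refreshing automatically, nvidia-smi can be combined with the standard Linux tool watch (here updating every 5 seconds):

$ watch -n 5 nvidia-smi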

In the case of multi-node jobs, look up the node names of your job with squeue and pass one of them to the srun command with the --nodelist option:

$ srun --jobid=[jobid] --nodelist=[node-name] --pty /bin/bash