bwForCluster MLS&WISO Production Slurm
- 1 General information about Slurm
- 2 Slurm Command Overview
- 3 Job Submission
- 4 Interactive Jobs
- 5 Job Monitoring
- 6 Job Feedback
- 7 Overview about free resources
1 General information about Slurm
- Slurm documentation: https://slurm.schedmd.com/documentation.html
- Slurm cheat sheet: https://slurm.schedmd.com/pdfs/summary.pdf
- Slurm tutorials: https://slurm.schedmd.com/tutorials.html
2 Slurm Command Overview
|Slurm commands||Brief explanation|
|sbatch||Submits a job and queues it in an input queue|
|squeue||Displays information about active, eligible, blocked, and/or recently completed jobs|
|scontrol||Displays detailed job state information|
|scancel||Cancels a job|
3 Job Submission
Batch jobs are submitted with the command:
$ sbatch <job-script>
A job script contains options for Slurm in lines beginning with #SBATCH, followed by the commands you want to execute on the compute nodes. For example:
#!/bin/bash
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --time=00:20:00
#SBATCH --mem=1gb
#SBATCH --export=NONE
echo 'Hello world'
This job requests one core (--ntasks=1) and 1 GB of memory (--mem=1gb) for 20 minutes (--time=00:20:00) on nodes of the partition 'single'.
For better reproducibility of jobs, it is recommended to use the option --export=NONE, which prevents environment variables of the submit session from being propagated into the job environment, and to load the required software modules in the job script.
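A minimal sketch of a job script that follows this recommendation. The module name `hypothetical/module` is a placeholder, not a module known to exist on this cluster; `module avail` lists the installed ones. The `command -v module` check only guards against environments without a module system, so the script can also be tried outside the cluster.

```shell
#!/bin/bash
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --time=00:20:00
#SBATCH --mem=1gb
#SBATCH --export=NONE

# With --export=NONE nothing is inherited from the submitting shell,
# so the software environment is set up explicitly in the script.
# "hypothetical/module" is a placeholder; choose a real module from "module avail".
if command -v module >/dev/null 2>&1; then
    module purge
    module load hypothetical/module
fi

# Report the job ID ("none" when run outside Slurm) and the execution host.
JOB_INFO="job ${SLURM_JOB_ID:-none} on $(hostname)"
echo "$JOB_INFO"
```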
If you want to convert Moab batch scripts to Slurm, you can find general information on this page. Specific information for the usage of Slurm on bwForCluster MLS&WISO Production is included in the following chapters.
On bwForCluster MLS&WISO Production you need to request a partition with '--partition=<partition_name>' when submitting a job. Within a partition, job allocations are routed automatically to the most suitable compute node(s) for the requested resources (e.g. number of nodes and cores, memory, number of GPUs). If no partition is requested, the job is placed in the default partition devel.
The partitions devel, single, and gpu-single are operated in shared mode, i.e. jobs from different users can run on the same node. Jobs can get exclusive access to compute nodes in these partitions with the "--exclusive" option. The partitions multi and gpu-multi are operated in exclusive mode. Jobs in these partitions automatically get exclusive access to the requested compute nodes.
When choosing the partition gpu-single or gpu-multi, the number of GPUs must be requested with the option "--gres=gpu:<number-of-gpus>".
|Partition||Node Access Policy||Node Types||Default||Limits|
|devel||shared||standard||ntasks=1, time=00:10:00, mem-per-cpu=1gb||nodes=1, time=00:30:00|
|single||shared||standard, best, best-sky, best-cas, fat, fat-ivy||ntasks=1, time=00:30:00, mem-per-cpu=1gb||nodes=1, time=120:00:00|
|multi||job exclusive||standard, best, best-sky, best-cas||nodes=2, time=00:30:00||nodes=128, time=48:00:00|
|gpu-single||shared||gpu, gpu-sky, gpu-cas||ntasks=1, time=00:30:00, mem-per-gpu=24gb||nodes=1, time=48:00:00|
|gpu-multi||job exclusive||gpu||nodes=2, time=00:30:00||nodes=18, time=48:00:00|
If a job requires a certain Intel architecture, the architecture of compute nodes can be explicitly requested with "--constraint=<constraint_name>".
|Constraint||Description|
|ivy||request nodes with architecture Ivy Bridge|
|has||request nodes with architecture Haswell|
|sky||request nodes with architecture Skylake|
|cas||request nodes with architecture Cascade Lake|
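As an illustration, the following request restricts a serial job in the single partition to Cascade Lake nodes. The disabled line (Slurm ignores lines starting with `##SBATCH`) shows the standard Slurm syntax for accepting any one of several features; whether combining these particular constraints is useful on this cluster is an assumption, not something this page confirms.

```shell
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --mem=4gb
#SBATCH --constraint=cas
# Alternative (standard Slurm OR syntax, assumed to work with these features):
##SBATCH --constraint="sky|cas"
```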
Here you can find some examples of resource requests in batch jobs. Most partitions allow the allocation of various node types. If you need a certain node type, adapt the memory request and the number of cores per node, and/or use the "--constraint" option.
3.3.1 Serial Programs
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --time=120:00:00
#SBATCH --mem=4gb
- Jobs with "--mem" below 60gb can run on all node types associated with the single partition.
- If you increase the memory request, fewer node types may be available for the job.
- If you need best nodes only, request more memory, for example "--mem=70gb", and use "--constraint=has"; together this excludes standard as well as best-sky, best-cas, and fat-ivy nodes.
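Written out as job options, the best-node request from the last item could look like the following sketch (task count and time limit taken from the serial example above):

```shell
#SBATCH --partition=single
#SBATCH --ntasks=1
#SBATCH --time=120:00:00
# Together, the next two lines exclude standard, best-sky, best-cas, and fat-ivy nodes
#SBATCH --mem=70gb
#SBATCH --constraint=has
```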
3.3.2 Multi-threaded Programs
#SBATCH --partition=single
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=01:30:00
#SBATCH --mem=50gb
- Jobs with "--ntasks-per-node" up to 16 and "--mem" below 60gb can run on all node types associated with the single partition.
- If you increase "--ntasks-per-node" or "--mem", fewer node types may be available for the job; see 'Number of Cores' and 'Working Memory' in the CPU Nodes Hardware table.
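Many multi-threaded programs, for example OpenMP codes, need to be told how many threads to start. A sketch of a matching job script, assuming an OpenMP program; `./my_threaded_program` is a hypothetical executable. The thread count is taken from `SLURM_NTASKS_PER_NODE`, which Slurm sets when `--ntasks-per-node` is requested, and the program is only launched inside a Slurm allocation.

```shell
#!/bin/bash
#SBATCH --partition=single
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=16
#SBATCH --time=01:30:00
#SBATCH --mem=50gb
#SBATCH --export=NONE

# One thread per requested core; falls back to 1 outside of Slurm.
export OMP_NUM_THREADS="${SLURM_NTASKS_PER_NODE:-1}"
echo "Using ${OMP_NUM_THREADS} threads"

# ./my_threaded_program is a hypothetical OpenMP executable.
# Start it only inside a Slurm job, never on a login node.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    ./my_threaded_program
fi
```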
3.3.3 MPI Programs
#SBATCH --partition=multi
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=12:00:00
#SBATCH --mem=50gb
- "--mem" requests the memory per node.
- Jobs with "--nodes" up to 2, "--ntasks-per-node" up to 16, and "--mem" below 60gb can run on all node types associated with the multi partition.
- If you increase "--nodes", "--ntasks-per-node", or "--mem", fewer node types may be available for the job; see 'Number of Cores' and 'Working Memory' in the CPU Nodes Hardware table.
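For MPI programs, the processes are usually started with srun from inside the job script. A sketch for the request above, with `./my_mpi_program` as a hypothetical executable (the MPI module it needs is not named here). Because the script follows the --export=NONE recommendation, it sets `SLURM_EXPORT_ENV=ALL` before calling srun: the sbatch man page notes that job steps otherwise inherit the NONE setting and would not see the environment prepared in the script.

```shell
#!/bin/bash
#SBATCH --partition=multi
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=16
#SBATCH --time=12:00:00
#SBATCH --mem=50gb
#SBATCH --export=NONE

# Load the MPI module matching your program here (module name not shown).

# Total number of MPI processes = nodes x tasks per node (2 x 16 = 32).
# The fallbacks mirror the #SBATCH values when the script runs outside Slurm.
NNODES="${SLURM_JOB_NUM_NODES:-2}"
NTASKS_PER_NODE="${SLURM_NTASKS_PER_NODE:-16}"
TOTAL_RANKS=$(( NNODES * NTASKS_PER_NODE ))
echo "Starting ${TOTAL_RANKS} MPI processes on ${NNODES} nodes"

# Let job steps inherit the environment set up above instead of the
# --export=NONE setting of the batch job.
export SLURM_EXPORT_ENV=ALL

# ./my_mpi_program is hypothetical; srun starts one process per allocated
# task. Launch only inside a Slurm job, never on a login node.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    srun ./my_mpi_program
fi
```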
3.3.4 GPU Programs
#SBATCH --partition=gpu-single
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --cpus-per-gpu=10
#SBATCH --gres=gpu:4
#SBATCH --time=12:00:00
#SBATCH --mem=200gb
- The number of GPUs per node is requested with the option "--gres=gpu:<number-of-gpus>".
- It is also possible to request a certain GPU type with the option "--gres=gpu:<gpu-type>:<number-of-gpus>". For <gpu-type>, use the 'GPU Type' listed in the last line of the GPU Nodes Hardware table.
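Inside a GPU job, Slurm typically exposes the allocated devices through the `CUDA_VISIBLE_DEVICES` environment variable, a comma-separated list such as `0,1,2,3` (this depends on the cluster's GPU configuration). A sketch that counts them, e.g. to pass the GPU count on to an application; the function name `count_gpus` is hypothetical, and the GPU type in the disabled `##SBATCH` line is left as the placeholder from the text.

```shell
#!/bin/bash
#SBATCH --partition=gpu-single
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --cpus-per-gpu=10
#SBATCH --gres=gpu:4
##SBATCH --gres=gpu:<gpu-type>:4   (disabled variant with an explicit GPU type)
#SBATCH --time=12:00:00
#SBATCH --mem=200gb

# Count the entries of a comma-separated device list such as "0,1,2,3".
count_gpus() {
    if [ -z "$1" ]; then
        echo 0
    else
        printf '%s\n' "$1" | tr ',' '\n' | grep -c .
    fi
}

NGPUS=$(count_gpus "${CUDA_VISIBLE_DEVICES:-}")
echo "GPUs visible to this job: ${NGPUS}"
```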
4 Interactive Jobs
Interactive jobs must NOT run on the login nodes; instead, resources for interactive jobs can be requested using srun. The following example requests an interactive session on 1 core for 2 hours:
$ srun --partition=single --ntasks=1 --time=2:00:00 --pty /bin/bash
After running this command, wait until the queueing system has granted you the requested resources. Once they are granted, you will automatically be logged in to the allocated compute node.
If you use applications or tools which provide a GUI, enable X-forwarding for your interactive session with:
$ srun --partition=single --ntasks=1 --time=2:00:00 --x11 --pty /bin/bash
Once the walltime limit has been reached you will be automatically logged out from the compute node.
5 Job Monitoring
5.1 Information about submitted jobs
For an overview of your submitted jobs, use the command:
$ squeue
To get detailed information about a specific job, use the command:
$ scontrol show job <jobid>
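`scontrol show job` prints the job record as Key=Value pairs such as `JobState=PENDING Reason=Priority`. To pick out a single field, for instance in a script that polls a job, the output can be filtered with standard text tools. A sketch; the function name `job_field` is hypothetical, and it assumes the requested value contains no spaces (true for JobState or Reason, but not for every field).

```shell
# Print the value of one Key=Value field from "scontrol show job" output.
# Reads the scontrol output from stdin, e.g.:
#   scontrol show job <jobid> | job_field JobState
job_field() {
    # Split on spaces so that every Key=Value pair is on its own line,
    # then keep the first pair whose key matches exactly.
    tr ' ' '\n' | grep "^$1=" | head -n 1 | cut -d= -f2-
}
```

If only the state is needed, `squeue --jobs=<jobid> --noheader --format=%T` prints it directly while the job is still listed by squeue.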
5.2 Interactive access to running jobs
If you would like to see what happens on the compute node(s), you can access the allocated resources of a running job with:
$ srun --jobid=<jobid> --pty /bin/bash
Commands like 'top' show you the busiest processes on the node. To exit 'top', type 'q'.
To monitor your GPU processes, use the command 'nvidia-smi'.
For multi-node jobs, look up the node names of your job with squeue and add one of the node names with the --nodelist option to the srun command:
$ srun --jobid=<jobid> --nodelist=<node-name> --pty /bin/bash
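The node names of a multi-node job can be listed with standard squeue and scontrol options: squeue prints the job's node list in Slurm's compressed form (bracketed ranges), and `scontrol show hostnames` expands it into one name per line. A sketch as a small shell function; the name `job_nodes` is hypothetical.

```shell
# List the node names of a job, one per line. Usage: job_nodes <jobid>
job_nodes() {
    local nodelist
    # %N prints the job's node list in compressed hostlist form
    nodelist=$(squeue --jobs="$1" --noheader --format=%N)
    # Expand bracketed ranges into individual node names
    if [ -n "$nodelist" ]; then
        scontrol show hostnames "$nodelist"
    fi
}
```

Any of the printed names can then be passed to `--nodelist` in the srun command above.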
6 Job Feedback
Information will be provided soon.
7 Overview about free resources
On the login nodes the following command shows what resources are available for immediate use:
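Independently of that command, standard Slurm's sinfo can list idle nodes, grouped by partition and node configuration. A sketch using only documented sinfo options; the function name `idle_nodes` is hypothetical. In the shared partitions, partially used nodes (state "mixed") may also have free cores, which this filter does not show.

```shell
# Show idle nodes per partition together with their size.
# %P partition, %D node count, %c CPUs per node,
# %m memory per node (MB), %G generic resources such as GPUs.
# Extra arguments are passed on, e.g.: idle_nodes --partition=single
idle_nodes() {
    sinfo --states=idle --format="%P %D %c %m %G" "$@"
}
```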