BinAC2/SLURM Partitions: Difference between revisions
F Bartusch: No edit summary
Edit summary: Added BinAC2 long partition
Latest revision as of 09:42, 18 September 2025
Partitions
The bwForCluster BinAC 2 provides four partitions for job submission. Within a partition, job allocations are routed automatically to the most suitable compute node(s) for the requested resources (e.g. number of nodes and cores, memory, number and type of GPUs).
The `gpu` partition runs at most 8 jobs per user at the same time. A user can use at most 4 A100, 8 A30, and 4 H200 GPUs at the same time (see the MaxTRESPerUser limits in the table below).
The `interactive` partition runs at most 1 job per user at the same time. This partition is dedicated to testing and to using tools via a graphical user interface. The four nodes `node1-00[1-4]` are exclusively reserved for this partition.
You can run a VNC server in this partition. If you need a display, use `#SBATCH --gres=display:1` in your job script or `--gres=display:1` on the command line. This ensures that your job starts on a node with free displays, because each of the four nodes provides only 20 virtual displays.
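As an illustration of the display request described above, a job script for the `interactive` partition might look like the following minimal sketch. The time request is an arbitrary value below the 10-hour limit, and the actual VNC or GUI start command is site-specific and deliberately left out.

```shell
#!/bin/bash
#SBATCH --partition=interactive
#SBATCH --gres=display:1        # request one of the virtual displays on the node
#SBATCH --ntasks=1
#SBATCH --time=02:00:00         # assumed value; the partition limit is 10:00:00

# SLURM_JOB_ID is set by Slurm inside a job; fall back to "local" so the
# script can also be dry-run outside of a job.
job_id="${SLURM_JOB_ID:-local}"
echo "Job ${job_id} started on ${HOSTNAME:-unknown}"

# Start your VNC server or GUI tool here (command not shown, site-specific).
```

Because the partition allows only one job per user, a second submission waits until the first job has finished.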
The `long` partition is meant for long-running, parallel jobs. Please pack your jobs as densely as possible. If possible, checkpoint regularly so that work is not lost if the job fails after several days. Due to the small number of GPU nodes at BinAC2, we cannot offer a `long` partition with GPU nodes.
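How checkpointing is done depends entirely on the application. The following is only a sketch of the general resume-from-checkpoint pattern in a `long` job script: the step loop, the `CHECKPOINT_FILE` variable, and the default path are hypothetical placeholders, and the node and time requests are example values within the partition limits.

```shell
#!/bin/bash
#SBATCH --partition=long
#SBATCH --nodes=2               # example value; the partition allows up to 10 nodes
#SBATCH --time=7-00:00:00       # example value; the partition limit is 30 days

# Hypothetical checkpoint location. In real use, point CHECKPOINT_FILE at a
# path on a shared filesystem so it survives the job and is visible on all nodes.
checkpoint_file="${CHECKPOINT_FILE:-${TMPDIR:-/tmp}/binac2_checkpoint_demo.txt}"
total_steps=5

# Resume after the last completed step, or start from the beginning.
last_done=0
if [ -f "$checkpoint_file" ]; then
    last_done=$(cat "$checkpoint_file")
fi

step=$((last_done + 1))
while [ "$step" -le "$total_steps" ]; do
    # Placeholder for the real work of one step.
    echo "running step ${step} of ${total_steps}"
    # Record the step only after it has completed.
    echo "$step" > "$checkpoint_file"
    step=$((step + 1))
done
```

If the job is resubmitted after a failure, the loop continues with the first step that was not recorded as completed.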
Partition | Node Access Policy | Node Types | Default | Limits |
---|---|---|---|---|
compute (default) | shared | cpu | ntasks=1, time=00:10:00, mem-per-cpu=1gb | nodes=2, time=14-00:00:00 |
gpu | shared | gpu | ntasks=1, time=00:10:00, mem-per-cpu=1gb | time=14-00:00:00; MaxJobsPerUser: 8; MaxTRESPerUser: gres/gpu:a100=4, gres/gpu:a30=8, gres/gpu:h200=4 |
interactive | shared | cpu | ntasks=1, time=00:10:00, mem-per-cpu=1gb | time=10:00:00; MaxJobsPerUser: 1 |
long | shared | cpu (InfiniBand nodes only) | time=1-00:00:00, feature=ib | time=30-00:00:00; MaxNodes=10 |
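Since the defaults in the table are small (one task, 10 minutes, 1 GB per CPU), most jobs need to request their resources explicitly. A minimal sketch for the `compute` partition, with example values chosen only for illustration:

```shell
#!/bin/bash
#SBATCH --partition=compute
#SBATCH --ntasks=4              # default would be 1
#SBATCH --time=02:00:00         # default would be 00:10:00
#SBATCH --mem-per-cpu=2gb       # default would be 1gb

# SLURM_NTASKS is set by Slurm when --ntasks is given; fall back to 1 so the
# script can also be dry-run outside of a job.
ntasks="${SLURM_NTASKS:-1}"
echo "Allocated tasks: ${ntasks}"
```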
Parallel Jobs
To submit parallel jobs to the InfiniBand part of the cluster, i.e. for fast inter-node communication, select the appropriate nodes via the `--constraint=ib` option in your job script. For less demanding parallel jobs, you may try the `--constraint=eth` option, which uses 100 Gb/s Ethernet instead of the low-latency 100 Gb/s InfiniBand.
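A two-node job pinned to the InfiniBand nodes could be sketched as follows. The task count, the time request, and the program name `./my_mpi_app` are hypothetical; replace `ib` with `eth` to target the Ethernet nodes instead.

```shell
#!/bin/bash
#SBATCH --partition=compute
#SBATCH --nodes=2               # the compute partition allows at most 2 nodes
#SBATCH --ntasks-per-node=8     # example value
#SBATCH --constraint=ib         # only nodes with the InfiniBand feature
#SBATCH --time=04:00:00         # example value

# SLURM_JOB_NUM_NODES is set by Slurm inside a job; fall back to 1 so the
# script can also be dry-run outside of a job.
nodes="${SLURM_JOB_NUM_NODES:-1}"
echo "Running on ${nodes} node(s)"

# Launch the parallel program here, for example (hypothetical binary):
# srun ./my_mpi_app
```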
GPU Jobs
BinAC 2 provides different GPU models for computations. Select the GPU type and the number of GPUs with the `--gres=gpu:<type>:N` option in your job script, where `<type>` is one of the GPU types in the table below and N is the number of GPUs.
GPU | GPU Memory | # GPUs per Node [N] | Submit Option |
---|---|---|---|
Nvidia A30 | 24 GB | 2 | `--gres=gpu:a30:N` |
Nvidia A100 | 80 GB | 4 | `--gres=gpu:a100:N` |
Nvidia H200 | 141 GB | 4 | `--gres=gpu:h200:N` |
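For example, a job requesting two A100 GPUs might look like the sketch below. The time and memory requests are example values. The script assumes that Slurm exports `CUDA_VISIBLE_DEVICES` for GPU allocations, which is the usual behaviour but depends on the cluster configuration.

```shell
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:2       # two A100 GPUs; a node has 4
#SBATCH --ntasks=1
#SBATCH --time=12:00:00         # example value; the partition limit is 14 days
#SBATCH --mem-per-cpu=8gb       # example value

# CUDA_VISIBLE_DEVICES is typically set by Slurm for GPU jobs (assumption);
# fall back to "none" so the script can also be dry-run outside of a job.
gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "Visible GPUs: ${gpus}"
```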