BinAC2/SLURM Partitions

== Partitions ==

The bwForCluster BinAC 2 provides three partitions for job submission. Within a partition, job allocations are routed automatically to the most suitable compute node(s) for the requested resources (e.g. number of nodes and cores, memory, number and type of GPUs).

The <code>gpu</code> partition runs at most 8 jobs per user at the same time. A user can use at most 4 A100, 8 A30, and 4 H200 GPUs at the same time.

The <code>interactive</code> partition runs at most 1 job per user at the same time. This partition is dedicated to testing and to using tools via a graphical user interface. The four nodes <code>node1-00[1-4]</code> are exclusively reserved for this partition. You can run a VNC server in this partition. Please use <code>#SBATCH --gres=display:1</code> in your job script or <code>--gres=display:1</code> on the command line if you need a display. This ensures that your job starts on a node with a "free" display, because each of the four nodes provides only 20 virtual displays.
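
As an illustration, a request for an interactive shell with one virtual display could look like the following sketch; the partition name and <code>--gres=display:1</code> are taken from this page, while the time and memory values are arbitrary examples:

<pre>
# Minimal sketch: interactive shell on the "interactive" partition with one virtual display.
# The --time and --mem values are examples only; adjust them to your needs.
srun --partition=interactive --gres=display:1 --ntasks=1 --time=02:00:00 --mem=4gb --pty bash
</pre>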

{| class="wikitable"
! Partition !! Node Access Policy !! Node Types !! Default !! Limits
|-
| compute (default)
| shared
| cpu
| ntasks=1, time=00:10:00, mem-per-cpu=1gb
| nodes=2, time=14-00:00:00
|-
| gpu
| shared
| gpu
| ntasks=1, time=00:10:00, mem-per-cpu=1gb
| time=14-00:00:00<br/>MaxJobsPerUser: 8<br/>MaxTRESPerUser: <code>gres/gpu:a100=4, gres/gpu:a30=8, gres/gpu:h200=4</code>
|-
| interactive
| shared
| cpu
| ntasks=1, time=00:10:00, mem-per-cpu=1gb
| time=10:00:00<br/>MaxJobsPerUser: 1
|}
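
As a sketch only (the resource values are arbitrary examples, not recommendations), a batch script for the default <code>compute</code> partition that overrides these defaults could look like this:

<pre>
#!/bin/bash
#SBATCH --partition=compute       # default partition, this line may be omitted
#SBATCH --ntasks=4                # example: more than the default of 1 task
#SBATCH --time=02:00:00           # example: longer than the 10-minute default walltime
#SBATCH --mem-per-cpu=2gb         # example: more than the 1gb default per CPU

# Replace the following line with your actual program call.
srun ./my_program
</pre>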

== Parallel Jobs ==

In order to submit parallel jobs to the InfiniBand part of the cluster, i.e. for fast inter-node communication, please select the appropriate nodes via the <code>--constraint=ib</code> option in your job script. For less demanding parallel jobs, you may try the <code>--constraint=eth</code> option, which utilizes 100Gb/s Ethernet instead of the low-latency 100Gb/s InfiniBand.
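
A minimal sketch of a multi-node MPI job using the InfiniBand constraint; the node, task, and time values as well as the module and program names are placeholder assumptions:

<pre>
#!/bin/bash
#SBATCH --partition=compute        # CPU partition
#SBATCH --nodes=2                  # example: two nodes
#SBATCH --ntasks-per-node=16       # example task count, adjust to your application
#SBATCH --time=04:00:00            # example walltime
#SBATCH --constraint=ib            # request InfiniBand-connected nodes

# Placeholder module name; check 'module avail' on BinAC 2 for the actual MPI module.
module load mpi/openmpi

srun ./my_mpi_program
</pre>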

== GPU Jobs ==

BinAC 2 provides different GPU models for computations. Please select the appropriate GPU type and the number of GPUs with the <code>--gres=gpu:type:N</code> option in your job script, where <code>type</code> is one of the models listed below; a job script example follows the table.

{| class="wikitable"
! GPU !! GPU Memory !! # GPUs per Node [N] !! Submit Option
|-
| Nvidia A30
| 24GB
| 2
| <code>--gres=gpu:a30:N</code>
|-
| Nvidia A100
| 80GB
| 4
| <code>--gres=gpu:a100:N</code>
|-
| Nvidia H200
| 141GB
| 4
| <code>--gres=gpu:h200:N</code>
|}
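
For illustration, a job script requesting one A30 GPU could look like the following sketch; the GPU type and the <code>gpu</code> partition are taken from the tables above, all other values are arbitrary examples:

<pre>
#!/bin/bash
#SBATCH --partition=gpu            # GPU partition
#SBATCH --gres=gpu:a30:1           # example: one A30 GPU (see table above)
#SBATCH --ntasks=1                 # example task count
#SBATCH --time=01:00:00            # example walltime
#SBATCH --mem-per-cpu=4gb          # example memory request

# nvidia-smi only lists the allocated GPU(s); replace it with your actual GPU program.
nvidia-smi
</pre>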