BinAC2/SLURM Partitions
Revision as of 16:07, 19 December 2024
Partitions
The bwForCluster BinAC 2 provides two partitions (i.e., queues) for job submission. Within a partition, job allocations are routed automatically to the most suitable compute node(s) for the requested resources (e.g., number of nodes and cores, memory, number of GPUs).
| Partition | Node Access Policy | Node Types | Default | Limits | 
|---|---|---|---|---|
| compute (default) | shared | cpu | ntasks=1, time=00:10:00, mem-per-cpu=1gb | nodes=2, time=14-00:00:00 | 
| gpu | shared | gpu | ntasks=1, time=00:10:00, mem-per-cpu=1gb | nodes=1, time=14-00:00:00 | 
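The defaults in the table are small (one task, ten minutes, 1 GB per CPU), so most jobs will want to request resources explicitly. The following is a minimal sketch of a job script for the compute partition; the requested values (4 tasks, 2 hours, 4 GB per CPU) and the program name `./my_program` are illustrative assumptions, not recommendations from this page.

```shell
#!/bin/bash
# Override the partition defaults (ntasks=1, time=00:10:00, mem-per-cpu=1gb).
# The values below are examples only; they must stay within the partition limits.
#SBATCH --partition=compute
#SBATCH --ntasks=4
#SBATCH --time=02:00:00
#SBATCH --mem-per-cpu=4G

# SLURM exports these variables inside a job. The fallbacks after ":-" are only
# used when the script is run outside of SLURM, e.g. for a quick dry run.
summary="job=${SLURM_JOB_ID:-none} tasks=${SLURM_NTASKS:-1} partition=${SLURM_JOB_PARTITION:-none}"
echo "$summary"

# Replace with the real workload; ./my_program is a hypothetical placeholder.
# srun ./my_program
```

Such a script would be submitted with `sbatch`, for example `sbatch job.sh`, where `job.sh` is whatever file name you saved it under.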
Quality of Service (QOS)
A Quality of Service (QoS) limits the resources a user or group can use at the same time. Usually a QoS has the same name as the partition it is associated with. You can list all QoS via:

    $ sacctmgr list qos --parsable
    Name|Priority|GraceTime|Preempt|PreemptExemptTime|PreemptMode|Flags|UsageThres|UsageFactor|GrpTRES|GrpTRESMins|GrpTRESRunMins|GrpJobs|GrpSubmit|GrpWall|MaxTRES|MaxTRESPerNode|MaxTRESMins|MaxWall|MaxTRESPU|MaxJobsPU|MaxSubmitPU|MaxTRESPA|MaxTRESRunMinsPA|MaxTRESRunMinsPU|MaxJobsPA|MaxSubmitPA|MinTRES|
    normal|0|00:00:00|||cluster|||1.000000||||||||||||||||||||
    gpu|0|00:00:00|||cluster|||1.000000|||||||||||gres/gpu:a100=4,gres/gpu:a30=8|8||||||||

The output shows that the gpu partition will run at most 8 jobs per user at the same time (column MaxJobsPU). In addition, a user can use at most 4 A100 and 8 A30 GPUs at the same time (column MaxTRESPU). QoS limits may change in the future, so please check the current values with the sacctmgr list qos command on the cluster.
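Because the parsable output has many mostly empty columns, it can help to print only the per-user limits. The following is a small sketch using `awk`; the function name `qos_user_limits` is a hypothetical helper, not part of SLURM. It looks up columns by their header name, so it does not depend on the column order.

```shell
# Read `sacctmgr list qos --parsable` output on stdin and print, for each QoS,
# only the per-user job limit and the per-user TRES (e.g. GPU) limit.
qos_user_limits() {
  awk -F'|' '
    # First line: remember the position of every column by its header name.
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    # Remaining non-empty lines: print the selected columns for this QoS.
    NF > 1  { printf "%s: MaxJobsPU=%s MaxTRESPU=%s\n", $(col["Name"]), $(col["MaxJobsPU"]), $(col["MaxTRESPU"]) }
  '
}

# Usage on the cluster (requires sacctmgr):
#   sacctmgr list qos --parsable | qos_user_limits
```

An empty value after `=` means that the column is empty in the `sacctmgr` output for that QoS.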
Parallel Jobs
To run parallel jobs that need fast inter-node communication on the InfiniBand part of the cluster, please select the appropriate nodes with the --constraint=ib option in your job script. For less demanding parallel jobs, you may try the --constraint=eth option, which uses 100 Gb/s Ethernet instead of the low-latency 100 Gb/s InfiniBand.
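As an illustration, a two-node job on the InfiniBand nodes could look like the sketch below. The number of tasks per node, the run time, and the program name `./my_mpi_program` are assumptions made for this example; adjust them to your application and to the node hardware.

```shell
#!/bin/bash
#SBATCH --partition=compute
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=8
#SBATCH --time=01:00:00
# Select nodes connected via InfiniBand; use --constraint=eth for Ethernet instead.
#SBATCH --constraint=ib

# SLURM sets SLURM_JOB_NODELIST inside a job; the fallback only applies
# when the script is run outside of SLURM.
nodelist="${SLURM_JOB_NODELIST:-none}"
echo "Allocated nodes: $nodelist"

# Launch the MPI program on all tasks; ./my_mpi_program is a hypothetical placeholder.
# srun ./my_mpi_program
```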
GPU Jobs
BinAC 2 provides different GPU models for computations. Please select the appropriate GPU type and the number of GPUs with the --gres=gpu:aXX:N option in your job script.
| GPU | GPU Memory | # GPUs per Node [N] | Submit Option | 
|---|---|---|---|
| Nvidia A30 | 24GB | 2 | --gres=gpu:a30:N | 
| Nvidia A100 | 80GB | 4 | --gres=gpu:a100:N |
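For example, a job that uses both A30 GPUs of one node could be sketched as follows. The run time and the commented-out commands are assumptions for illustration only; `./my_gpu_program` is a hypothetical placeholder.

```shell
#!/bin/bash
#SBATCH --partition=gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:30:00
# Request 2 GPUs of type A30; use gpu:a100:N for A100 GPUs.
#SBATCH --gres=gpu:a30:2

# Inside a GPU job, SLURM typically exports CUDA_VISIBLE_DEVICES with the
# allocated devices; the fallback only applies outside of SLURM.
gpus="${CUDA_VISIBLE_DEVICES:-none}"
echo "Visible GPUs: $gpus"

# Replace with the real workload, for example:
# nvidia-smi
# srun ./my_gpu_program
```

Keep in mind that the GPUs requested across all of your running jobs count towards the per-user QoS limits described above.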