DACHS/Queues

== Partitions ==

DACHS offers three partitions in Slurm, which map directly to the node types: nodes with one NVIDIA L40S GPU each, a node with 4 AMD MI300A APUs, and a node with 8 NVIDIA H100 GPUs.

== sinfo_t_idle ==

To see which nodes are currently available (idle), DACHS offers the tool <code>sinfo_t_idle</code>, which any user may call.
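
For example, to check node availability before submitting a job (the plain <code>sinfo</code> call shown for comparison is standard Slurm, not specific to DACHS):

 # List the currently idle nodes per partition (DACHS helper tool)
 sinfo_t_idle
 # Standard Slurm alternative: show the state of all nodes in the gpu1 partition
 sinfo -p gpu1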

== sbatch -p ''partition'' ==

Batch jobs specify compute requirements such as the maximum (wall-)time, memory and GPU resources. If you require a GPU, you must specify this in your request. These requirements are restricted and must fit the available '''partitions'''. Since requested compute resources are NOT always automatically mapped to the correct queue class, '''you must add the correct queue class to your sbatch command'''. <font color=red>As with bwUniCluster, the specification of a partition is required.</font>
Details are:

{| class="wikitable"
|- style="text-align:left;"
! colspan="4" | DACHS <br> sbatch -p ''partition''
|- style="text-align:left;"
! partition !! node !! default resources !! maximum resources
|- style="vertical-align:top; text-align:left"
| gpu1
| gpu1[01-45]
| time=30, mem-per-node=5000mb
| time=72:00:00, nodes=16, mem-per-node=300000mb, res=gpu:1
|- style="vertical-align:top; text-align:left"
| gpu4
| gpu401
| time=30, mem-per-cpu=5000mb
| time=72:00:00, nodes=1, mem=500000mb, ntasks-per-node=96
|- style="vertical-align:top; text-align:left"
| gpu8
| gpu801
| time=30, mem-per-cpu=5000mb, cpus-per-gpu=8
| time=48:00:00, mem=752000mb, ntasks-per-node=96
|}

The default resources of a queue class define the time, number of tasks and memory if these are not explicitly given with the sbatch command. The corresponding resource options are <code>--time</code>, <code>--ntasks</code>, <code>--nodes</code>, <code>--mem</code> and <code>--mem-per-cpu</code>.
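
For example, a job on the <code>gpu1</code> partition that overrides these defaults could be submitted as follows (a sketch; <code>my_job.slurm</code> is a placeholder for your own batch script):

 sbatch -p gpu1 --time=24:00:00 --ntasks=48 --mem=100000mb --gres=gpu:1 my_job.slurm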

A typical Slurm batch script (called <code>python_run.slurm</code> for brevity) for a single node requiring one NVIDIA L40S GPU:

 #!/bin/bash
 #SBATCH --partition=gpu1
 #SBATCH --ntasks-per-gpu=48
 #SBATCH --gres=gpu:1
 #SBATCH --time=1:00:00
 #SBATCH --mail-type=all
 #SBATCH --mail-user=my_email@hs-esslingen.de
 # Load the CUDA toolkit and change to the fast node-local scratch directory
 module load devel/cuda/12.4
 cd $TMPDIR
 # Create and activate a virtual environment, then install the project dependencies
 python3 -m venv my_environment
 . my_environment/bin/activate
 python3 -m pip install -r $HOME/my_requirements.txt
 # Copy the input data to the scratch directory and run the main script, measuring its runtime
 rsync -avz $HOME/my_data_dir/ .
 time python3 $HOME/python_script.py

Submitting <code>sbatch python_run.slurm</code> will allocate one compute node and the one available GPU for 1 hour. You '''have''' to request the GPU, otherwise you may not use it. Furthermore, the script will load the CUDA module version 12.4 and then change to the '''fast''' scratch directory specified in the environment variable <code>TMPDIR</code>. Following Python's best practices, it creates a new virtual environment in that directory and installs the project dependencies listed in <code>my_requirements.txt</code>. It then copies the data directory <code>my_data_dir</code> to this directory using <code>rsync</code>. Finally, it executes your main Python script, using the <code>time</code> command to measure how much time was actually used. Alternatively, you may time all the commands to get an estimate for your next batch job.
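
As a quick sanity check that the GPU was really granted, you may, for instance, query it at the beginning of the script (a minimal sketch for the NVIDIA partitions, where <code>nvidia-smi</code> is available):

 # Abort early if no GPU is visible to the job
 if ! nvidia-smi > /dev/null 2>&1; then
   echo "No GPU allocated or visible - check your --gres request" >&2
   exit 1
 fi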

Here, Slurm will send an email to the specified address upon start and completion of the job, including a summary.

The '''better''' your approximation, the better the Slurm scheduler can allocate resources to all users.
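
After a job has finished you can, for example, compare your request with the actual usage via Slurm's accounting tool <code>sacct</code> (replace the job ID placeholder with your own):

 # Show elapsed time, peak memory and final state of a finished job
 sacct -j <jobid> --format=JobID,Elapsed,MaxRSS,State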

== Interactive usage ==

To '''get a good estimation''' of runtime, you may first want to try the resources ''interactively'':

   srun --partition=gpu1 --ntasks-per-gpu=48 --gres=gpu:1 --pty /bin/bash

Then you may execute the steps of the <code>python_run.slurm</code> script interactively, note any differences and amend your Slurm batch script. Please note the <code>--pty</code> option, which forwards standard output and standard input to allow working with the shell.
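
Inside such an interactive shell you can, for example, walk through the same steps as the batch script and time each of them individually (a sketch using the placeholder paths from the example above):

 # Executed manually in the interactive shell on the compute node
 module load devel/cuda/12.4
 cd $TMPDIR
 python3 -m venv my_environment && . my_environment/bin/activate
 time python3 -m pip install -r $HOME/my_requirements.txt
 time rsync -avz $HOME/my_data_dir/ .
 time python3 $HOME/python_script.py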

== Multiple nodes ==

Of course, you may allocate multiple GPUs across nodes by running:

   sbatch --nodes 4 ./python_run.slurm

Please be aware that <code>TMPDIR</code> is still node-local. For the time being, run from your <code>$HOME</code> or, better yet, from an allocated [[Workspace]].
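
A minimal sketch of how the relevant lines of the batch script could change for such a multi-node run, assuming the usual workspace tools (<code>ws_allocate</code>/<code>ws_find</code>) are available on DACHS and a workspace named <code>my_workspace</code> has already been allocated:

 #SBATCH --nodes=4
 #SBATCH --gres=gpu:1
 # Work from a shared workspace instead of the node-local $TMPDIR
 cd $(ws_find my_workspace)
 # Launch the main script once per allocated node
 srun python3 $HOME/python_script.py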

== Nodes with multiple GPUs ==

The partitions <code>gpu4</code> and <code>gpu8</code> feature multiple GPUs. The <code>gpu4</code> partition contains the node <code>gpu401</code>, featuring 4x AMD MI300A APUs, each with 128GB of fast HBM3 memory shared between the 24 CPU cores and the GPU. You may use AMD's ROCm stack, employing HIP, OpenACC or OpenCL, to parallelize for the GPU. Please refer to the documentation on this node.
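
A minimal sketch of building and running a HIP program on this node; the module name <code>devel/rocm</code> and the source file are assumptions, so check <code>module avail</code> on DACHS for the actual toolchain:

 # Load the ROCm toolchain (module name is an assumption)
 module load devel/rocm
 # Compile a HIP source file with AMD's HIP compiler
 hipcc -o my_hip_app my_hip_app.hip
 # Run it on the MI300A node, requesting one of the four APUs
 srun --partition=gpu4 --gres=gpu:1 ./my_hip_app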

The <code>gpu8</code> partition contains the node <code>gpu801</code>, featuring 8x NVIDIA H100 GPUs in the SXM5 form factor, each offering 80GB of VRAM. Please refer to the documentation on this node.
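
To use several of these GPUs in one job, you may, for example, request them via <code>--gres</code> (a sketch; the limits of the <code>gpu8</code> partition from the table above still apply):

 #!/bin/bash
 #SBATCH --partition=gpu8
 #SBATCH --gres=gpu:4
 #SBATCH --cpus-per-gpu=8
 #SBATCH --time=2:00:00
 module load devel/cuda/12.4
 srun python3 $HOME/python_script.py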