DACHS/Queues

Partitions

DACHS offers three partitions in Slurm, which map directly to the node types: the nodes with one NVIDIA L40S GPU each, the node with 4 AMD MI300A APUs, and the node with 8 NVIDIA H100 GPUs.

sinfo_t_idle

To see the available (idle) nodes, DACHS offers the tool sinfo_t_idle, which any user may call.
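For example, a quick check before submitting a job (the output depends on the current cluster load):

   sinfo_t_idle

The standard Slurm command sinfo also shows the state of all nodes per partition, e.g. sinfo -p gpu1.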

sbatch -p partition

Batch jobs specify compute requirements such as maximum (wall-)time, memory and GPU resources, which must fit within the limits of the available partitions. If you require a GPU, you must specify this with your request. Since requested compute resources are NOT automatically mapped to the correct partition, you must add the correct partition to your sbatch command. As with BwUniCluster 2.0, the specification of a partition is required.
Details are:

DACHS: sbatch -p partition

partition | node        | default resources                           | maximum resources
gpu1      | gpu1[01-45] | time=30, mem-per-node=5000mb                | time=72:00:00, nodes=16, mem-per-node=300000mb, res=gpu:1
gpu4      | gpu401      | time=30, mem-per-cpu=5000mb                 | time=72:00:00, nodes=1, mem=500000mb, ntasks-per-node=96
gpu8      | gpu801      | time=30, mem-per-cpu=5000mb, cpus-per-gpu=8 | time=48:00:00, mem=752000mb, ntasks-per-node=96

The default resources of a partition define time, number of tasks and memory if these are not given explicitly with the sbatch command. The corresponding resource options are --time, --ntasks, --nodes, --mem and --mem-per-cpu.
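As an illustration, individual resources can also be given directly on the command line, overriding both the partition defaults and the directives in the script (the values and the script name my_job.slurm are placeholders, not recommendations):

   sbatch -p gpu1 --time=2:00:00 --ntasks=8 --mem=20000mb --gres=gpu:1 my_job.slurm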

A typical Slurm batch script (called python_run.slurm for brevity) for one node requiring one NVIDIA L40S GPU:

 #!/bin/bash
 #SBATCH --partition=gpu1
 #SBATCH --ntasks-per-gpu=48
 #SBATCH --gres=gpu:1                  # request the GPU; without this you may not use it
 #SBATCH --time=1:00:00
 #SBATCH --mail-type=all
 #SBATCH --mail-user=my_email@hs-esslingen.de
 module load devel/cuda/12.4           # load the CUDA module
 cd $TMPDIR                            # change to the fast local scratch directory
 python3 -m venv my_environment        # create a fresh virtual environment
 . my_environment/bin/activate
 python3 -m pip install -r $HOME/my_requirements.txt   # install the project's dependencies
 rsync -avz $HOME/my_data_dir/ .       # copy the input data to the scratch directory
 time python3 $HOME/python_script.py   # run the main script and measure its runtime

Submitting sbatch python_run.slurm will allocate one compute node together with the one available GPU for 1 hour. You have to request the GPU explicitly; otherwise you may not use it. The script loads the CUDA module version 12.4 and then changes to the fast scratch directory specified in the environment variable TMPDIR. Following Python's best practices, it creates a new virtual environment in that directory and installs the project's dependencies listed in my_requirements.txt. It then copies the data directory my_data_dir to the scratch directory using rsync. Finally, it executes your main Python script, using the time command to measure how much time was actually used. Alternatively, you may time all the commands to get an estimate for your next batch job.

Here, Slurm will send an email with a summary to the specified address at the start and at the completion of the job.

The better your estimate, the better the Slurm scheduler can allocate resources to all users.
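One way to refine that estimate is to look at the accounting data of a finished job, for example with sacct (the job ID 123456 is a placeholder):

   sacct -j 123456 --format=JobID,Elapsed,MaxRSS,State

Elapsed shows the actual runtime and MaxRSS the peak memory use, which you can feed back into the --time and --mem options of your next submission.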

Interactive usage

To get a good estimate of the runtime, you may first want to try the resources interactively:

   srun --partition=gpu1 --ntasks-per-gpu=48 --gres=gpu:1 --pty /bin/bash

You may then execute the steps of the python_run.slurm script interactively, noting differences, and amend your Slurm batch script accordingly. Please note the --pty option, which forwards standard output and standard input so that you can work with the shell.
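For example, inside the interactive shell you can first check that the GPU is visible and then time a single step (python_script.py is the script from the batch example above):

   nvidia-smi
   time python3 $HOME/python_script.py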

Multiple nodes

Of course, you may allocate multiple GPUs across several nodes by running:

   sbatch --nodes 4 ./python_run.slurm

Please be aware that TMPDIR is still local to each node. For the time being, run from your $HOME, as sketched below.
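A minimal sketch of a multi-node variant that works from the shared $HOME instead of the node-local TMPDIR (the directory my_project is a placeholder; my_environment and python_script.py are taken from the example above):

 #!/bin/bash
 #SBATCH --partition=gpu1
 #SBATCH --nodes=4
 #SBATCH --gres=gpu:1                  # one L40S GPU per node
 #SBATCH --ntasks-per-node=1
 #SBATCH --time=1:00:00
 module load devel/cuda/12.4
 cd $HOME/my_project                   # shared file system, visible on all nodes
 . my_environment/bin/activate
 srun python3 python_script.py         # srun starts one task on each allocated node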


Nodes with multiple GPUs

The partitions gpu4 and gpu8 feature multiple GPUs. The gpu4 partition contains the node gpu401, featuring 4 AMD MI300A APUs with 128 GB of memory each, programmed using ROCm. Please refer to the documentation on this node.

The gpu8 partition contains the node gpu801, featuring 8 NVIDIA H100 GPUs. Please refer to the documentation on this node.
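On these nodes, several GPUs are requested within a single node via the gres option; a minimal header sketch for the gpu8 partition (the number of GPUs, the CPU count and the time are only illustrative values):

 #SBATCH --partition=gpu8
 #SBATCH --gres=gpu:4
 #SBATCH --cpus-per-gpu=8
 #SBATCH --time=2:00:00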