Helix/Slurm
General information about Slurm
The bwForCluster Helix uses Slurm as its batch system.
- Slurm documentation: https://slurm.schedmd.com/documentation.html
- Slurm cheat sheet: https://slurm.schedmd.com/pdfs/summary.pdf
- Slurm tutorials: https://slurm.schedmd.com/tutorials.html
Slurm Command Overview
Slurm commands | Brief explanation |
---|---|
sbatch | Submits a job and queues it in an input queue |
salloc | Requests resources for an interactive job |
squeue | Displays information about active, eligible, blocked, and/or recently completed jobs |
scontrol | Displays detailed job state information |
sstat | Displays status information about a running job |
scancel | Cancels a job |
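For quick reference, typical invocations of these commands might look like this (the job script name and job ID are placeholders):
$ sbatch my_job.sh
$ squeue -u $USER
$ scontrol show job <jobid>
$ scancel <jobid>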
Job Submission
Batch jobs are submitted with the command:
$ sbatch <job-script>
A job script contains options for Slurm in lines beginning with #SBATCH, as well as the commands you want to execute on the compute nodes. For example:
#!/bin/bash
#SBATCH --partition=cpu-single
#SBATCH --ntasks=1
#SBATCH --time=00:20:00
#SBATCH --mem=1gb
#SBATCH --export=NONE
echo 'Hello world'
This job requests one core (--ntasks=1) and 1 GB memory (--mem=1gb) for 20 minutes (--time=00:20:00) on nodes provided by the partition 'cpu-single'.
For better reproducibility of jobs it is recommended to use the option --export=NONE to prevent the propagation of environment variables from the submit session into the job environment, and to load the required software modules in the job script.
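In practice this means that the job script itself contains both the option and the required module commands, for example (the module shown is only an illustration; check 'module avail' for the modules installed on Helix):
#SBATCH --export=NONE
module load compiler/gnu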
Partitions
On bwForCluster Helix it is necessary to request a partition with '--partition=<partition_name>' on job submission. Within a partition, job allocations are routed automatically to the most suitable compute node(s) for the requested resources (e.g. number of nodes and cores, memory, number of GPUs). If no partition is requested, jobs are placed in the default partition, devel.
The partitions devel, cpu-single and gpu-single are operated in shared mode, i.e. jobs from different users can run on the same node. Jobs can get exclusive access to compute nodes in these partitions with the "--exclusive" option. The partitions cpu-multi and gpu-multi are operated in exclusive mode. Jobs in these partitions automatically get exclusive access to the requested compute nodes.
GPUs are requested with the option "--gres=gpu:<number-of-gpus>".
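For illustration, the following #SBATCH lines (not a complete job script) would request two GPUs on a node of the gpu-single partition; the number of GPUs is only an example:
#SBATCH --partition=gpu-single
#SBATCH --gres=gpu:2
# add "#SBATCH --exclusive" if the node should not be shared with other jobs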
Partition | Node Access Policy | Node Types | Default | Limits |
---|---|---|---|---|
devel | shared | cpu, gpu4 | ntasks=1, time=00:10:00, mem-per-cpu=2gb | nodes=2, time=00:30:00 |
cpu-single | shared | cpu, fat | ntasks=1, time=00:30:00, mem-per-cpu=2gb | nodes=1, time=120:00:00 |
gpu-single | shared | gpu4, gpu8 | ntasks=1, time=00:30:00, mem-per-cpu=2gb | nodes=1, time=120:00:00 |
cpu-multi | job exclusive | cpu | nodes=2, time=00:30:00 | nodes=32, time=48:00:00 |
gpu-multi | job exclusive | gpu4 | nodes=2, time=00:30:00 | nodes=8, time=48:00:00 |
Constraints
It is possible to explicitly request the CPU manufacturer of the compute nodes with the option "--constraint=<constraint_name>".
Constraint | Meaning |
---|---|
amd | request AMD nodes (default) |
intel | request Intel nodes (when available) |
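For example, an Intel node can be requested with a directive in the job script or directly on the sbatch command line (the job script name is a placeholder):
#SBATCH --constraint=intel
$ sbatch --constraint=intel my_job.sh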
Examples
Here you can find some example scripts for batch jobs.
Serial Programs
#!/bin/bash
#SBATCH --partition=cpu-single
#SBATCH --ntasks=1
#SBATCH --time=20:00:00
#SBATCH --mem=4gb
./my_serial_program
Notes:
- Jobs with "--mem" up to 236gb can run on all node types associated with the cpu-single partition.
Multi-threaded Programs
#!/bin/bash
#SBATCH --partition=cpu-single
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --time=01:30:00
#SBATCH --mem=50gb
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
./my_multithreaded_program
Notes:
- Jobs with "--ntasks-per-node" up to 64 and "--mem" up to 236gb can run on all node types associated with the cpu-single partition.
- With "export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}" you can set the number of threads according to the number of resources requested.
MPI Programs
#!/bin/bash
#SBATCH --partition=cpu-multi
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=64
#SBATCH --time=12:00:00
#SBATCH --mem=50gb
module load compiler/gnu
module load mpi/openmpi
srun ./my_mpi_program
Notes:
- "--mem" requests the memory per node. The maximum is 236gb.
- The compiler and MPI modules used for compilation must be loaded before the program is started.
- It is recommended to start MPI programs with 'srun'.
GPU Programs
#!/bin/bash
#SBATCH --partition=gpu-single
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=40
#SBATCH --gres=gpu:4
#SBATCH --time=12:00:00
#SBATCH --mem=200gb
module load devel/cuda
export OMP_NUM_THREADS=${SLURM_NTASKS}
./my_cuda_program
Notes:
- The number of GPUs per node is requested with the option "--gres=gpu:<number-of-gpus>"
- It is recommended to request a suitable GPU type for your application with the option "--gres=gpu:<gpu-type>:<number-of-gpus>" (see the sketch after these notes). For <gpu-type> put the 'GPU Type' listed in the last line of the Compute Nodes table.
- Example for a request of two A40 GPUs: --gres=gpu:A40:2
- Example for a request of one A100 GPU: --gres=gpu:A100:1
- If you are unsure whether your code will run faster on an A40 or an A100, please run a test case and compare the run times. In general, the following applies:
- A40 GPUs are optimized for single precision computations.
- A100 GPUs offer better performance for double precision computations or if the code makes use of tensor cores.
- The CUDA module used for compilation must be loaded before the start of the program.
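Putting the notes above together, the GPU-specific lines of a job script requesting one A100 GPU might look like this (the values are only illustrative):
#SBATCH --partition=gpu-single
#SBATCH --gres=gpu:A100:1
module load devel/cuda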
More examples
Further batch script examples are available on bwForCluster Helix in the directory: /opt/bwhpc/common/system/slurm-examples
Interactive Jobs
Interactive jobs must NOT run on the login nodes; instead, resources for an interactive session can be requested with salloc. The following example requests an interactive session on 1 core for 2 hours:
$ salloc --partition=cpu-single --ntasks=1 --time=2:00:00
After execution of this command, wait until the queueing system has granted you the requested resources. Once granted, you will be automatically logged on to the allocated compute node.
If you use applications or tools which provide a GUI, enable X-forwarding for your interactive session with:
$ salloc --partition=cpu-single --ntasks=1 --time=2:00:00 --x11
Once the walltime limit has been reached you will be automatically logged out from the compute node.
Job Monitoring
Information about submitted jobs
For an overview of your submitted jobs use the command:
$ squeue
To get detailed information about a specific job use the command:
$ scontrol show job <jobid>
Information about resource usage of running jobs
You can monitor the resource usage of running jobs with the sstat command. For example:
$ sstat --format=JobId,AveCPU,AveRSS,MaxRSS -j <jobid>
This will show the average CPU time as well as the average and maximum memory consumption of all tasks in the running job.
The command 'sstat -e' shows a list of fields that can be specified with the '--format' option.
Interactive access to running jobs
It is also possible to attach an interactive shell to a running job with the command:
$ srun --jobid=<jobid> --overlap --pty /bin/bash
Commands like 'top' show you the busiest processes on the node. To exit 'top', type 'q'.
To monitor your GPU processes use the command 'nvidia-smi'.
Job Feedback
You get feedback on resource usage and job efficiency for completed jobs with the command:
$ seff <jobid>
Example Output:
============================= JOB FEEDBACK =============================
Job ID: 12345678
Cluster: helix
User/Group: hd_ab123/hd_hd
State: COMPLETED (exit code 0)
Nodes: 2
Cores per node: 64
CPU Utilized: 3-04:11:46
CPU Efficiency: 97.90% of 3-05:49:52 core-walltime
Job Wall-clock time: 00:36:29
Memory Utilized: 432.74 GB (estimated maximum)
Memory Efficiency: 85.96% of 503.42 GB (251.71 GB/node)
Explanation:
- Nodes: Number of allocated nodes for the job.
- Cores per node: Number of physical cores per node allocated for the job.
- CPU Utilized: Sum of utilized core time.
- CPU Efficiency: 'CPU Utilized' with respect to core-walltime (= 'Nodes' x 'Cores per node' x 'Job Wall-clock time') in percent (see the worked example below).
- Job Wall-clock time: runtime of the job.
- Memory Utilized: Sum of memory used. For multi-node MPI jobs the sum is only correct when srun is used instead of mpirun.
- Memory Efficiency: 'Memory Utilized' with respect to total allocated memory for the job.
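To illustrate how these numbers fit together, here is the arithmetic for the example output above:
core-walltime  = 2 nodes x 64 cores x 00:36:29 = 128 x 2,189 s = 280,192 s = 3-05:49:52
CPU Efficiency = 'CPU Utilized' / core-walltime = 274,306 s (3-04:11:46) / 280,192 s ≈ 97.90%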
Job Monitoring Portal
For more detailed information about your jobs visit the job monitoring portal: https://helix-monitoring.bwservices.uni-heidelberg.de
Accounting
Jobs are billed for allocated CPU cores, memory and GPUs.
To see the accounting data of a specific job:
$ sacct -j <jobid> --format=user,jobid,account,nnodes,ncpus,time,elapsed,AllocTRES%50
To retrieve the job history for a specific user for a certain time frame:
$ sacct -u <user> -S 2022-08-20 -E 2022-08-30 --format=user,jobid,account,nnodes,ncpus,time,elapsed,AllocTRES%50
Overview about free resources
On the login nodes the following command shows what resources are available for immediate use:
$ sinfo_t_idle