Running Calculations: Difference between revisions

Latest revision as of 10:32, 14 July 2026

← This page is used in the HPC Glossary to explain the terms "Batch Scheduler" and "Batch System"

Life Cycle of a Calculation (Job)

On your desktop computer, you start a calculation and it runs immediately until it is finished. Then your desktop does mostly nothing until you start another calculation. A compute cluster has several hundred, maybe a thousand computers (compute nodes). All of them are busy most of the time, and many people want to run a great number of calculations. So running your job has to include some extra steps:

Prepare a script (a set of commands to run, usually written as a shell script) with all the commands that are necessary to run your calculation from start to finish. In addition to these commands, this batch script has a header section in which you specify details such as the required compute cores (processing units within a computer), estimated runtime, memory requirements, disk space needed, etc.
Submit the script into a queue, where your job (calculation) is assigned an initial priority and waits in line with other compute jobs until the resources you requested in the header become available. (Requested time is also a resource!)
Execution: Once a suitable resource slot is available and your job is at the front of the queue of jobs that fit that slot, your script is executed on one or more compute nodes. Your calculation runs on those nodes until it is finished or reaches the specified time limit.
Save results: Include commands that save the calculation results back to long-term storage (e.g. your home directory), at least at the end of your script. Anything you have not saved by the time the job finishes will not be saved!
If your job reaches the specified time limit, all your running processes are killed and the resources are cleared. Any data that has not been saved by then is lost!
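The life cycle above can be sketched as a small job script. This is only an illustration under assumptions, not a cluster-specific recipe: the directory names, the placeholder calculation and the resource values are made up, and whether the --signal warning is useful on your cluster should be checked in its documentation. The sketch asks Slurm to send a warning signal shortly before the time limit, so that results can still be copied to long-term storage.

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --mem=1gb
# Ask Slurm to send SIGUSR1 to the batch shell about 60 s before the time limit
#SBATCH --signal=B:USR1@60

# Hypothetical locations: adjust both to your cluster's storage layout
RESULT_DIR="${RESULT_DIR:-$HOME/job_results}"   # long-term storage
WORK_DIR=$(mktemp -d)                            # stand-in for job scratch space

save_results() {
    mkdir -p "$RESULT_DIR"
    cp -r "$WORK_DIR"/. "$RESULT_DIR"/
}

# If the warning signal arrives, save whatever exists so far and stop
trap 'save_results; exit 1' USR1

cd "$WORK_DIR" || exit 1

# Placeholder calculation, started in the background so that the
# shell can react to the signal while it waits
( echo "42" > result.txt ) &
wait $!

# Normal end of the job: copy the results back
save_results
echo "results saved to $RESULT_DIR"
```

Running the calculation in the background and waiting for it is a deliberate choice: bash only runs a trap handler once the current foreground command has returned, so a long foreground calculation would delay the save until it is too late.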

The software that distributes jobs on compute nodes is called a batch system or batch scheduler. The software currently used as a batch system on bwHPC clusters is "Slurm".

Learn more about how jobs are distributed in

→ batch system

Example Jobs

For most software that a bwHPC project installed on the cluster, we have prepared an example job script running some example calculation with that exact software.

How to access these examples is described in the "Software job examples" section of the page

→ Environment Modules

Job Script and Job Submission

Batch jobs are submitted with the command:

$ sbatch <job-script>
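When you submit from a script, it is often handy to keep the job ID that Slurm assigns. The following is a minimal sketch: submit_job is a hypothetical helper name and my_job.sh a placeholder file name. It relies on the --parsable option of sbatch, which prints only the job ID.

```shell
#!/bin/bash
# Hypothetical helper: submit a job script and print the ID Slurm assigned to it.
submit_job() {
    local script="$1" out
    if ! command -v sbatch >/dev/null 2>&1; then
        echo "sbatch not found; run this on a cluster login node" >&2
        return 1
    fi
    # --parsable prints just the job ID (followed by ";cluster" on multi-cluster setups)
    out=$(sbatch --parsable "$script") || return 1
    echo "${out%%;*}"
}

# Usage (my_job.sh is a placeholder name):
#   job_id=$(submit_job my_job.sh) && squeue -j "$job_id"
```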

A job script contains options for Slurm in lines beginning with #SBATCH, as well as the commands that you want to execute on the compute nodes. For example:

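#!/bin/bash
# Partition (queue) to submit to; the available names differ by cluster
#SBATCH --partition=<cluster specific>
# Number of tasks (processes) to start on each node
#SBATCH --ntasks-per-node=8
# Time limit in hours:minutes:seconds
#SBATCH --time=00:05:00
# Memory per node
#SBATCH --mem=1gb
# Do not copy the environment of the submitting shell into the job
#SBATCH --export=NONE
echo 'Here would start your calculation script'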

Please see the cluster-specific examples linked in the next section, because the exact options needed differ by cluster.
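Inside a running job, Slurm sets environment variables such as SLURM_JOB_ID, SLURM_NTASKS and SLURM_JOB_NODELIST, which the commands in your script can use. The sketch below simply prints them. describe_job is a hypothetical name, and the fallback values are assumptions added only so that the snippet also runs outside of a job.

```shell
#!/bin/bash
# Hypothetical helper: report what Slurm allocated to this job.
# The values after ":-" are fallbacks used when not running under Slurm.
describe_job() {
    echo "job=${SLURM_JOB_ID:-none} tasks=${SLURM_NTASKS:-1} nodes=${SLURM_JOB_NODELIST:-localhost}"
}

describe_job
```

Reading the task count from the environment instead of repeating the number from the header keeps the script consistent when you change the requested resources.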

Link to Batch System and Examples per Cluster

Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:

→ Slurm bwUniCluster 3.0

→ Slurm JUSTUS 2

→ Slurm Helix

→ Slurm NEMO2

→ Slurm BinAC2

How To Run Jobs Efficiently

When you run your calculations, you have to decide on how many compute cores your job should run simultaneously. For this, your computational problem has to be divided into pieces, which always causes some overhead.

How to find a reasonable number of compute cores for your calculation is described under

→ Scaling
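The effect of this overhead can be made visible with a little arithmetic: speedup is the runtime on one core divided by the runtime on n cores, and parallel efficiency is the speedup divided by n. The sketch below computes both. The function name and the timings are hypothetical; in practice you would measure the runtimes of short test jobs.

```shell
#!/bin/bash
# parallel_efficiency T1 TN N
#   T1: runtime on one core, TN: runtime on N cores (same unit)
parallel_efficiency() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN {
        speedup = t1 / tn
        printf "speedup=%.2f efficiency=%.0f%%\n", speedup, 100 * speedup / n
    }'
}

# Hypothetical timings: 100 s on 1 core, 30 s on 4 cores, 20 s on 16 cores
parallel_efficiency 100 30 4    # prints speedup=3.33 efficiency=83%
parallel_efficiency 100 20 16   # prints speedup=5.00 efficiency=31%
```

In this made-up case, going from 4 to 16 cores makes the job only a little faster while most of the requested cores sit idle, so 4 cores would be the more reasonable request.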

Running calculations on an HPC node consumes a lot of energy. To make the most of the available resources and to keep cluster and energy use as efficient as possible, please also see our advice for

→ Efficient Cluster Usage

To optimize job efficiency, it is very helpful to check which resources a job has actually used.

→ Job Monitoring
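As a rough sketch of such a check, the Slurm accounting command sacct can list what a finished job consumed, and the CPU efficiency follows from those numbers. The helper names and the example values are hypothetical, and the sketch assumes that job accounting is enabled on your cluster.

```shell
#!/bin/bash
# Print what a finished job actually used (needs Slurm accounting on the cluster).
show_job_usage() {
    sacct -j "$1" --format=JobID,AllocCPUS,Elapsed,TotalCPU,MaxRSS,State
}

# CPU efficiency in percent: CPU time used / (wall-clock time * allocated cores).
# All times are given in seconds.
cpu_efficiency() {
    awk -v cpu="$1" -v wall="$2" -v cores="$3" \
        'BEGIN { printf "%.0f%%\n", 100 * cpu / (wall * cores) }'
}

# Hypothetical job: 7200 CPU-seconds used, 1000 s elapsed on 8 cores
cpu_efficiency 7200 1000 8   # prints 90%
```

A value far below 100% suggests that the job requested more cores than it could keep busy. Comparing MaxRSS with the requested memory gives a similar hint for the --mem value.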

@@ Line 1: / Line 1: @@
+← This page is used in the [[HPC Glossary]] to explain the term "Batch Scheduler" and "Batch System"
-== Description ==
+== Life Cycle of a Calculation (Job) ==
 [[File:running_calculations_on_cluster.svg|thumb|upright=0.4]]
-On your desktop computer, you start your calculations and they start immediately, run until they are finished, then your desktop does mostly nothing, until you start another calculation. A compute cluster has several hundred, maybe a thousand computers (compute nodes), all of them are busy most of the time and many people want to run a great number of calculations. So running your job has to include some extra steps:
+On your desktop computer you start your calculations and they start immediately, running until they are finished. Then your desktop does mostly nothing, until you start another calculation. A compute cluster has several hundred, maybe a thousand computers (compute nodes), all of them are busy most of the time and many people want to run a great number of calculations. So running your job has to include some extra steps:
-# prepare a [[script]] (usually a shell script), with all the commands that are necessary to run your calculation from start to finish. In addition to the commands necessary to run the calculation, this ''[[batch script]]'' has a header section, in which you specify details like required [[compute core]s, [[estimated runtime]], [[memory requirements]], disk space needed, etc.
+# prepare a script (a set commands to run - usually as a shell script), with all the commands that are necessary to run your calculation from start to finish. In addition to the commands necessary to run the calculation, this ''[[batch script]]'' has a header section, in which you specify details like required compute cores (processing units witin a computer), estimated runtime, memory requirements, disk space needed, etc.
-# ''[[Submit]]'' the script into a [[queue]], where your ''[[job]]'' (calculation)
+# ''Submit'' the script into a queue, where your ''job'' (calculation)
-# Job is queued and waits in row with other compute jobs until the resources you requested in the header become available.
+# gets asigned an inital priority, is queued and waits in row with other compute jobs until the resources you requested in the header become available. (Requested time is also a resource!)
-# Execution: Once your job reaches the front of the queue, your script is executed on a compute node. Your calculation runs on that node until it is finished or reaches the specified time limit.
+# Execution: Once a suitable resource slot is available and your job is in the front of the queue of suitable jobs (suitable with regards to the resource slot), your script is executed on (a) compute node(s). Your calculation runs on that/those node(s) until it is finished or reaches the specified time limit.
-# Save results: At the end of your script, include commands to save the calculation results back to your home directory.
+# Save results: Include commands to save the calculation results back to a long term storage (e.g. your home directory), at least at the end of your script. What you have not saved until the job finishes won't be saved!
+# If your job reaches the specified time limit, all your running processes will be killed and the resources get cleared. So any data that has not been saved will be lost!
-There are two types of [[batch system]]s currently used on bwHPC clusters, called "[[Moab]]" (legacy installs) and "[[Slurm]]".
+The software that distributes jobs on compute nodes is called a '''[[batch system]]''' or '''batch scheduler'''. The software currently used as a [[batch system]] on bwHPC clusters is "Slurm".
+Learn more about the functioning of job distribution in
-== Link to Batch System per Cluster ==
+&rarr; '''[[batch system]]'''
-Because of differences in configuration (partly due to different available hardware), each cluster has their own batch system documention:
+== Example Jobs ==
-* Slurm systems
-**[[bwUniCluster_2.0_Slurm_common_Features|Slurm bwUniCluster 2.0]]
-** [[JUSTUS2/Slurm | Slurm JUSTUS 2]]
-** [[Helix/Slurm   | Slurm Helix]]
-* Moab systems (legacy systems with deprecated queuing system)
-** [[NEMO/Moab|Moab NEMO specific information]]
-** [[BinAC/Moab|Moab BinAC specific information]]
+For most software that a bwHPC project installed on the cluster, we have prepared an example job script running some example calculation with that exact software.
+How to access these examples is described in the "Software job examples" section of the page
-== Scaling ==
+&rarr; '''[[Environment Modules]]'''
-When you are running your calculations, you will have to decide on how many compute-cores your calculation will be simultaniously calculated. For this, your computational problem will have to be divided into pieces, which always causes some overhead.
+== Job Script and Job Submission ==
-How to find a reasonable number of how many compute cores to use for your calculation is described in the page
-* [[Scaling]]
+Batch jobs are submitted with the command:
-== Energy Efficiency ==
+<syntaxhighlight lang="bash">$ sbatch <job-script> </syntaxhighlight>
-Please also see our advice for
-* [[Energy Efficient Cluster Usage]]
+A job script contains options for Slurm in lines beginning with #SBATCH as well as your commands which you want to execute on the compute nodes. For example:
-== HPC Glossary ==
+<syntaxhighlight lang="slurm">#!/bin/bash
-A short definition of the typical elements of an HPC cluster.
+#SBATCH --partition=<cluster specific>
+#SBATCH --ntasks-per-node=8
+#SBATCH --time=00:05:00
+#SBATCH --mem=1gb
+#SBATCH --export=NONE
+echo 'Here would start your calculation script'
+</syntaxhighlight>
+Please see cluster specific examples from the links in the next section, because exact options needed differ by cluster.
-;HPC
-: short for '''H'''igh '''P'''erformance '''C'''omputing
+== Link to Batch System and Examples per Cluster ==
-;HPC Cluster
-:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes.
+Because of differences in configuration (partly due to different available hardware), each cluster has their own batch system documentation:
-;Node
-:An individual computer with one or more sockets, part of an HPC cluster.
+&rarr; '''[[BwUniCluster3.0/Running_Jobs|Slurm bwUniCluster 3.0]]'''
-;Socket
-:Physical socket where the CPU capsules are placed.
+&rarr; '''[[JUSTUS2/Jobscripts: Running Your Calculations | Slurm JUSTUS 2]]'''
-;Core
-:The physical unit that can independently execute tasks on a CPU. Modern CPUs generally have multiple cores.
+&rarr;  '''[[Helix/Slurm   | Slurm Helix]]'''
-;Thread
-:Logical unit that can be executed independently.
+&rarr;  '''[[NEMO2/Slurm | Slurm NEMO2]]'''
-;Hyperthreading
-: Modern computers can be configured so that one real compute-[[core]] appears like two "logical" cores on the system. These two "hyperthreads" can sometimes do computations in parallel, if the calculations use two different sub-units of the compute-core - but most of the time, two calculations on two hyperthreads run on the same physical hardware and both run half as fast as if one thread had a full core. Some programs (e.g. gromacs) can profit from running with twice as many threads on hyperthreads and finish 10-20% faster if run in that way.
+&rarr;  '''[[BinAC2/Slurm | Slurm BinAC2]]'''
-;Multithreading
-: Multithreading means that one computer program runs calculations on more than one compute-core using several logical "threads" of serial compute instructions to do so (eg. to work through different and independent data arrays in parallel). Specific types of  multithreaded parallelization are [[OpenMP]] or [[MPI]].
+== How To Run Jobs Efficiently ==
-;CPU
-:Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous cores and layers of cache.
-;GPU:Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be explicitly designed to use GPUs. CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.
+When you are running your calculations, you will have to decide on how many compute-cores your job will be simultaneously calculated.
-;RAM
+For this, your computational problem will have to be divided into pieces, which always causes some overhead.
-:Random Access Memory. It is used as the working memory for the cores.
+How to find a reasonable number of how many compute cores to use for your calculation can be found under
+&rarr;  '''[[Scaling]]'''
-;Batch System
+Running calculations on an HPC node consumes a lot of energy. To make the most of the available resources and keep cluster and energy use as efficient as possible please also see our advice for
-; Moab
+&rarr; '''[[Efficient Cluster Usage]]'''
-; Script
+To optimize the job efficiency, it is very helpful to check which resources a job has actually used.
-; Slurm
+&rarr; '''[[Job Monitoring]]'''
-; Shell Script / Bash
-; Job
-; Runtime
-; Scaling
-; Scheduler
-; Submit
-; Parallelization
