Running Calculations

← This page is used in the HPC Glossary to explain the terms "Batch Scheduler" and "Batch System"

Life Cycle of a Calculation (Job)

On your desktop computer, calculations start as soon as you launch them and run until they are finished. Afterwards your desktop sits mostly idle until you start another calculation. A compute cluster consists of several hundred, perhaps a thousand, computers (compute nodes). All of them are busy most of the time, and many people want to run a great number of calculations. Running your job therefore involves some extra steps:

Prepare a script: Write a script (a set of commands to run, usually a shell script) containing all the commands that are necessary to run your calculation from start to finish. In addition to these commands, the batch script has a header section in which you specify details such as the required compute cores (processing units within a computer), estimated runtime, memory requirements, disk space needed, etc.
Submit: Submit the script to a queue, where your job (calculation) is assigned an initial priority and waits in line with other compute jobs until the resources you requested in the header become available. (Requested time is also a resource!)
Execution: Once a suitable resource slot is available and your job is at the front of the queue of jobs that fit that slot, your script is executed on one or more compute nodes. Your calculation runs on those nodes until it is finished or reaches the specified time limit.
Save results: Include commands that save the calculation results back to long-term storage (e.g. your home directory), at least at the end of your script. Anything you have not saved by the time the job finishes will not be kept!
If your job reaches the specified time limit, all your running processes are killed and the resources are cleared, so any data that has not been saved will be lost!
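The life cycle above can be sketched as a single batch script. This is a minimal illustration, not a cluster-specific template: the header values, the RESULT_DIR and WORK_DIR variables, and the stand-in "calculation" are assumptions made for this example. Outside of Slurm the #SBATCH lines are plain comments, so the script also runs as an ordinary shell script.

```shell
#!/bin/bash
# Header: resources requested from the batch system (example values only).
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --mem=1gb

# Hypothetical locations; adjust them to the storage layout of your cluster.
RESULT_DIR="${RESULT_DIR:-$PWD/results}"   # long-term storage target
WORK_DIR="$(mktemp -d)"                    # temporary work space for this run

# Stand-in for the real calculation: writes its output into the work space.
echo "sum: $((2 + 3))" > "$WORK_DIR/output.txt"

# Save results before the script ends; data left only in WORK_DIR is lost.
mkdir -p "$RESULT_DIR"
cp "$WORK_DIR/output.txt" "$RESULT_DIR/"
rm -rf "$WORK_DIR"
```

For long calculations it can be worth copying intermediate results at regular points as well, so that a job stopped at the time limit does not lose everything.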

The software that distributes jobs across the compute nodes is called a batch system or batch scheduler. The batch system currently used on bwHPC clusters is "Slurm".

Learn more about how job distribution works in

→ batch system

Example Jobs

For most software that a bwHPC project has installed on the cluster, we have prepared an example job script that runs an example calculation with that exact software.

How to access these examples is described in the "Software job examples" section of the page

→ Environment Modules

Job Script and Job Submission

Batch jobs are submitted with the command:

$ sbatch <job-script>

A job script contains options for Slurm in lines beginning with #SBATCH, followed by the commands you want to execute on the compute nodes. For example:

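#!/bin/bash
# Partition (queue) to submit to; the available names differ by cluster
#SBATCH --partition=<cluster specific>
# Number of tasks to start on each node
#SBATCH --ntasks-per-node=8
# Time limit of 5 minutes (HH:MM:SS)
#SBATCH --time=00:05:00
# Memory required per node
#SBATCH --mem=1gb
# Do not pass the environment of the submitting shell to the job
#SBATCH --export=NONE
echo 'Here would start your calculation script'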

The exact options needed differ by cluster, so please see the cluster-specific examples linked in the next section.
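As a rough sketch of the submission step, the following writes a small job script, lists the #SBATCH options that Slurm would read from it, and submits it only if sbatch is available. The file name job.sh and the header values are hypothetical; on a real cluster a partition or other options are likely required, so the submission may be rejected as written.

```shell
#!/bin/bash
# Write a small job script (the file name job.sh is hypothetical).
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --mem=1gb
echo "running on $(hostname)"
EOF

# List the options Slurm will read from the header of the script.
directives="$(grep '^#SBATCH' job.sh)"
echo "$directives"

# Submit only where Slurm is installed.
if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints the job ID (optionally followed by ";cluster").
    if job_id="$(sbatch --parsable job.sh)"; then
        job_id="${job_id%%;*}"
        echo "submitted job $job_id"
        squeue -j "$job_id"    # show the job's current state in the queue
    fi
else
    echo "sbatch not found; run this on a cluster login node"
fi
```

Capturing the job ID at submission time makes it easier to query the queue or the accounting data for exactly that job later.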

Link to Batch System and Examples per Cluster

Because of differences in configuration (partly due to different available hardware), each cluster has its own batch system documentation:

→ Slurm bwUniCluster 3.0

→ Slurm JUSTUS 2

→ Slurm Helix

→ Slurm NEMO2

→ Slurm BinAC2

How To Run Jobs Efficiently

When you run your calculations, you have to decide how many compute cores your job will use simultaneously. To run on several cores, your computational problem has to be divided into pieces, which always causes some overhead.

How to find a reasonable number of compute cores for your calculation is described under

→ Scaling
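One common way to judge the overhead is to run the same problem on different core counts and compare speedup and parallel efficiency. The sketch below assumes the usual definitions, speedup = T(1) / T(n) and efficiency = speedup / n. The runtimes are made-up numbers for illustration only, not measurements from any cluster.

```shell
#!/bin/bash
# Hypothetical wall-clock times, one "cores seconds" pair per line.
timings="1 800
2 500
4 320
8 250"

# Speedup = T(1) / T(n); parallel efficiency = speedup / n (in percent).
report="$(printf '%s\n' "$timings" | awk '
    NR == 1 { t1 = $2 }   # runtime on the smallest core count is the baseline
    { printf "cores=%d speedup=%.2f efficiency=%.1f%%\n", $1, t1 / $2, 100 * t1 / ($2 * $1) }
')"
echo "$report"
```

When efficiency drops sharply from one core count to the next, the additional cores are mostly spent on overhead, and the smaller request is usually the more reasonable one.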

Running calculations on an HPC node consumes a lot of energy. To make the most of the available resources and to keep cluster and energy use as efficient as possible, please also see our advice on

→ Efficient Cluster Usage

To optimize job efficiency, it is very helpful to check which resources a job has actually used.

→ Job Monitoring
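As a hedged illustration of such a check, the sketch below computes CPU efficiency from the CPU time used, the elapsed time, and the number of allocated cores. The helper functions to_seconds and cpu_efficiency and the JOB_ID variable are hypothetical names introduced for this example, and the time parsing assumes plain HH:MM:SS values without a day prefix or fractional seconds. If sacct is available and JOB_ID is set, the script also prints the accounting fields from which these numbers can be read.

```shell
#!/bin/bash
# Convert HH:MM:SS to seconds (assumes no day prefix, no fractional seconds).
to_seconds() {
    local h m s
    IFS=: read -r h m s <<< "$1"
    # 10# forces base 10 so that values like "08" are not parsed as octal.
    echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# CPU efficiency in percent: CPU time used / (elapsed time * allocated cores).
cpu_efficiency() {
    local cpu elapsed cores
    cpu="$(to_seconds "$1")"
    elapsed="$(to_seconds "$2")"
    cores="$3"
    echo $(( 100 * cpu / (elapsed * cores) ))
}

# Hypothetical job: 6 hours of CPU time in 1 hour of wall time on 8 cores.
cpu_efficiency 06:00:00 01:00:00 8

# On a cluster, the real values for a finished job can be listed with sacct.
if [ -n "${JOB_ID:-}" ] && command -v sacct >/dev/null 2>&1; then
    sacct -j "$JOB_ID" --format=JobID,Elapsed,TotalCPU,AllocCPUS,MaxRSS,ReqMem,State
fi
```

A low percentage suggests that the job requested more cores than it could keep busy. Comparing MaxRSS with ReqMem gives a similar hint for memory.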
