Batch system: Difference between revisions

From bwHPC Wiki
(Transferred from course)

Revision as of 10:48, 1 July 2025

When we speak of a batch system on compute clusters, we mean the system that knows which compute nodes are used by whom and when they will become available. It also knows about all waiting jobs and determines which job will start next on which node whenever a node becomes available.

Why do we need a Resource Management System?

An HPC cluster is a multi-user system. Users have compute jobs with different demands on the number of processor cores, memory, disk space and run-time. Some users run a program only occasionally for a big task; other users must run many simulations to finish their projects.

The cluster provides only a limited number of compute resources with certain features. Giving all users free access to all compute nodes without any time limit would not work. Therefore we need a resource management system (batch system) for scheduling compute jobs and distributing them to suitable compute resources. The use of a resource management system pursues several objectives:

  • Fair distribution of resources among users
  • Compute jobs should start as soon as possible
  • Full load and efficient usage of all resources
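
To make the idea of "deciding which job starts next" more concrete, here is a minimal Python sketch of a toy scheduler. It is only an illustration under simplifying assumptions, not how any real batch system works: the names `Job`, `pick_next` and `schedule` are hypothetical, only processor cores are tracked, and the policy (prefer the user with the least usage so far, among the jobs that fit into the free cores) is a deliberately simple stand-in for the fairness and utilisation objectives listed above.

```python
from dataclasses import dataclass


@dataclass
class Job:
    user: str
    cores: int    # number of cores requested
    runtime: int  # requested run-time limit, in minutes


def pick_next(queue, free_cores, used_by_user):
    """Return the waiting job to start next, or None if nothing fits.

    Toy policy (an assumption for illustration): among the jobs that
    fit into the free cores, prefer the user with the least usage so
    far; ties go to the job that was submitted first.
    """
    candidates = [job for job in queue if job.cores <= free_cores]
    if not candidates:
        return None
    # min() returns the first minimal element, so queue order breaks ties.
    return min(candidates, key=lambda job: used_by_user.get(job.user, 0))


def schedule(queue, free_cores, used_by_user):
    """Start jobs until no waiting job fits; return the started jobs."""
    queue = list(queue)        # work on copies, leave the inputs untouched
    used = dict(used_by_user)
    started = []
    while (job := pick_next(queue, free_cores, used)) is not None:
        queue.remove(job)
        free_cores -= job.cores
        # Charge the user for the requested core-minutes.
        used[job.user] = used.get(job.user, 0) + job.cores * job.runtime
        started.append(job)
    return started


waiting = [
    Job("alice", cores=8, runtime=60),
    Job("alice", cores=4, runtime=30),
    Job("bob", cores=4, runtime=120),
]
started = schedule(waiting, free_cores=12, used_by_user={"alice": 500, "bob": 0})
print([(job.user, job.cores) for job in started])
# prints [('bob', 4), ('alice', 8)]
```

In this example bob's job starts first because bob has used the fewest resources so far, then alice's 8-core job fills the remaining cores. Alice's second job has to wait until cores become free again. Real batch systems weigh many more factors, such as memory, run-time limits and how long a job has already been waiting.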