HPC Glossary

From bwHPC Wiki
Revision as of 09:59, 10 October 2023

A short definition of the typical elements of an HPC cluster.

Batch Scheduler
The software that distributes the users' compute jobs across the available resources (compute nodes).
Batch Script
A script that begins with special comments telling the batch scheduler how many compute resources of what kind the job needs.
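
As a minimal sketch of the idea, the Python snippet below holds a hypothetical batch script as a string and extracts its resource requests. It assumes Slurm-style `#SBATCH` comment lines; the requested values and the `parse_directives` helper are illustrative and not taken from this wiki, and a real scheduler reads these lines itself.

```python
# Hypothetical batch script; the resource values are illustrative only.
BATCH_SCRIPT = """#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --time=00:30:00

echo "running the calculation"
"""


def parse_directives(script, prefix="#SBATCH"):
    """Collect '--key=value' options from the special comment lines (illustrative helper)."""
    options = {}
    for line in script.splitlines():
        line = line.strip()
        if not line.startswith(prefix):
            continue  # ordinary shell line or ordinary comment
        for token in line[len(prefix):].split():
            if token.startswith("--") and "=" in token:
                key, value = token[2:].split("=", 1)
                options[key] = value
    return options


requested = parse_directives(BATCH_SCRIPT)
print(requested)
```

To the shell the `#SBATCH` lines are plain comments, so the same file can be read by the scheduler for resource requests and then executed unchanged as a script.
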
Batch System
See Batch Scheduler
Core
The physical unit that can independently execute the instructions of a program on a CPU. Modern CPUs generally have multiple cores.
Compute Job
A calculation that you want to run on one of the compute nodes. You describe it in a batch script, and after you submit the job it starts automatically on one of the compute nodes.
CPU
Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous cores and layers of cache.
CPU time
The total time that CPU cores have spent on a calculation, summed over all cores used. If 10 CPU cores each calculate for 1 hour (even if this happens within the same hour), then the calculation has used 10 CPU-hours.
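
The arithmetic in the CPU time entry can be written out as a short Python sketch. The `cpu_hours` helper is a hypothetical name introduced for illustration, and it assumes every core is busy for the same length of time.

```python
def cpu_hours(cores, hours_per_core):
    """CPU time in CPU-hours: number of cores times the hours each core was busy."""
    return cores * hours_per_core


# The example from the definition: 10 cores, each busy for 1 hour.
used = cpu_hours(cores=10, hours_per_core=1.0)
print(f"{used} CPU-hours")
```
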
GPU
Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful for processing Machine Learning (ML) and Artificial Intelligence (AI) workloads more efficiently. Software has to be explicitly designed to use GPUs. CUDA and OpenACC are among the most widely used platforms for scientific computing on GPUs.
HPC
Short for High Performance Computing
HPC Cluster
A collection of compute nodes connected by a (usually) high-bandwidth, low-latency network. The compute nodes are accessed via login nodes.
Hyperthreading
Modern computers can be configured so that one physical compute core appears as two "logical" cores on the system. These two "hyperthreads" can sometimes compute in parallel if their calculations use different sub-units of the core. Most of the time, however, two calculations on two hyperthreads share the same physical hardware, and each runs about half as fast as a single thread that has the full core to itself. Some programs (e.g. GROMACS) can profit from running with twice as many threads on hyperthreads and finish 10-20% faster that way.
Job
See Compute Job
Job System
See Batch Scheduler
Moab
A batch system (job scheduler) software package.
Multithreading
Multithreading means that one computer program runs calculations on more than one compute core, using several logical "threads" of serial compute instructions (e.g. to work through different, independent data arrays in parallel). OpenMP is a common way to write multithreaded programs. MPI, by contrast, parallelizes a program across separate processes that exchange messages and can run on several nodes.
Node
An individual computer with one or more sockets, part of an HPC cluster.
Parallelization
Making it possible for programs to calculate parts of the problem they want to solve in parallel.
RAM
Random Access Memory. It is used as the working memory for the cores.
Runtime // Wall Clock Time
The real elapsed time a calculation takes from start to finish, as a clock on the wall would measure it. The term "Wall Clock Time" is used to distinguish it from CPU time.
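
The difference between the two measures can be observed with Python's standard `time` module. This is a minimal single-process sketch; the sleep duration and loop size are arbitrary choices made for illustration.

```python
import time

wall_start = time.perf_counter()   # wall clock time
cpu_start = time.process_time()    # CPU time consumed by this process

time.sleep(0.2)                                # waiting: wall clock advances, CPU is mostly idle
total = sum(i * i for i in range(200_000))     # computing: both clocks advance

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
print(f"wall clock: {wall_elapsed:.3f} s, CPU time: {cpu_elapsed:.3f} s")
```

Here the wall clock time is larger because the process spends part of its run waiting. For a job that keeps many cores busy at once, the relation reverses and the CPU time exceeds the wall clock time.
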
Scaling
Dividing a problem into several sub-problems creates additional work for keeping track of the sub-problems and assembling the pieces into a solution of the whole problem. At some point this additional work becomes larger than the work spent on calculating the actual problem. A problem is said to "scale well" if little such additional work is needed.
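
The trade-off described under Scaling can be illustrated with a toy model in Python. The formula and the numbers are assumptions chosen for illustration, not measurements: the compute time is divided evenly among the workers, while the coordination overhead grows linearly with the number of workers.

```python
def runtime(workers, work=100.0, overhead_per_worker=0.5):
    """Toy model: compute time shrinks with more workers, coordination overhead grows."""
    compute = work / workers
    overhead = overhead_per_worker * (workers - 1)
    return compute + overhead


for n in (1, 2, 4, 8, 16, 32, 64):
    speedup = runtime(1) / runtime(n)
    print(f"{n:3d} workers: runtime {runtime(n):6.2f}, speedup {speedup:5.2f}")
```

In this toy model the runtime is lowest at 16 of the listed worker counts and rises again beyond that, because the overhead then outweighs the remaining compute time. A problem that scales well would correspond to a smaller `overhead_per_worker`.
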
Script
A set of instructions that the computer runs one after another but that, unlike a compiled program, is not translated into machine instructions beforehand.
Socket
The physical socket on the mainboard into which a CPU package is placed. Often used as a synonym for CPU if a computer has more than one socket and one wants to make clear that only the CPU chip sitting in one particular socket is meant.
SLURM
A batch system (job scheduler) software package.
Submit
Send a compute job to the queue, where it waits until it can run on a compute node.
Thread
A logical sequence of instructions within a program that can be executed independently.