HPC Glossary: Difference between revisions

Latest revision as of 11:35, 15 July 2025

A short definition of the typical elements of an HPC cluster.

‎Batch Scheduler / Batch system: The software that distributes the compute jobs of the users on the available resources (compute nodes).

Batch Script: A script whose first lines are special comments that tell the batch scheduler how many compute resources of what kind the job needs.
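As an illustration of such header comments, here is a minimal sketch that assumes Slurm-style `#SBATCH` directives. The script text, the `read_directives` helper and the requested values are hypothetical and are not taken from any particular cluster's documentation.

```python
# Hypothetical batch script, assuming Slurm-style "#SBATCH" comment directives.
EXAMPLE_SCRIPT = """\
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --time=01:00:00

echo "the actual work starts here"
#SBATCH --mem=1G
"""


def read_directives(script_text, prefix="#SBATCH"):
    """Collect the resource requests from the comment header of a batch script."""
    requests = {}
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(prefix):
            # "--nodes=2" becomes the pair ("nodes", "2")
            option = stripped[len(prefix):].strip()
            key, _, value = option.partition("=")
            requests[key.lstrip("-")] = value
        elif stripped and not stripped.startswith("#"):
            # First real command: the header is over, so stop reading.
            break
    return requests


directives = read_directives(EXAMPLE_SCRIPT)
print(directives)
```

The helper stops at the first command, so the late `--mem` line is not picked up. This mirrors the idea that the resource requests belong at the beginning of the script.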

‎Core: The physical unit that can independently execute the instructions of a program on a CPU. Modern CPUs generally have multiple cores.

Compute Job: A calculation that you want to run on the compute nodes. You write a batch script for it, and after you submit the job it starts automatically on one of the compute nodes.

CPU: Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous cores and layers of cache. The term is sometimes also used to describe a single core. To distinguish the two, the term socket is now often used for the whole CPU.

‎CPU Time: The time that CPUs have spent to calculate something. If 10 CPU cores calculate something for 1 hour each (even if it happens within the same hour), then 10 CPU-hours have been used for this calculation.
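The arithmetic in this definition can be written out as a short sketch. The function name `cpu_hours` is made up for this example.

```python
def cpu_hours(cores, hours_per_core):
    """CPU time used when `cores` cores each compute for `hours_per_core` hours."""
    return cores * hours_per_core


# 10 cores working for 1 hour each use 10 CPU-hours,
# even though only 1 hour passes on the clock.
used = cpu_hours(cores=10, hours_per_core=1.0)
print(used)
```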

‎GPU: Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be explicitly designed to use GPUs. CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.

HPC: Short for High Performance Computing.

‎HPC Cluster: Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes.

Hyperthreading: Modern computers can be configured so that one real compute core appears as two "logical" cores on the system. These two "hyperthreads" can sometimes do computations in parallel if the calculations use two different sub-units of the compute core. Most of the time, however, two calculations on two hyperthreads run on the same physical hardware, and each runs about half as fast as it would if one thread had the full core. Some programs (e.g. GROMACS) can profit from running with twice as many threads on hyperthreads and finish 10-20% faster if run in that way.

Infiniband: Infiniband is a high-speed network often used to connect the nodes of an HPC cluster. Omni-Path is a comparable interconnect technology from a different vendor.

Job: See Compute Job

Job System: See Batch Scheduler

Moab: A batch system software package.

MPI: Message Passing Interface. A standard for exchanging messages between processes, used mainly on distributed memory machines (such as HPC clusters with many compute nodes, each of which is a shared memory system with many cores). MPI programs can scale from one node to thousands of compute nodes.

Multithreading: Multithreading means that one computer program runs calculations on more than one compute core, using several logical "threads" of serial compute instructions (e.g. to work through different, independent data arrays in parallel). OpenMP is a common form of multithreaded parallelization. MPI, by contrast, usually runs several processes that exchange messages, although it is often combined with multithreading.

‎Node: An individual computer with one or more sockets, part of an HPC cluster.

Omni-Path: Omni-Path is a high-speed network often used to connect the nodes of an HPC cluster. It is comparable to Infiniband but comes from a different vendor.

‎OpenMP: Specification for Shared Memory parallelization based on Threads for many-core CPUs and GPUs.

Parallelization: Enabling programs to calculate parts of the problem in parallel, reducing the overall wall clock time (albeit with some parallelization overhead, such as the runtime environment, communication and synchronization).

‎RAM: Random Access Memory. It is used as the working memory for the cores.

‎Runtime: The time a calculation needs to run (see CPU Time and Wall Clock Time).

Scaling: Dividing a problem into several sub-problems creates additional work for keeping track of the sub-problems and assembling the pieces into a solution of the whole problem. At some point this additional work becomes larger than the work spent on calculating the actual problem. A problem is said to "scale well" if little such additional work is needed.
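The trade-off described here can be made concrete with a toy cost model. Everything in this sketch is an assumption chosen for illustration: the amount of work, the fixed overhead per worker, and the function names. Real programs have more complicated overheads.

```python
def runtime(workers, work=100.0, overhead_per_worker=1.0):
    """Toy model: the work is split evenly, but each worker adds bookkeeping cost."""
    return work / workers + overhead_per_worker * workers


def speedup(workers):
    """How much faster the toy problem runs compared with a single worker."""
    return runtime(1) / runtime(workers)


for n in (1, 10, 100):
    print(n, runtime(n), round(speedup(n), 2))
```

In this model 10 workers give a clear gain, while at 100 workers the overhead has eaten the whole benefit and the run takes as long as with one worker.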

Script: A set of instructions that the computer runs one after another, but that is not compiled into machine instructions in advance, as a compiled program is.

Simultaneous Multithreading: See Hyperthreading

SMT: See Hyperthreading

SLURM: A batch system software package.

Socket: Physical socket where the CPU chips are placed. Often used as a synonym for CPU if a computer has more than one socket and one wants to make clear that only one of the CPU chips sitting in one socket is meant.

Submit: To send a compute job into the queue, where it waits until it can run on a compute node.

‎Thread: Logical unit that can be executed independently within a process, sharing resources such as allocated memory, open files and signal handlers.
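A minimal sketch of threads sharing the memory of one process, using Python's standard `threading` module. The variable and function names are made up for this example, and the lock is there because the threads write to the same list.

```python
import threading

results = []             # memory shared by all threads of this process
lock = threading.Lock()  # protects the shared list from concurrent writes


def square(n):
    value = n * n
    with lock:
        results.append(value)


# Start four threads inside the same process and wait for all of them.
threads = [threading.Thread(target=square, args=(n,)) for n in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))
```

The order in which the threads append is not fixed, which is why the list is sorted before printing.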

Wall Clock Time: The real time that passes while a calculation runs, as a clock on the wall would measure it. The term is used to distinguish it from CPU time (see Runtime).
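The difference between the two measures can be seen in a small sketch that uses Python's standard `time` module. The helper name `measure` is an assumption for this example. Sleeping lets wall clock time pass while using almost no CPU time.

```python
import time


def measure(func):
    """Return (wall clock seconds, CPU seconds) spent while running func."""
    wall_start = time.perf_counter()  # real elapsed time
    cpu_start = time.process_time()   # CPU time of this process only
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


# Sleeping takes wall clock time but keeps the CPU almost idle.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall clock: {wall:.2f} s, CPU: {cpu:.2f} s")
```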

@@ Line 1: / Line 1: @@
 A short definition of the typical elements of an HPC cluster.
-;Batch System // Job Scheduler // Batch Scheduler
+;<span id="Batch_Scheduler">‎</span>Batch Scheduler / [[Batch system]]
-: The software that distributes the compute [[Jobs]] of the users on the available resources (compute nodes).
+: The software that distributes the [[#Compute_Job|compute jobs]] of the users on the available resources (compute nodes).
+;<span id="Batch_Script">‎</span>Batch Script
-;Core
+: A [[#Script|script]] that contains information in the form of special comments at the beginning of the script which contain information about how many compute resources of what kind are needed.
-:The physical unit that can independently execute the instructions of a program on a CPU. Modern CPUs generally have multiple cores.
+;<span id="Core">‎</span>Core
-;CPU
+: The physical unit that can independently execute the instructions of a program on a CPU. Modern CPUs generally have multiple cores.
-:Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous cores and layers of cache.
+;<span id="Compute_Job">‎</span>Compute Job
-;GPU:Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be explicitly designed to use GPUs. CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.
+: A calculation you want to run on one of the compute nodes and for which you have written a [[#Batch_Script|batch script]] and which will automatically start on one of the compute nodes after [[#Submit|submit]]ting the job
+;<span id="CPU">‎</span>CPU
-;HPC
+: Central Processing Unit. It performs the actual computation in a compute node. A modern CPU is composed of numerous [[#Core|cores]] and layers of cache. The term is sometimes also used to describe a single core. To distinguish, one now often uses the term [[#socket|socket]] for the whole cpu.
-: short for '''H'''igh '''P'''erformance '''C'''omputing
+;<span id="CPU_Time">‎</span>CPU Time
-;HPC Cluster
+: The time that CPUs have spent to calculate something. If 10 CPU [[#Core|cores]] calculate something for 1 hour each (even if it happens within the same hour), then 10 CPU-hours have been used for this calculation.
-:Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes.
+;<span id="GPU">‎</span>GPU
-;Hyperthreading
+: Graphics Processing Unit. GPUs in HPC clusters are used as high-performance accelerators and are particularly useful to process workloads in Machine Learning (ML) and Artificial Intelligence (AI) more efficiently. The software has to be explicitly designed to use GPUs. CUDA and OpenACC are the most popular platforms in scientific computing with GPUs.
-: Modern computers can be configured so that one real compute-[[core]] appears like two "logical" cores on the system. These two "hyperthreads" can sometimes do computations in parallel, if the calculations use two different sub-units of the compute-core - but most of the time, two calculations on two hyperthreads run on the same physical hardware and both run half as fast as if one thread had a full core. Some programs (e.g. gromacs) can profit from running with twice as many threads on hyperthreads and finish 10-20% faster if run in that way.
+;<span id="HPC">‎</span>HPC
-; Job or Compute Job
+: Short for '''H'''igh '''P'''erformance '''C'''omputing
-: A calculation you want to run on one of the compute nodes and for which you have written a [[batch script]] and which will automaticall start on one of the compute nodes after [[submit]]ting the job
+;<span id="HPC Cluster">‎</span>HPC Cluster
-;Multithreading
+: Collection of compute nodes with (usually) high bandwidth and low latency communication. They can be accessed via login nodes.
-: Multithreading means that one computer program runs calculations on more than one compute-core using several logical "threads" of serial compute instructions to do so (eg. to work through different and independent data arrays in parallel). Specific types of  multithreaded parallelization are [[OpenMP]] or [[MPI]].
+;<span id="Hyperthreading">‎</span>Hyperthreading
-;Node
+: Modern computers can be configured so that one real compute-[[#Core|core]] appears like two "logical" cores on the system. These two "hyperthreads" can sometimes do computations in parallel, if the calculations use two different sub-units of the compute-core - but most of the time, two calculations on two hyperthreads run on the same physical hardware and both run half as fast as if one thread had a full core. Some programs (e.g. gromacs) can profit from running with twice as many threads on hyperthreads and finish 10-20% faster if run in that way.
-:An individual computer with one or more sockets, part of an HPC cluster.
+;Infiniband
-;RAM
+: [[Infiniband]] is a high-speed network often used to connect nodes on a HPC cluster. Omni-Path is the name of the same technology by  different vendor.
-:Random Access Memory. It is used as the working memory for the cores.
+;Job
-;Socket
+: See [[#Compute_Job|Compute Job]]
-:Physical socket where the CPU capsules are placed. Often used as a synonym to CPU if a computer has more than one socket and one wants to make clear that only one of the CPU chips sitting in one socket is meant.
+;Job System
-;Thread
+: See [[#Batch_Scheduler|Batch Scheduler]]
-:Logical unit that can be executed independently.
+;<span id="Moab">‎</span>Moab
-; Script
+: A batch system software
-: A set of instructions that the computer runs one after another, but that is not compiled into computer-instructions like a program.
+;<span id="MPI">‎</span>MPI
-; Batch Script
+: Standard for the Message Passing Interface, mainly for distributed memory machines (like HPC Clusters with many compute nodes, each a shared memory system with many cores) scaling from one node to thousands of compute nodes.
-: A [[script]] that contains information in the form of special [[comment]]s at the beginning of the script which contain information about how many compute resources of what kind are needed.
+;<span id="Multithreading">‎</span>Multithreading
-; Slurm
+: Multithreading means that one computer program runs calculations on more than one compute-[[#Core|core]] using several logical "threads" of serial compute instructions to do so (eg. to work through different and independent data arrays in parallel). Specific types of  multithreaded parallelization are [[#OpenMP|OpenMP]] or [[#MPI|MPI]].
-: A Batch System // Job Scheduler software
+;<span id="Node">‎</span>Node
-; Moab
+: An individual computer with one or more sockets, part of an HPC cluster.
-: A Batch System // Job Scheduler software
+;Omni-Path
-; Runtime // Wall Clock Time
+: [[Omni-Path]] is a high-speed network often used to connect nodes on a HPC cluster. It is basically Infiniband by a different vendor.
-: The time a calculation needs to run. The term "Wall Clock Time" is used to distinguish it from [[CPU time]].
+;<span id="OpenMP">‎</span>OpenMP
-; CPU time
+: Specification for Shared Memory parallelization based on Threads for many-core CPUs and GPUs.
-: The time that CPUs have spent to calculate something. If 10 CPU cores calculate something for 1 hour each (even if it happens within the same hour), then 10 CPU-hours have been used for this calculation.
+;<span id="Parallelization">‎</span>Parallelization
-; Scaling
+: Enabling programs to calculate parts of the problem in parallel, reducing overall wallclock time (albeit with some parallelization overhead like runtime environment, communication and synchronization).
+;<span id="RAM">‎</span>RAM
+: Random Access Memory. It is used as the working memory for the [[#Core|cores]].
+;<span id="Runtime">‎</span>Runtime
+: The time a calculation needs to run (see [[#CPU_Time|CPU Time]] and [[#Wall_Clock_Time|Wall Clock Time]]).
+;<span id="Scaling">‎</span>Scaling
 : dividing a problem in several sub-problems creates additional work for taking track of the sub-problems and assembling the pieces to solve the whole problem. At some point this additional work becomes larger than the work spent on calculating the actual problem. A problem is called to "scale well", if little such additional work is needed.
+;<span id="Script">‎</span>Script
-; Submit
+: A set of instructions that the computer runs one after another, but that is not compiled into computer-instructions like a program.
-: send a compute job into the queue to wait until it can run on a compute node
+;Simultaneous Multithreading
+: See [[#Hyperthreading|Hyperthreading]]
+;SMT
+: See [[#Hyperthreading|Hyperthreading]]
+;<span id="SLURM">‎</span>SLURM
+: A batch system software
+;<span id="Socket">‎</span>Socket
+: Physical socket where the CPU chips are placed. Often used as a synonym to CPU if a computer has more than one socket and one wants to make clear that only one of the CPU chips sitting in one socket is meant.
+;<span id="Submit">‎</span>Submit
+: Send a compute job into the queue to wait until it can run on a compute node
+;<span id="Thread">‎</span>Thread
+: Logical unit that can be executed independently within a process, sharing resources such as allocated memory, open files and signal handlers.
+;<span id="Wall_Clock_Time">‎</span>Wall Clock Time
-; Parallelization
+The term "Wall Clock Time" is used to distinguish it from [[#CPU_Time|CPU time]] (see [[#Runtime|Runtime]]).
-: Making it possible for programs to calculate parts of the problem they want to solve in parallel.
