Helix/Hardware

System Architecture

The bwForCluster Helix is a high-performance supercomputer with a high-speed interconnect. The system consists of compute nodes (CPU and GPU nodes), infrastructure nodes for login and administration, and a storage system. All components are connected via a fast InfiniBand network. The login nodes are also connected to the Internet via Baden-Württemberg's extended LAN, BelWü.

[Figure: schematic of the Helix system architecture (Helix-skizze.png)]

Operating System and Software

  • Operating system: Red Hat
  • Queuing system: Slurm
  • Access to application software: Environment Modules
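
Application software is made available through Environment Modules, and compute jobs are submitted through Slurm. The commands below are a minimal usage sketch; the module name shown is an illustrative assumption, and the module names actually installed on Helix may differ:

  # list the software modules available on the cluster
  module avail

  # load a module into the current shell environment (module name is an assumption)
  module load compiler/gnu

  # submit a batch script to Slurm and check the queue
  sbatch job.sh
  squeue -u $USER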

Compute Nodes

AMD Nodes

Common features of all AMD nodes:

  • Processors: 2 x AMD Milan EPYC 7513
  • Processor Frequency: 2.6 GHz
  • Number of Cores per Node: 64
  • Local disk: None
                               | CPU Nodes             | GPU Nodes
Node Type                      | cpu       | fat       | gpu4                  | gpu4                   | gpu8
Quantity                       | 355       | 15        | 29                    | 26                     | 3
Installed Working Memory (GB)  | 256       | 2048      | 256                   | 256                    | 2048
Available Memory for Jobs (GB) | 236       | 2010      | 236                   | 236                    | 2010
Interconnect                   | 1x HDR100 | 1x HDR100 | 2x HDR100             | 2x HDR200              | 4x HDR200
Coprocessors                   | -         | -         | 4x Nvidia A40 (48 GB) | 4x Nvidia A100 (40 GB) | 8x Nvidia A100 (80 GB)
Number of GPUs                 | -         | -         | 4                     | 4                      | 8
GPU Type                       | -         | -         | A40                   | A100                   | A100
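
To illustrate how these node types translate into a job request, the following Slurm batch script asks for a full GPU node with four GPUs and stays within the memory available for jobs. The partition name and the exact GRES syntax are assumptions and must be checked against Helix's actual Slurm configuration:

  #!/bin/bash
  #SBATCH --nodes=1
  #SBATCH --ntasks-per-node=64      # all 64 cores of an AMD node
  #SBATCH --mem=236000M             # stay within the 236 GB available for jobs on a gpu4 node
  #SBATCH --gres=gpu:4              # request all 4 GPUs of a gpu4 node
  #SBATCH --partition=gpu4          # partition name is an assumption
  #SBATCH --time=01:00:00

  srun ./my_gpu_program             # my_gpu_program is a placeholder for your application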

Intel Nodes

Some Intel nodes (Skylake and Cascade Lake) from the predecessor system will be integrated. Details will follow.


Storage Architecture

There is one storage system providing a large parallel file system based on IBM Spectrum Scale for $HOME, for workspaces, and for temporary job data.
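
Workspaces on bwHPC clusters are typically managed with the workspace tools (ws_allocate and related commands); the workspace name and lifetime below are illustrative examples, and the limits that apply on Helix should be taken from the cluster documentation:

  # allocate a workspace named "mydata" for 30 days (name and duration are examples)
  ws_allocate mydata 30

  # list existing workspaces and their remaining lifetimes
  ws_list

  # release a workspace that is no longer needed
  ws_release mydata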

Network

The components of the cluster are connected via two independent networks: a management network (Ethernet and IPMI) and an InfiniBand fabric for MPI communication and storage access. The InfiniBand backbone is a fully non-blocking fabric with a data rate of 200 Gb/s. The compute nodes are connected with different data rates according to the node configuration, as listed in the interconnect row of the table above.
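
As a quick check of the InfiniBand link on a node, the standard diagnostics and Slurm can be queried. This assumes the infiniband-diags tools are installed on the node, which may not be the case for regular users:

  # show the state and data rate of the node's InfiniBand ports
  ibstat

  # list nodes with their partitions and features as reported by Slurm
  sinfo -N -o "%N %P %f"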