Helix/Hardware
Edited by S Richling
<!-- Intel nodes table (draft)

Common features of all Intel nodes:

* Interconnect: 1x EDR

{| class="wikitable"
|-
! style="width:12%"|
! style="width:10%" colspan="2" style="text-align:center" |CPU
! style="width:20%" colspan="8" style="text-align:center" |GPU
|-
!scope="column"| Node Type
| colspan="1" style="text-align:center" | cpu-sky
| colspan="1" style="text-align:center" | cpu-cas
| colspan="4" style="text-align:center" | gpu-sky
| colspan="4" style="text-align:center" | gpu-cas
|-
!scope="column"| Architecture
| colspan="1" style="text-align:center" | Skylake
| colspan="1" style="text-align:center" | Cascade Lake
| colspan="4" style="text-align:center" | Skylake
| colspan="4" style="text-align:center" | Cascade Lake
|-
!scope="column"| Quantity
| 24 || 5 || 1 || 1 || 2 || 3 || 3 || 3 || 3 || 1
|-
!scope="column" | Processors
| 2 x Intel Xeon Gold 6130
| 2 x Intel Xeon Gold 6230
| 2 x Intel Xeon Gold 6130
| 2 x Intel Xeon Gold 6130
| 2 x Intel Xeon Gold 6130
| 2 x Intel Xeon Gold 6130
| 2 x Intel Xeon Gold 6230
| 2 x Intel Xeon Gold 6230
| 2 x Intel Xeon Gold 6240R
| 2 x Intel Xeon Gold 6240R
|-
!scope="column" | Processor Frequency (GHz)
| 2.1 || 2.2 || 2.1 || 2.1 || 2.1 || 2.1 || 2.1 || 2.1 || 2.4 || 2.4
|-
!scope="column" | Number of Cores
| 32 || 40 || 32 || 32 || 32 || 32 || 40 || 40 || 48 || 48
|-
!scope="column" | Working Memory (GB)
| 192 || 384 || 192 || 384 || 384 || 384 || 384 || 384 || 384 || 384
|-
!scope="column" | Local Disk (GB)
| 512 (SSD) || 480 (SSD) || 512 (SSD) || 512 (SSD) || 512 (SSD) || 512 (SSD) || 480 (SSD) || 480 (SSD) || 480 (SSD) || 480 (SSD)
|-
!scope="column" | Coprocessors
| -
| -
| 4 x [https://www.nvidia.com/en-us/titan/titan-xp/ Nvidia Titan Xp (12 GB)]
| 4 x [https://www.nvidia.com/de-de/data-center/tesla-v100/ Nvidia Tesla V100 (16 GB)]
| 4 x [https://www.nvidia.com/de-de/geforce/products/10series/geforce-gtx-1080-ti/ Nvidia GeForce GTX 1080Ti (11 GB)]
| 4 x [https://www.nvidia.com/de-de/geforce/graphics-cards/rtx-2080-ti/ Nvidia GeForce RTX 2080Ti (11 GB)]
| 4 x [https://www.nvidia.com/de-de/data-center/tesla-v100/ Nvidia Tesla V100 (16 GB)]
| 4 x [https://www.nvidia.com/de-de/data-center/tesla-v100/ Nvidia Tesla V100s (32 GB)]
| 4 x [https://www.nvidia.com/de-de/geforce/graphics-cards/30-series/rtx-3090 Nvidia GeForce RTX 3090 (24 GB)]
| 4 x [https://www.nvidia.com/de-de/design-visualization/quadro/rtx-8000/ Nvidia Quadro RTX 8000 (48 GB)]
|-
! scope="column" | Number of GPUs
| - || - || 4 || 4 || 4 || 4 || 4 || 4 || 4 || 4
|-
! scope="column" | GPU Type
| - || - || TITAN || V100 || GTX1080 || RTX2080 || V100 || V100S || RTX3090 || RTX8000
|}

-->
Revision as of 16:23, 17 May 2024
System Architecture
The bwForCluster Helix is a high-performance supercomputer with a high-speed interconnect. The system consists of compute nodes (CPU and GPU nodes), infrastructure nodes for login and administration, and a storage system. All components are connected via a fast InfiniBand network. The login nodes are also connected to the Internet via BelWü, Baden-Württemberg's extended LAN.
Operating System and Software
- Operating system: RedHat
- Queuing system: Slurm
- Access to application software: Environment Modules
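As a rough illustration of how these three pieces fit together, the following Python sketch renders the text of a Slurm batch script that loads software through Environment Modules. The `#SBATCH` options shown are standard Slurm options, but everything else is an assumption made for illustration: the function name `render_job_script`, the module name `example/1.0`, and the program `./my_program` are hypothetical placeholders, not names taken from the Helix installation.

```python
def render_job_script(ntasks, mem_gb, walltime, modules, command, gpus=0):
    """Return the text of a Slurm batch script for a single-node job.

    All names passed in (modules, command) are supplied by the caller;
    nothing here is specific to a particular cluster.
    """
    if ntasks < 1:
        raise ValueError("ntasks must be at least 1")
    lines = [
        "#!/bin/bash",
        "#SBATCH --nodes=1",
        f"#SBATCH --ntasks-per-node={ntasks}",
        f"#SBATCH --mem={mem_gb}G",
        f"#SBATCH --time={walltime}",
    ]
    if gpus > 0:
        # Generic GPU request; no GPU type is specified here.
        lines.append(f"#SBATCH --gres=gpu:{gpus}")
    lines.append("")
    # Start from a clean environment, then load the requested modules.
    lines.append("module purge")
    for name in modules:
        lines.append(f"module load {name}")
    lines.append("")
    lines.append(command)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical module and program names, for illustration only.
    print(render_job_script(4, 16, "00:10:00", ["example/1.0"],
                            "srun ./my_program"), end="")
```

The generated text could be written to a file and submitted with `sbatch`. Partition or queue names are deliberately left out because this page does not specify them.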
Compute Nodes
AMD Nodes
Common features of all AMD nodes:
- Processors: 2 x AMD EPYC 7513 (Milan)
- Processor Frequency: 2.6 GHz
- Number of Cores per Node: 64
- Local disk: None
| | CPU Nodes | CPU Nodes | GPU Nodes | GPU Nodes | GPU Nodes |
|---|---|---|---|---|---|
| Node Type | cpu | fat | gpu4 | gpu4 | gpu8 |
| Quantity | 355 | 15 | 29 | 26 | 3 |
| Installed Working Memory (GB) | 256 | 2048 | 256 | 256 | 2048 |
| Available Memory for Jobs (GB) | 236 | 2010 | 236 | 236 | 2010 |
| Interconnect | 1x HDR100 | 1x HDR100 | 2x HDR100 | 2x HDR200 | 4x HDR200 |
| Coprocessors | - | - | 4x Nvidia A40 (48 GB) | 4x Nvidia A100 (40 GB) | 8x Nvidia A100 (80 GB) |
| Number of GPUs | - | - | 4 | 4 | 8 |
| GPU Type | - | - | A40 | A100 | A100 |
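The figures in the table above can be combined into a few aggregate numbers. The following Python sketch encodes the table as data and derives the total node, core, and GPU counts, plus the job memory available per core. The class and function names are hypothetical, and the sketch assumes that the node type `gpu4` covers both four-GPU columns (A40 and A100), which is how the table appears to be laid out.

```python
from dataclasses import dataclass

CORES_PER_NODE = 64  # "Number of Cores per Node" for all AMD nodes


@dataclass(frozen=True)
class NodeClass:
    node_type: str       # value of the "Node Type" row
    quantity: int        # number of nodes of this kind
    job_memory_gb: int   # "Available Memory for Jobs (GB)"
    gpus_per_node: int   # 0 for CPU-only nodes
    gpu_type: str        # "" for CPU-only nodes


# One entry per table column. Assigning "gpu4" to both four-GPU
# columns is an assumption based on the table layout.
AMD_NODES = (
    NodeClass("cpu", 355, 236, 0, ""),
    NodeClass("fat", 15, 2010, 0, ""),
    NodeClass("gpu4", 29, 236, 4, "A40"),
    NodeClass("gpu4", 26, 236, 4, "A100"),
    NodeClass("gpu8", 3, 2010, 8, "A100"),
)


def total_nodes(nodes=AMD_NODES):
    return sum(n.quantity for n in nodes)


def total_cores(nodes=AMD_NODES):
    return total_nodes(nodes) * CORES_PER_NODE


def total_gpus(nodes=AMD_NODES):
    return sum(n.quantity * n.gpus_per_node for n in nodes)


def memory_per_core_gb(node):
    """Job memory divided evenly across all cores of one node."""
    return node.job_memory_gb / CORES_PER_NODE


def cpu_node_types_fitting(cores, mem_gb):
    """CPU-only node types on which a single-node job of this size fits."""
    return [
        n.node_type
        for n in AMD_NODES
        if n.gpus_per_node == 0
        and cores <= CORES_PER_NODE
        and mem_gb <= n.job_memory_gb
    ]


if __name__ == "__main__":
    print(total_nodes(), total_cores(), total_gpus())
    for node in AMD_NODES:
        print(node.node_type, node.gpu_type or "-", memory_per_core_gb(node))
```

With the values from the table, this gives 428 AMD nodes, 27,392 cores, and 244 GPUs. A job that uses all 64 cores of a node has about 3.7 GB of memory per core on nodes with 236 GB available, and about 31.4 GB per core on nodes with 2010 GB available.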
Intel Nodes
Some Intel nodes (Skylake and Cascade Lake) from the predecessor system will be integrated. Details will follow.
Storage Architecture
A single storage system provides a large parallel file system, based on IBM Spectrum Scale, that holds $HOME, workspaces, and temporary job data.
Network
The components of the cluster are connected via two independent networks: a management network (Ethernet and IPMI) and an InfiniBand fabric for MPI communication and storage access. The InfiniBand backbone is a fully non-blocking fabric with a data rate of 200 Gb/s. The compute nodes are connected at different data rates depending on the node configuration.
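To make the different per-node data rates concrete, the Python sketch below converts an "Interconnect" entry such as `2x HDR100` into a nominal aggregate link rate. It rests on two assumptions: that HDR100 denotes a 100 Gb/s port, as in standard InfiniBand naming, and that the label HDR200 used on this page denotes a full 200 Gb/s HDR port, matching the backbone figure quoted above. The function name `node_bandwidth_gbps` is hypothetical, and the results are nominal link rates, not measured throughput.

```python
import re

# Nominal per-port rates in Gb/s. The HDR200 value is an assumption
# based on the 200 Gb/s backbone rate stated in the text.
LINK_RATE_GBPS = {"HDR100": 100, "HDR200": 200}


def node_bandwidth_gbps(interconnect):
    """Nominal aggregate rate for an entry such as '2x HDR100'."""
    match = re.fullmatch(r"(\d+)x\s*(\w+)", interconnect.strip())
    if match is None:
        raise ValueError(f"unrecognised interconnect entry: {interconnect!r}")
    count, link = int(match.group(1)), match.group(2)
    if link not in LINK_RATE_GBPS:
        raise ValueError(f"unknown link type: {link!r}")
    return count * LINK_RATE_GBPS[link]


if __name__ == "__main__":
    for entry in ("1x HDR100", "2x HDR100", "2x HDR200", "4x HDR200"):
        print(entry, node_bandwidth_gbps(entry), "Gb/s")
```

Under these assumptions, the four interconnect configurations listed for the AMD nodes correspond to nominal rates of 100, 200, 400, and 800 Gb/s per node.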