BwUniCluster2.0/Hardware and Architecture

From bwHPC Wiki
Revision as of 13:12, 18 February 2020 by S Raffeiner (talk | contribs) (Created page with "= Architecture of bwUniCluster 2.0 = The bwUniCluster 2.0 is a parallel computer with distributed memory. Each node of system consists of at least two Intel Xeon processor, l...")

1 Architecture of bwUniCluster 2.0

The bwUniCluster 2.0 is a parallel computer with distributed memory. Each node of the system consists of at least two Intel Xeon processors, local memory, disks, network adapters and optionally accelerators (NVIDIA Tesla V100). All nodes are connected by a fast InfiniBand 4X FDR interconnect. In addition, the parallel file system Lustre is attached to bwUniCluster (uc1) by coupling the InfiniBand of the file server with the InfiniBand switch of the compute cluster. This provides a fast and scalable parallel file system.

The operating system on each node is Red Hat Enterprise Linux (RHEL) 7.x. A number of additional software packages, such as SLURM, have been installed on top. Some of these components are of special interest to end users and are briefly discussed in this document. Others, which are of greater importance to system administrators, are not covered here.

The individual nodes of the system may act in different roles. According to the services they supply, the nodes are separated into disjoint groups. From an end user's point of view, these groups are login nodes, compute nodes, file server nodes and administrative server nodes.

Login Nodes

The login nodes are the only nodes that are directly accessible by end users. These nodes are used for interactive login, file management, program development and interactive pre- and postprocessing. Two nodes are dedicated to this service. Both are reachable via a single address: a DNS round-robin alias distributes the login sessions across the login nodes.
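The effect of a round-robin alias can be illustrated with a small simulation. This is only a sketch: the host names below are hypothetical placeholders, and real DNS round-robin is not guaranteed to alternate strictly, because resolvers and caches may reorder or reuse answers.

```python
from itertools import cycle, islice

# Hypothetical host names; the real login nodes and alias are not named here.
LOGIN_NODES = ["login1.example.org", "login2.example.org"]


def assign_sessions(nodes, session_count):
    """Return the node each successive session lands on under strict rotation."""
    return list(islice(cycle(nodes), session_count))


if __name__ == "__main__":
    for number, node in enumerate(assign_sessions(LOGIN_NODES, 5), start=1):
        print(f"session {number} -> {node}")
```

Under this idealised rotation, consecutive sessions alternate between the two nodes, so interactive load is spread roughly evenly.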

Compute Nodes

The majority of nodes are compute nodes, which are managed by a batch system. Users submit their jobs to the SLURM batch system, and a job is executed when the required resources become available. When that happens depends on the job's fair-share priority.
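As a rough illustration of what a submitted job looks like, the following sketch assembles a minimal SLURM batch script as text. The `#SBATCH` options used (`--job-name`, `--nodes`, `--ntasks-per-node`, `--time`) are standard `sbatch` options. The helper name `build_batch_script`, the program name and all values are assumptions made for this example, and site-specific settings such as queues are deliberately left out.

```python
def build_batch_script(job_name, nodes, tasks_per_node, walltime, command):
    """Return the text of a SLURM batch script as a single string."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={walltime}",
        "",
        command,
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are placeholders, not site recommendations.
    print(build_batch_script("demo", 2, 4, "00:10:00", "srun ./my_program"), end="")
```

The printed text could be saved to a file and handed to `sbatch` on a login node. The batch system then starts the job on compute nodes once the requested resources are free.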

File Server Nodes

The hardware of the parallel file system Lustre incorporates some file server nodes. Lustre is attached by coupling the InfiniBand of the file server with the independent InfiniBand switch of the compute cluster. In addition to the shared file space, there is local storage on the disks of each node (for details see chapter "File Systems").

Administrative Server Nodes

Some other nodes deliver additional services such as resource management, external network connection and administration. These nodes can be accessed directly by system administrators only.

2 Components of bwUniCluster

Node type                        Number of nodes
Compute nodes "Thin"             100
Compute nodes "HPC"              360
Compute nodes "HPC Broadwell"    352
Compute nodes "Fat"              6
GPU x4                           14
GPU x8                           24
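The node counts listed above can be summed to get a feel for the overall size of the system. The sketch below does only that arithmetic; the dictionary keys repeat the node type names from the table, and the function names are chosen for this example.

```python
# Node counts as given in the table above.
NODE_COUNTS = {
    'Compute nodes "Thin"': 100,
    'Compute nodes "HPC"': 360,
    'Compute nodes "HPC Broadwell"': 352,
    'Compute nodes "Fat"': 6,
    "GPU x4": 14,
    "GPU x8": 24,
}


def total_nodes(counts):
    """Return the total number of nodes across all node types."""
    return sum(counts.values())


def node_share(counts, node_type):
    """Return the fraction of all nodes that belong to one node type."""
    return counts[node_type] / total_nodes(counts)


if __name__ == "__main__":
    print(f"total nodes: {total_nodes(NODE_COUNTS)}")  # prints "total nodes: 856"
    for node_type in NODE_COUNTS:
        print(f"{node_type}: {node_share(NODE_COUNTS, node_type):.1%}")
```

The two "HPC" groups together account for 712 of the 856 listed nodes, so they make up the bulk of the machine.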

3 File Systems

New Lustre parallel file systems with a total capacity of 5 PB and aggregate throughput of 72 GB/s were procured with bwUniCluster 2.0.
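To put the two figures in relation, one can estimate how long it would take to move the full capacity at the full aggregate throughput. This is a back-of-the-envelope sketch that assumes decimal units (1 PB = 1,000,000 GB) and an ideal, sustained transfer rate; real workloads will not reach that.

```python
CAPACITY_PB = 5            # total capacity from the text
THROUGHPUT_GB_PER_S = 72   # aggregate throughput from the text
GB_PER_PB = 1_000_000      # assumption: decimal (SI) units


def seconds_to_transfer(capacity_pb, throughput_gb_per_s):
    """Return the ideal time in seconds to move the given capacity."""
    return capacity_pb * GB_PER_PB / throughput_gb_per_s


if __name__ == "__main__":
    seconds = seconds_to_transfer(CAPACITY_PB, THROUGHPUT_GB_PER_S)
    print(f"{seconds:.0f} s, about {seconds / 3600:.1f} hours")
```

Under these assumptions, writing or reading all 5 PB would take a little over 19 hours.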