Hardware and Architecture (bwForCluster Chemistry)
The bwForCluster for computational and theoretical Chemistry Justus is not in live operation yet. The cluster is scheduled to enter live operation on December 5th, 2014.
1 System Architecture
The bwForCluster for computational and theoretical Chemistry Justus is a high-performance compute resource with a high-speed interconnect. It is intended for chemistry-related jobs with high memory needs (RAM and disk) and medium to low demands on the InfiniBand network that connects the nodes.
1.1 Basic Software Features
- Red Hat Enterprise Linux (RHEL) 7
- Queuing System: MOAB/Torque
- Environment Modules system to access software
1.2 Common Hardware Features
A total of 444 compute nodes plus 10 nodes for login, admin and visualization purposes.
- Processor: 2x Intel Xeon E5-2630v3 (Haswell, 8-core, 2.4 GHz)
- Two processors per node (2x8 cores)
- 1x QDR InfiniBand HCA, single port, Intel TrueScale
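As a quick sanity check of the figures above, the following sketch computes the core count per node and across all compute nodes. It assumes that all 444 compute nodes share the processor configuration listed here; the constant and function names are illustrative only.

```python
# Figures taken from the hardware list above.
COMPUTE_NODES = 444      # compute nodes only, excluding the 10 service nodes
SOCKETS_PER_NODE = 2     # two processors per node
CORES_PER_SOCKET = 8     # 8-core Xeon E5-2630v3


def cores_per_node():
    """Return the number of physical cores in a single node."""
    return SOCKETS_PER_NODE * CORES_PER_SOCKET


def total_compute_cores():
    """Return the number of physical cores across all compute nodes."""
    return COMPUTE_NODES * cores_per_node()


if __name__ == "__main__":
    print(cores_per_node(), total_compute_cores())
```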
1.3 Node Types
There are three types of compute nodes, suited to jobs that are progressively less scalable and more memory-intensive (RAM and disk).
|Property||Diskless nodes||SSD nodes||Large Memory nodes|
|Disks||-||4x 240 GB Enterprise SSD||4x 480 GB Enterprise SSD|
|RAID||-||RAID 0||RAID 0|
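Because RAID 0 stripes data across all member disks without redundancy, the nominal local capacity of a node is the sum of its disk sizes. The sketch below works this out for the three node types; it ignores file system overhead, and the dictionary and function names are illustrative.

```python
# (number of disks, size per disk in GB), taken from the table above.
LOCAL_DISKS = {
    "Diskless nodes": (0, 0),
    "SSD nodes": (4, 240),
    "Large Memory nodes": (4, 480),
}


def raid0_capacity_gb(disk_count, disk_size_gb):
    """Nominal RAID 0 capacity for equally sized disks: the sum of all disks."""
    return disk_count * disk_size_gb


if __name__ == "__main__":
    for node_type, (count, size) in LOCAL_DISKS.items():
        print(node_type, raid0_capacity_gb(count, size), "GB")
```

Note that RAID 0 offers no protection against disk failure, which fits the use of these disks for temporary job data only.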
2 Storage Architecture
Home directories are meant for permanent storage of files that are used repeatedly, such as source code, configuration files and executable programs; the content of home directories will be backed up on a regular basis. The files in $HOME are stored on a ZFS filesystem and provided via NFS to all nodes.
Compute jobs on nodes must not write temporary data to $HOME. Instead, they should use the local $TMPDIR directory for I/O-heavy use cases and workspaces for less I/O-intensive multi-node jobs.
2.2 Lustre filesystem "/work"
The workspace tools can be used to obtain temporary space on the Lustre file system.
The variable $TMPDIR will point to a filesystem on the SSDs for any node with such disks.
Diskless nodes will provide a directory /tmp that points to a RAM disk, which can grow to up to 50% of the machine's RAM capacity.
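As a minimal sketch of how a job might follow this guidance, the helper below creates its scratch directory under $TMPDIR and falls back to /tmp when the variable is unset. Whether $TMPDIR is also set on diskless nodes is not stated here, so the fallback is an assumption, and all function names are hypothetical.

```python
import os
import tempfile


def job_scratch_base(env=None):
    """Return the base directory for temporary job data.

    Prefers $TMPDIR (on the local SSDs where present); falls back to /tmp,
    which is an assumption for nodes where $TMPDIR is not set.
    """
    env = os.environ if env is None else env
    return env.get("TMPDIR") or "/tmp"


def make_job_scratch(prefix="job_", env=None):
    """Create and return a private scratch directory below the base directory."""
    return tempfile.mkdtemp(prefix=prefix, dir=job_scratch_base(env))


def max_ramdisk_bytes(total_ram_bytes):
    """Upper bound for the RAM disk on a diskless node: 50% of the RAM."""
    return total_ram_bytes // 2


if __name__ == "__main__":
    scratch = make_job_scratch()
    print("scratch directory:", scratch)
    os.rmdir(scratch)  # clean up the empty directory again
```

On diskless nodes, anything written to the RAM disk competes with the job's own memory, so large temporary files are better placed on SSD nodes or in a workspace.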
2.4 Central Storage via Blockdevices
It will be possible to expand the disk space of the local SSDs using part of the capacity of a central disk repository that is exported in the form of block devices. This service will be made available only after the cluster has entered official live operation.
The compute nodes are interconnected with QDR InfiniBand for the communication needs of jobs and with Gigabit Ethernet for login and similar traffic.
The InfiniBand network uses a blocking factor, which means that nodes are fully interconnected only within islands of 32 nodes. The hostnames of the nodes reflect this structure: for example, node n0908 is the eighth node on the ninth InfiniBand island, which it shares with nodes n0901 to n0932.
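The naming scheme can be decoded programmatically, for instance to check whether two nodes of a job sit on the same island. The sketch below assumes that every compute node hostname has the form "n" followed by a two-digit island number and a two-digit position, as inferred from the single example above; the function names are hypothetical.

```python
import re

NODES_PER_ISLAND = 32  # islands of 32 fully interconnected nodes

# Assumed pattern: "n" + two-digit island + two-digit position, e.g. n0908.
_HOSTNAME_RE = re.compile(r"^n(\d{2})(\d{2})$")


def parse_node_hostname(hostname):
    """Return (island, position) for a hostname such as 'n0908'."""
    match = _HOSTNAME_RE.match(hostname)
    if match is None:
        raise ValueError(f"unexpected hostname format: {hostname!r}")
    island, position = int(match.group(1)), int(match.group(2))
    if not 1 <= position <= NODES_PER_ISLAND:
        raise ValueError(f"position out of range in {hostname!r}")
    return island, position


def same_island(host_a, host_b):
    """Return True if both nodes belong to the same InfiniBand island."""
    return parse_node_hostname(host_a)[0] == parse_node_hostname(host_b)[0]


if __name__ == "__main__":
    print(parse_node_hostname("n0908"))
    print(same_island("n0901", "n0932"))
```

Jobs whose nodes all lie within one island avoid the blocked links between islands, which matters mainly for communication-heavy multi-node runs.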