Hardware and Architecture (bwForCluster Chemistry)
1 System Architecture
The bwForCluster Justus for computational and theoretical chemistry is a high-performance compute resource with a high-speed interconnect. It is intended for chemistry-related jobs with high memory (RAM and disk) needs and medium to low demands on the node-interconnecting InfiniBand network.
1.1 Basic Software Features
- Red Hat Enterprise Linux (RHEL) 7
- Queuing System: MOAB/Torque
- Environment Modules system to access software
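Access to installed software through Environment Modules typically looks like the following sketch; the module name `chem/gaussian` is illustrative, not a confirmed module on this system:

```shell
# List the modules available on the system
module avail

# Load a module into the current shell environment (module name is hypothetical)
module load chem/gaussian

# Show which modules are currently loaded
module list
```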
1.2 Common Hardware Features
A total of 444 compute nodes plus 10 nodes for login, admin and visualization purposes.
- Processor: 2x Intel Xeon E5-2630v3 (Haswell, 8-core, 2.4 GHz)
- Two processors per node (2x8 cores)
- 1x QDR InfiniBand HCA, single Port, Intel TrueScale
1.3 Node Types
There are three types of compute nodes, matched to jobs that are progressively less scalable and more memory-intensive (RAM and disk).
|       | Diskless nodes | SSD nodes                | Large Memory nodes       |
| Disks | -              | 4x 240 GB Enterprise SSD | 4x 480 GB Enterprise SSD |
| RAID  | -              | RAID 0                   | RAID 0                   |
2 Storage Architecture
The storage concept of the bwForCluster for Chemistry comprises several layers: disks are served by two redundant servers for the Lustre work file system and the ZFS home directories. Additional block device storage is meant to expand the space of the local SSDs for problems that cannot fit into this local space.
|            | $TMPDIR            | central block storage | workspaces | $HOME     |
| Lifetime   | batch job walltime | batch job walltime    | < 90 days  | permanent |
| Disk space | diskless/1TB/2TB   | 480 TB                | 200 TB     | 200 TB    |
global: all nodes access the same file system; local: each node has its own file system; permanent: files are stored permanently; batch job walltime: files are removed at the end of the batch job.
2.1 Home directories "$HOME"
Home directories are meant for permanent storage of files that are used continually, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis. The files in $HOME are stored on a ZFS filesystem and provided via NFS to all nodes.
Current disk usage on home directory and quota status can be checked with the diskusage command:
$ diskusage
User                 Used (GB)   Quota (GB)   Used (%)
------------------------------------------------------------------------
<username>                4.38       100.00       4.38
Compute jobs on nodes must not write temporary data to $HOME. Instead, they should use the node-local $TMPDIR directory for I/O-heavy use cases and workspaces for less I/O-intensive multi-node jobs.
2.2 Lustre filesystem "/work"
The workspace tools can be used to obtain temporary space on the Lustre file system.
To create a workspace you need to supply a name for your workspace area and a lifetime in days. The maximum lifetime is 90 days.
- allocate a workspace
$ ws_allocate myprojectworkspace 50
Workspace created. Duration is 1200 hours.
Further extensions available: 9999
/lustre/lxfs/work/ws/username-myprojectworkspace-0
More information is available with ws_allocate -h.
- extend a workspace
$ ws_extend myprojectworkspace 50
Duration of workspace is successfully changed!
New duration is 1200 hours.
Further extensions available: 9998
This changes the lifetime of the workspace to the specified number of days.
- delete a workspace
$ ws_release myprojectworkspace
/lustre/lxfs/work/ws/user-myprojectworkspace-0
Info: Workspace was deleted.
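In scripts, a workspace is usually referenced by the path that ws_allocate prints. A hedged sketch of how a job script might capture and use that path, assuming (as the output above suggests) that the path is printed to stdout while status messages go elsewhere:

```shell
# Allocate (or look up an existing) workspace and capture its path (sketch only)
WSDIR=$(ws_allocate myprojectworkspace 50)

# Run an I/O-intensive multi-node job from inside the workspace
cd "$WSDIR"
```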
2.3 $TMPDIR and $SCRATCH
The variables $TMPDIR and $SCRATCH will point to a filesystem on the SSDs for any node with such disks. If you want to use SSD scratch space in your compute job, you will have to request a disk space MOAB resource when submitting your job, as described in Batch Jobs - bwForCluster Chemistry Features#Disk Space.
Diskless nodes provide a /tmp directory that points to a RAM disk, which can grow up to 50% of the node's RAM capacity.
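A hedged sketch of a Torque/MOAB job script that stages data through the node-local SSD scratch space; the resource lines and program name are illustrative placeholders, see the Batch Jobs documentation for the actual syntax of the disk space request:

```shell
#!/bin/bash
#PBS -l nodes=1:ppn=16
#PBS -l walltime=02:00:00

# Copy input data to the node-local SSD scratch directory
cp "$HOME/project/input.dat" "$TMPDIR/"

# Run the computation with all heavy I/O on the fast local disks
cd "$TMPDIR"
# myprog is a placeholder for your actual application
./myprog input.dat > output.dat

# Save the results back to permanent storage before the job ends,
# because $TMPDIR is removed at the end of the batch job walltime
cp output.dat "$HOME/project/"
```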
2.4 Central Storage via Blockdevices
It will be possible to expand the disk space of the local SSDs with part of the capacity of a central disk repository that is exported in the form of block devices. This service will be made available after the cluster has entered official live operation.
3 Network
The compute nodes are interconnected with QDR InfiniBand for the communication needs of jobs and with Gigabit Ethernet for login and similar traffic.
The InfiniBand network uses a blocking factor, which means that islands of 32 nodes are fully interconnected. The hostnames of the nodes reflect this structure: e.g. node n0908 is the eighth node on the ninth InfiniBand island, which it shares with nodes n0901 to n0932.
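Since the island and node index are encoded in the four digits of the hostname, they can be recovered with plain shell string slicing (a small illustration, not an official tool):

```shell
#!/bin/bash
# Decode a hostname of the form nIINN: II = InfiniBand island, NN = node index
host=n0908
island=${host:1:2}   # characters 2-3: "09" -> island 9
index=${host:3:2}    # characters 4-5: "08" -> node 8
echo "island $island, node $index"
```

Running this prints `island 09, node 08`.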