Hardware and Architecture (bwForCluster Chemistry)


1 System Architecture

The bwForCluster for Computational and Theoretical Chemistry (Justus) is a high-performance compute resource with a high-speed interconnect. It is intended for chemistry-related jobs with high memory (RAM and disk) demands and low to medium requirements on the node-interconnecting InfiniBand network.

Overview of the bwForCluster Chemistry, showing only the connecting InfiniBand network. All machines are additionally connected by 1 Gbit Ethernet.


1.1 Basic Software Features

  • Red Hat Enterprise Linux (RHEL) 7
  • Queuing System: MOAB/Torque
  • Environment Modules system to access software

1.2 Common Hardware Features

A total of 444 compute nodes plus 10 nodes for login, admin and visualization purposes.

  • Processor: 2x Intel Xeon E5-2630 v3 (Haswell, 8-core, 2.4 GHz)
  • Two processors per node (2x8 cores)
  • 1x QDR InfiniBand HCA, single Port, Intel TrueScale

1.3 Node Types

There are three types of compute nodes, intended for jobs ranging from highly scalable to less scalable but increasingly memory- and disk-intensive (RAM and disks).

              Diskless nodes       SSD nodes                    Large Memory nodes
Quantity      224                  204                          16
RAM (GB)      128                  128                          512
Disk Space    -                    ~1 TB                        ~2 TB
Disks         -                    4x 240 GB Enterprise SSD     4x 480 GB Enterprise SSD
RAID          -                    RAID 0                       RAID 0


2 Storage Architecture

Overview of the bwForCluster Chemistry storage concept.

Disks are served by two redundant servers for the Lustre (work) and ZFS (home) file systems. The additional block device storage is meant to expand the space of the local SSDs for problems that do not fit into this local space.

              $TMPDIR              central block storage    workspaces    $HOME
Visibility    local                on-demand local          global        global
Lifetime      batch job walltime   batch job walltime       < 90 days     permanent
Disk space    diskless/1 TB/2 TB   480 TB                   200 TB        200 TB
Quotas        no                   no                       no            100 GB
Backup        no                   no                       no            yes
 global             :  all nodes access the same file system;
 local              :  each node has its own file system;
 permanent          :  files are stored permanently;
 batch job walltime :  files are removed at the end of the batch job.


2.1 $HOME

Home directories are meant for the permanent storage of files that are used repeatedly, such as source code, configuration files, and executable programs; the content of the home directories is backed up on a regular basis. The files in $HOME are stored on a ZFS file system and provided via NFS to all nodes.

Current disk usage of the home directory and the quota status can be checked with the diskusage command:

$ diskusage

User           	   Used (GB)	  Quota (GB)	Used (%)
------------------------------------------------------------------------
<username>                4.38               100.00             4.38

Compute jobs must not write temporary data to $HOME. Instead, they should use the local $TMPDIR directory for I/O-heavy use cases and workspaces for less I/O-intensive multi-node jobs.
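
As a rough illustration, the body of a job script could stage its data in $TMPDIR and copy only the final results back to $HOME. This is only a sketch; the program and file names (myprogram, input.dat, results) are placeholders, not JUSTUS-specific tools:

#!/bin/bash
# Sketch: run I/O-heavy work on the node-local scratch, keep only results in $HOME
cd $TMPDIR
cp $HOME/project/input.dat .              # stage input data from the home directory
myprogram input.dat > output.log          # heavy I/O stays on the local disk
cp output.log $HOME/project/results/      # copy back only the final results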

2.2 Lustre filesystem "/work"

The workspace tools can be used to obtain temporary space on the Lustre file system; a short usage sketch follows the command examples below.

To create a workspace you need to supply a name for your workspace area and a lifetime in days. The maximum lifetime is 90 days.


  • allocate a workspace
$ ws_allocate myprojectworkspace 50
Workspace created. Duration is 1200 hours. 
Further extensions available: 9999
/lustre/lxfs/work/ws/username-myprojectworkspace-0

More information is available with ws_allocate -h.

  • extend a workspace
$ ws_extend myprojectworkspace 50
Duration of workspace is successfully changed!
New duration is 1200 hours. Further extensions available: 9998

This resets the remaining lifetime of the workspace to the specified number of days.

  • delete a workspace
$ ws_release myprojectworkspace
/lustre/lxfs/work/ws/username-myprojectworkspace-0
Info: Workspace was deleted.
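
Inside a (multi-node) batch job, the workspace can then serve as the shared working directory. The sketch below assumes that the ws_find helper of the workspace tools is available to look up the path; otherwise the path printed by ws_allocate can be used directly. Program and file names are placeholders:

WORKDIR=$(ws_find myprojectworkspace)     # look up the path of the existing workspace (assumes ws_find is installed)
cd $WORKDIR
cp $HOME/project/input.dat .              # stage input into the globally visible workspace
mpirun myprogram input.dat                # all nodes of the job can read and write here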



2.3 $TMPDIR and $SCRATCH

The variables $TMPDIR and $SCRATCH point to a file system on the local SSDs on any node equipped with such disks. If you want to use SSD scratch space in your compute job, you have to request a disk space MOAB resource when submitting your job, as described in Batch Jobs - bwForCluster Chemistry Features#Disk Space.

Diskless nodes provide a directory /tmp that resides in a RAM disk, which can grow up to 50% of the RAM capacity of the machine.
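
To check inside a running job which local scratch space was actually assigned (SSD-backed $TMPDIR or the RAM-disk /tmp on a diskless node), the following standard shell commands can be used; they are generic and not JUSTUS-specific:

$ echo $TMPDIR      # print the node-local scratch path assigned to the job
$ df -h $TMPDIR     # show the underlying file system and the available space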

2.4 Central Storage via Block Devices

It will be possible to expand the disk space of the local SSDs using part of the capacity of a central disk repository that is exported in the form of block devices. This service will be made available at a later date, after the cluster has gone into official live operation.

3 Network

The compute nodes are interconnected with QDR InfiniBand for the communication needs of jobs and with Gigabit Ethernet for login and similar traffic.

3.1 InfiniBand

The InfiniBand network uses a blocking factor, which means that only islands of 32 nodes are fully interconnected. The hostnames of the nodes reflect this structure, e.g. node n0908 is the eighth node on the ninth InfiniBand island, which it shares with nodes n0901 to n0932.