NEMO/Hardware
System Architecture
The bwForCluster NEMO is a high-performance compute resource with a high-speed interconnect. It is intended for compute activities of researchers from the fields of Neuroscience, Elementary Particle Physics and Microsystems Engineering (NEMO).
Figure: bwForCluster NEMO Schematic
Operating System and Software
- Operating System: CentOS Linux 7 (similar to RHEL 7)
- Queuing System: MOAB / Torque (see Batch Jobs for help)
- (Scientific) Libraries and Software: Environment Modules
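Software is accessed by loading the corresponding environment module into your shell. As a minimal sketch (the module name shown is only an example; the modules actually available on NEMO may differ), a typical session could look like this:

  module avail                # list all software modules available on the cluster
  module load devel/python    # load a module (name is only an example)
  module list                 # show the currently loaded modules
  module unload devel/python  # remove the module from the environment again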
Compute Nodes
For researchers from the scientific fields of Neuroscience, Elementary Particle Physics and Microsystems Engineering, the bwForCluster NEMO offers 748 compute nodes plus several special purpose nodes for login, interactive jobs, etc.
Compute node specification:
| | Compute Nodes | Interactive Nodes | Memory Nodes | Special Purpose Nodes |
|---|---|---|---|---|
| Quantity | 772 | 6 | 4 / 4 | 4 |
| Processors | 2 x Intel Xeon E5-2630v4 (Broadwell) | 2 x Intel Xeon E5-2630v4 (Broadwell) | 2 x Intel Xeon E5-2630v4 (Broadwell) | 1 x Intel Xeon Phi 7210 Knights Landing (KNL) |
| Processor Frequency (GHz) | 2.2 | 2.2 | 2.2 | 1.3 |
| Number of Cores per Node | 20 | 20 | 20 | 64 |
| Working Memory (GB) | 128 DDR4 | 128 DDR4 | 256 / 512 DDR4 | 16 MCDRAM + 96 DDR4 |
| Local Disk (GB) | 240 (SSD) | 240 (SSD) | 240 (SSD) | 240 (SSD) |
| Interconnect | Omni-Path 100 | Omni-Path 100 | Omni-Path 100 | Omni-Path 100 |
Special Purpose Nodes
Besides the classical compute nodes, several nodes serve as login and preprocessing nodes, as nodes for interactive jobs, and as nodes for hosting a virtualized service environment.
Storage Architecture
The bwForCluster NEMO consists of two separate storage systems, one for the user's home directory $HOME and one serving as a workspace. The home directory is limited in space and parallel access, but offers snapshots of your files and a backup. The workspace is a parallel file system which offers fast, parallel file access and a larger capacity than the home directory. This storage is based on BeeGFS and can be accessed in parallel from many nodes. Additionally, each compute node provides high-speed temporary storage on its node-local solid state disk (SSD) via the $TMPDIR environment variable.
| | $HOME | Workspace | $TMPDIR |
|---|---|---|---|
| Visibility | global (GbE) | global (Omni-Path) | node local |
| Lifetime | permanent | workspace lifetime (max. 100 days, extension possible) | batch job walltime |
| Capacity | 45 TB | 768 TB | 200 GB per node |
| Quotas | 100 GB per user | 20 TB / 1 million files per user | none |
| Backup | snapshots + tape backup | no | no |
- global: all nodes access the same file system
- local: each node has its own file system
- permanent: files are stored permanently
- batch job walltime: files are removed at the end of the batch job
$HOME
Home directories are meant for the permanent storage of files that are in continuous use, such as source code, configuration files and executable programs; the content of home directories is backed up on a regular basis. The files in $HOME are stored on an Isilon OneFS system and provided via NFS to all nodes.
NOTE: Compute jobs on nodes must not write temporary data to $HOME. Instead they should use the local $TMPDIR directory for I/O-heavy use cases and workspaces for less I/O-intensive multi-node jobs.
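As a minimal sketch of this pattern (the workspace name my_workspace, the input file input.dat and the program my_simulation are placeholders, and the resource requests should be adapted to your job), a Torque/MOAB job script could stage data from a workspace to the node-local $TMPDIR, compute there, and copy the results back before the job ends:

  #!/bin/bash
  #PBS -l nodes=1:ppn=20                         # request one compute node with 20 cores
  #PBS -l walltime=02:00:00                      # requested walltime

  WORKSPACE=$(ws_find my_workspace)              # resolve the workspace path (placeholder name)
  cp "$WORKSPACE/input.dat" "$TMPDIR/"           # stage input onto the fast node-local SSD
  cd "$TMPDIR"
  "$HOME/my_simulation" input.dat > output.dat   # run the I/O-heavy work on the local disk
  cp output.dat "$WORKSPACE/"                    # save results, since $TMPDIR is removed after the job

The script would then be submitted with msub (see Batch Jobs for details).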
Workspaces
Workspaces can be created with the workspace tools. This generates a directory on the parallel storage with a limited lifetime. When this lifetime is reached, the workspace is deleted automatically after a grace period. Workspaces can be extended to prevent deletion, and you can set up reminders and calendar entries to avoid accidental removal.
To create a workspace you'll need to supply a name for your workspace area and a lifetime in days. For more information read the corresponding help, e.g. ws_allocate -h.
Defaults and maximum values:
- Default and maximum lifetime (days): 100
- Maximum extensions: 99
Examples:
Command | Action |
---|---|
ws_allocate my_workspace 100 | Allocate a workspace named "my_workspace" for 100 days. |
ws_list | List all your workspaces. |
ws_find my_workspace | Get absolute path of workspace "my_workspace". |
ws_extend my_workspace 100 | Set expiration date of workspace "my_workspace" to 100 days (regardless of remaining days). |
ws_release my_workspace | Manually release your workspace "my_workspace" and free the used space on the storage (delete the data first if it should be removed immediately). |
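Putting these commands together, a typical workspace lifecycle could look like the following sketch (the workspace name my_workspace is taken from the examples above; input.dat is just a placeholder file):

  ws_allocate my_workspace 100        # create the workspace with a lifetime of 100 days
  WORKSPACE=$(ws_find my_workspace)   # resolve its absolute path on the parallel storage
  cp ~/input.dat "$WORKSPACE/"        # stage data into the workspace (placeholder file)
  ws_extend my_workspace 100          # later: reset the lifetime to another 100 days
  ws_release my_workspace             # when finished: release the workspace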
Sharing Workspace Data within your Workgroup
Data in workspaces can be shared with colleagues. Making workspaces world-readable or world-writable using standard Unix access rights is strongly discouraged; it is recommended to use ACLs (Access Control Lists) instead.
Best practices with respect to ACL usage:
- Take into account that ACLs take precedence over standard Unix access rights
- Use a single set of rules at the level of a workspace
- Make the entire workspace either read-only or read-write for individual co-workers
- Optional: Make the entire workspace read-only for your compute project (Rechenvorhaben, group bwYYMNNN), e.g. for large input data
- If a more granular set of rules is necessary, consider using additional workspaces
- The owner of a workspace is responsible for its content and management
Examples:
Command | Action |
---|---|
getfacl $(ws_find my_workspace) | List access rights on the workspace named "my_workspace" |
setfacl -Rm u:fr_xy1001:rX,d:u:fr_xy1001:rX $(ws_find my_workspace) | Grant user "fr_xy1001" read-only access to the workspace named "my_workspace" |
setfacl -Rm u:fr_xy1001:rwX,d:u:fr_xy1001:rwX $(ws_find my_workspace) | Grant user "fr_xy1001" read and write access to the workspace named "my_workspace" |
setfacl -Rm g:bw16e001:rX,d:g:bw16e001:rX $(ws_find my_workspace) | Grant group (Rechenvorhaben) "bw16e001" read-only access to the workspace named "my_workspace" |
setfacl -Rb $(ws_find my_workspace) | Remove all ACL rights. Standard Unix access rights apply again. |
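As a short usage sketch combining these commands (the user name fr_xy1001 and the workspace name my_workspace are taken from the examples above), you could grant a colleague read-only access and then verify the resulting rights:

  WORKSPACE=$(ws_find my_workspace)                         # absolute path of the workspace
  setfacl -Rm u:fr_xy1001:rX,d:u:fr_xy1001:rX "$WORKSPACE"  # grant read-only access, also for newly created files (default ACL)
  getfacl "$WORKSPACE"                                      # check that the ACL entries were applied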
Local Disk Space
All compute nodes are equipped with a local SSD of 240 GB capacity (200 GB usable). During computation the environment variable $TMPDIR points to this local disk space. The data becomes unavailable as soon as the job has finished.
High Performance Network
All compute nodes are interconnected through the high-performance Omni-Path network, which offers very low latency and 100 Gbit/s throughput. The parallel storage for the workspaces is attached to all cluster nodes via Omni-Path. For non-blocking communication, 17 islands with 44 nodes and 880 cores each are available. The islands are connected with a blocking factor of 1:11 (or 400 Gbit/s for 44 nodes).