BinAC2/Hardware and Architecture: Difference between revisions

From bwHPC Wiki
= Hardware and Architecture =


The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Astrophysics, and Geosciences.


== Operating System and Software ==


* Operating System: Rocky Linux 9.4
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)
* (Scientific) Libraries and Software: [[Environment Modules]]


== Compute Nodes ==


BinAC 2 offers compute nodes, high-mem nodes, and two types of GPU nodes.
* 180 compute nodes
* 14 SMP nodes
* 32 GPU nodes (A30)

= Network =

The compute nodes and the parallel file system are connected via 100 Gigabit Ethernet (100GbE).<br />
In contrast to BinAC 1, not all compute nodes are connected via InfiniBand, but 80 standard compute nodes are connected via HDR InfiniBand (100 Gbit/s). To run your jobs on the InfiniBand nodes, submit them with <code>--constraint=ib</code>.
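
As an illustration, a batch script requesting InfiniBand nodes might look like the following sketch. Only <code>--constraint=ib</code> is taken from the text above; the job name, node count, tasks per node, and time limit are assumed placeholder values, and <code>./my_mpi_program</code> is a hypothetical program name.

```shell
#!/bin/bash
#SBATCH --job-name=ib-example     # hypothetical job name
#SBATCH --nodes=2                 # assumed node count, adjust to your job
#SBATCH --ntasks-per-node=64      # assumed value (standard nodes have 64 cores)
#SBATCH --time=01:00:00           # assumed walltime
#SBATCH --constraint=ib           # only run on nodes with the "ib" feature

# Report the allocated nodes; falls back to a note when run outside Slurm.
nodes="${SLURM_JOB_NODELIST:-none (not running under Slurm)}"
echo "Allocated nodes: $nodes"

# Start the actual program here, e.g. (hypothetical program name):
# srun ./my_mpi_program
```

The same constraint can also be given on the command line, e.g. <code>sbatch --constraint=ib job.sh</code>, where <code>job.sh</code> is again a placeholder name.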

= Storage =

The bwForCluster BinAC 2 consists of two separate storage systems, one for the user's home directory $HOME and one serving as a project/work space.
The home directory is limited in space and parallel access but offers snapshots of your files and backup.

The project/work storage is a parallel file system (PFS) which offers fast, parallel file access and a bigger capacity than the home directory. It is mounted at <code>/pfs/10</code> on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directory. Each compute project has its own directory at <code>/pfs/10/project</code> that is accessible to all members of the compute project.
Each user can create workspaces under <code>/pfs/10/work</code> using the workspace tools. These directories are only accessible to the user who created the workspace.

Additionally, each compute node provides fast temporary storage on a node-local SSD, which is available via the $TMPDIR environment variable.

{| class="wikitable"
|-
! style="width:10%"|
! style="width:10%"| <tt>$HOME</tt>
! style="width:10%"| project
! style="width:10%"| work
! style="width:10%"| <tt>$TMPDIR</tt>
|-
!scope="column" | Visibility
| global
| global
| global
| node local
|-
!scope="column" | Lifetime
| permanent
| permanent
| work space lifetime (max. 30 days, max. 5 extensions)
| batch job walltime
|-
!scope="column" | Capacity
| -
| 8.1 PB
| 1000 TB
| 512 GB per node; 1920 GB on high-mem nodes
|-
!scope="column" | Speed
| ...
| ...
| ...
| ...
|-
!scope="column" | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]
| 40 GB per user
| not yet, maybe in the future
| none
| none
|-
!scope="column" | Backup
| yes
| no
| no
| no
|}

 global             : all nodes access the same file system
 local              : each node has its own file system
 permanent          : files are stored permanently
 batch job walltime : files are removed at end of the batch job

=== Home ===

Home directories are meant for permanent storage of files that are used repeatedly, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.
Because the backup space is limited, we enforce a quota of 40 GB on the home directories.
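
To get a rough idea of how close a directory is to the 40 GB limit, a small helper based on the standard <code>du</code> command can be used. This is only a sketch: the function name <code>usage_percent</code> is made up for this example, and the numbers reported by <code>du</code> may differ from what the file system's quota accounting sees.

```shell
#!/bin/bash
# Sketch: estimate how much of a quota a directory uses.
# usage_percent DIR QUOTA_GB -> prints the used share as an integer percentage
usage_percent() {
    local dir=$1 quota_gb=$2 used_kb
    # du -sk prints "<size in KiB><TAB><path>"; keep only the size
    used_kb=$(du -sk "$dir" | cut -f1)
    # 1 GB is counted as 1024 * 1024 KiB here
    echo $(( used_kb * 100 / (quota_gb * 1024 * 1024) ))
}
```

For example, <code>usage_percent "$HOME" 40</code> prints the percentage of the 40 GB home quota that is currently in use.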

'''NOTE:'''
Compute jobs must not write temporary data to $HOME.
Instead, they should use the node-local $TMPDIR directory for I/O-heavy use cases
and work spaces for less I/O-intensive multi-node jobs.

<!--
Current disk usage on home directory and quota status can be checked with the '''diskusage''' command:
$ diskusage
User Used (GB) Quota (GB) Used (%)
------------------------------------------------------------------------
<username> 4.38 100.00 4.38
-->
=== Project ===

Each compute project has its own project directory at <code>/pfs/10/project</code>.

<pre>
$ ls -lh /pfs/10/project/
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003
[...]
</pre>

As you can see, the directory is owned by a group representing your compute project (here bw16f003) and is accessible to all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.
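
One possible layout is a personal directory per project member. The following sketch shows how such a directory could be created; the function name <code>make_member_dir</code> and the permission mode <code>750</code> are assumptions for illustration, not a site policy.

```shell
#!/bin/bash
# Sketch: create a personal directory inside a shared project directory.
# make_member_dir PROJECT_DIR GROUP MEMBER -> prints the created path
make_member_dir() {
    local project_dir=$1 group=$2 member=$3
    local dir="$project_dir/$member"
    mkdir -p "$dir" || return 1
    chgrp "$group" "$dir" || return 1   # assign the directory to the project group
    chmod 750 "$dir" || return 1        # owner: full access, group: read-only, others: none
    echo "$dir"
}
```

With the project from the listing above, a call could look like <code>make_member_dir /pfs/10/project/bw16f003 bw16f003 "$USER"</code>.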

The data is stored on HDDs. The primary focus of <code>/pfs/10/project</code> is pure capacity, not speed.

=== Work ===

The data at <code>/pfs/10/work</code> is stored on SSDs. The primary focus is speed, not capacity.
In contrast to BinAC 1 we will enforce work space lifetime, as the capacity is limited.
We ask you to only store data you actively use for computations on <code>/pfs/10/work</code>.
Please move data to <code>/pfs/10/project</code> when you don't need it on the fast storage any more.

Each user can create workspaces at <code>/pfs/10/work</code> through the workspace tools.
To create a work space you'll need to supply a name for your work space area and a lifetime in days.
For more information read the corresponding help, e.g. <code>ws_allocate -h</code>.
{| class="wikitable"
|-
!style="width:30%" | Command
!style="width:70%" | Action
|-
|<code>ws_allocate mywork 30</code>
|Allocate a work space named "mywork" for 30 days.
|-
|<code>ws_allocate myotherwork</code>
|Allocate a work space named "myotherwork" with maximum lifetime.
|-
|<code>ws_list -a</code>
|List all your work spaces.
|-
|<code>ws_find mywork</code>
|Get absolute path of work space "mywork".
|-
|<code>ws_extend mywork 30</code>
|Extend the lifetime of work space "mywork" by 30 days from now.
|-
|<code>ws_release mywork</code>
|Manually erase your work space "mywork". Please remove directory content first.
|-
|}
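
The commands above can be combined in a job or login script. The following sketch wraps <code>ws_allocate</code> and <code>ws_find</code> from the table in a helper function; the function name <code>prepare_workspace</code> is made up for this example, and the sketch assumes that <code>ws_find</code> prints only the workspace path.

```shell
#!/bin/bash
# Sketch: allocate a work space and print its path.
# prepare_workspace NAME DAYS -> prints the absolute path of the work space
prepare_workspace() {
    local name=$1 days=$2
    # The workspace tools are only available on the cluster
    if ! command -v ws_allocate >/dev/null 2>&1; then
        echo "workspace tools not found" >&2
        return 1
    fi
    ws_allocate "$name" "$days" >/dev/null || return 1
    ws_find "$name"
}
```

A typical use would be <code>cd "$(prepare_workspace mywork 30)"</code> before starting a computation.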

=== Scratch ===

Please use the fast local scratch space for storing temporary data during your jobs.

For each job a scratch directory will be created on the compute nodes. It is available via the environment variable <code>$TMPDIR</code>, which points to <code>/scratch/<jobID></code>.
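
Because files in <code>$TMPDIR</code> are removed at the end of the batch job, results have to be copied to permanent or work space storage before the job finishes. The following sketch shows this staging pattern. The function name <code>run_in_scratch</code> is made up, <code>sort</code> is only a stand-in for a real I/O-heavy program, and the fallback to <code>/tmp</code> is an assumption for running the sketch outside a job.

```shell
#!/bin/bash
# Sketch: stage input to node-local scratch, compute there, copy results back.
# run_in_scratch INPUT_FILE RESULT_DIR
run_in_scratch() {
    local input=$1 result_dir=$2
    local scratch=${TMPDIR:-/tmp}    # assumed fallback when not inside a job
    local name
    name=$(basename "$input")

    # 1. Copy the input to the fast local disk
    cp "$input" "$scratch/$name" || return 1
    # 2. Run the computation on the local copy ("sort" is a stand-in)
    sort "$scratch/$name" > "$scratch/$name.sorted" || return 1
    # 3. Copy the result back before the job ends
    mkdir -p "$result_dir" || return 1
    cp "$scratch/$name.sorted" "$result_dir/"
}
```

Inside a job script this could be called as <code>run_in_scratch "$(ws_find mywork)/data.txt" "$(ws_find mywork)/results"</code>, where <code>data.txt</code> and <code>results</code> are placeholder names.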

Latest revision as of 17:54, 13 December 2024

Compute Nodes

BinAC 2 offers compute nodes, high-mem nodes, and two types of GPU nodes.

  • 180 compute nodes
  • 14 SMP nodes
  • 32 GPU nodes (A30)
  • 8 GPU nodes (A100)
  • plus several special purpose nodes for login, interactive jobs, etc.

Compute node specification:

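{| class="wikitable"
|-
! style="width:20%"|
! style="width:20%"| Standard
! style="width:20%"| High-Mem
! style="width:20%"| GPU (A30)
! style="width:20%"| GPU (A100)
|-
!scope="column"| Quantity
| 180
| 14
| 32
| 8
|-
!scope="column"| Processors
| 2 x AMD EPYC Milan 7543
| 2 x AMD EPYC Milan 7443
| 2 x AMD EPYC Milan 7543
| 2 x AMD EPYC Milan 7543
|-
!scope="column"| Processor Frequency (GHz)
| 2.80
| 2.85
| 2.80
| 2.80
|-
!scope="column"| Number of Cores
| 64
| 48
| 64
| 64
|-
!scope="column"| Working Memory (GB)
| 512
| 2048
| 512
| 512
|-
!scope="column"| Local Disk (GB)
| 512 (SSD)
| 1920 (SSD)
| 512 (SSD)
| 512 (SSD)
|-
!scope="column"| Interconnect
| HDR IB (80 nodes) / 100GbE
| HDR
| HDR
| HDR
|-
!scope="column"| Coprocessors
| -
| -
| 2 x NVIDIA A30 (24 GB ECC HBM2, NVLink)
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]
|}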
