BinAC2/Hardware and Architecture

== System Architecture ==

The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Astrophysics, and Geosciences.

=== Operating System and Software ===

* Operating System: ...
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)
* (Scientific) Libraries and Software: [[Environment Modules]]
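Software is provided via Environment Modules. A minimal sketch of typical usage; the module name <tt>bio/bwa</tt> is only a placeholder, pick a real one from <tt>module avail</tt>:

<pre>
module avail          # list the software modules installed on the cluster
module load bio/bwa   # load a module into the current shell (placeholder name)
module list           # show which modules are currently loaded
module purge          # unload all modules
</pre>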
=== Compute Nodes ===

BinAC 2 offers compute nodes, high-mem nodes, and two types of GPU nodes.

* 180 compute nodes
* 14 SMP nodes
* 32 GPU nodes (A30)
* 8 GPU nodes (A100)
* plus several special purpose nodes for login, interactive jobs, etc.

Compute node specification:

{| class="wikitable"
|-
!
! Standard
! High-Mem
! GPU (A30)
! GPU (A100)
|-
!scope="column"| Quantity
| 180
| 14
| 32
| 8
|-
!scope="column"| Processors
| 2 x AMD EPYC Milan 7543
| 2 x AMD EPYC Milan 7443
| 2 x AMD EPYC Milan 7543
| 2 x AMD EPYC Milan 7543
|-
!scope="column"| Processor Frequency (GHz)
| 2.80
| 2.85
| 2.80
| 2.80
|-
!scope="column"| Number of Cores
| 64
| 48
| 64
| 64
|-
!scope="column"| Working Memory (GB)
| 512
| 2048
| 512
| 512
|-
!scope="column"| Local Disk (GB)
| 512 (SSD)
| 1920 (SSD)
| 512 (SSD)
| 512 (SSD)
|-
!scope="column"| Interconnect
| HDR IB (80 nodes) / 100GbE
| HDR
| HDR
| HDR
|-
!scope="column"| Coprocessors
| -
| -
| 2 x NVIDIA A30 (24 GB ECC HBM2, NVLink)
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]
|}
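How these node types are requested is described in [[BinAC2/Slurm]]. As an illustration only, a hedged sketch of a batch script asking for one GPU; the partition name <tt>gpu</tt> and the module name are assumptions, check [[BinAC2/Slurm]] and <tt>module avail</tt> for the actual values:

<pre>
#!/bin/bash
#SBATCH --partition=gpu        # assumed partition name, not confirmed for BinAC 2
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --gres=gpu:1           # request one GPU on the node
#SBATCH --time=02:00:00

module load devel/cuda         # placeholder module name
srun ./my_gpu_program          # your GPU application
</pre>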
=== Special Purpose Nodes === |

Besides the classical compute nodes, several nodes serve as login and preprocessing nodes, as nodes for interactive jobs, and as nodes for virtual environments that provide a virtual service environment.

== Storage Architecture == |

The bwForCluster [https://www.binac.uni-tuebingen.de BinAC] consists of two separate storage systems, one for the user's home directory <tt>$HOME</tt> and one serving as a work space. The home directory is limited in space and parallel access, but offers snapshots of your files and backups. The work space is a parallel file system which offers fast and parallel file access and a bigger capacity than the home directory. This storage is based on [https://www.beegfs.com/ BeeGFS] and can be accessed in parallel from many nodes. Additionally, each compute node provides high-speed temporary storage on a node-local solid state disk (SSD), accessible via the <tt>$TMPDIR</tt> environment variable.

{| class="wikitable"
|-
! style="width:10%"|
! style="width:10%"| <tt>$HOME</tt>
! style="width:10%"| Work Space
! style="width:10%"| <tt>$TMPDIR</tt>
|-
!scope="column" | Visibility
| global
| global
| node local
|-
!scope="column" | Lifetime
| permanent
| work space lifetime (max. 30 days, max. 3 extensions)
| batch job walltime
|-
!scope="column" | Capacity
| unkn.
| 482 TB
| 211 GB per node
|-
!scope="column" | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]
| 40 GB per user
| none
| none
|-
!scope="column" | Backup
| yes
| no
| no
|}

global : all nodes access the same file system
local : each node has its own file system
permanent : files are stored permanently
batch job walltime : files are removed at end of the batch job
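
As a quick orientation, a minimal sketch showing where the three locations can be found from within a batch job; the work space name <tt>mywork</tt> is an example and must have been allocated beforehand (see the Work Space section below):

<pre>
echo "$HOME"       # global home directory: backed up, small quota
echo "$TMPDIR"     # node-local SSD scratch: removed when the job ends
ws_find mywork     # absolute path of the BeeGFS work space "mywork"
</pre>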
=== $HOME === |

Home directories are meant for the permanent storage of files that are used repeatedly, such as source code, configuration files, and executable programs; the content of home directories is backed up on a regular basis.

<!--
Current disk usage on the home directory and quota status can be checked with the '''diskusage''' command:

 $ diskusage
 User            Used (GB)    Quota (GB)    Used (%)
 ------------------------------------------------------------------------
 <username>           4.38        100.00        4.38
-->

NOTE: Compute jobs must not write temporary data to <tt>$HOME</tt>. Instead, they should use the node-local <tt>$TMPDIR</tt> directory for I/O-heavy use cases and work spaces for less I/O-intensive multi-node jobs.

<!--
'''Quota is full - what to do'''

When the quota is at 100% usage, you can run into problems with disk write operations (e.g. error messages during file copy/edit/save operations). To avoid this, please remove data that you do not need from the $HOME directory or move it to a temporary place.

As a temporary place for the data you can use:

* '''Workspace''' - space on the BeeGFS file system, lifetime up to 90 days (see below)
* '''Scratch on login nodes''' - special directory on every login node (login01..login03):
** Access via the variable $TMPDIR (e.g. "cd $TMPDIR")
** Lifetime of data - minimum 7 days (based on the last access time)
** Data is private for every user
** Each login node has its own scratch directory (data is NOT shared)
** There is NO backup of the data

To work smoothly with the $HOME directory it is important to keep the data in order (remove unnecessary and temporary data, archive big files, and store large files only on the workspace).
-->

=== Work Space === |

Work spaces can be created with the <tt>workspace</tt> tools. This generates a directory on the parallel storage.

To create a work space you need to supply a name for your work space area and a lifetime in days.
For more information read the corresponding help, e.g. <tt>ws_allocate -h</tt>.

Examples:

{| class="wikitable"
|-
!style="width:30%" | Command
!style="width:70%" | Action
|-
|<tt>ws_allocate mywork 30</tt>
|Allocate a work space named "mywork" for 30 days.
|-
|<tt>ws_allocate myotherwork</tt>
|Allocate a work space named "myotherwork" with maximum lifetime.
|-
|<tt>ws_list -a</tt>
|List all your work spaces.
|-
|<tt>ws_find mywork</tt>
|Get the absolute path of work space "mywork".
|-
|<tt>ws_extend mywork 30</tt>
|Extend the lifetime of work space "mywork" by 30 days from now. (Not needed, work spaces on BinAC are not limited.)
|-
|<tt>ws_release mywork</tt>
|Manually erase your work space "mywork". Please remove the directory content first.
|}

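A typical life cycle, combining the commands above into a sketch (the work space name <tt>mywork</tt> and the file names are placeholders):

<pre>
ws_allocate mywork 30             # create work space "mywork" for 30 days
WORKDIR=$(ws_find mywork)         # get its absolute path
cp input.dat "$WORKDIR"/          # stage input data onto the parallel file system
cd "$WORKDIR" && ./run_analysis   # run your computation from the work space
ws_list -a                        # check the remaining lifetime of your work spaces
rm -r "$WORKDIR"/*                # when finished: remove the directory content first
ws_release mywork                 # then release the work space
</pre>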
=== Local Disk Space === |

All compute nodes are equipped with a local SSD with 200 GB capacity for job execution. During computation the environment variable <tt>$TMPDIR</tt> points to this local disk space. The data will become unavailable as soon as the job has finished.

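For I/O-heavy jobs a common pattern is to stage input onto the local SSD, compute there, and copy the results back before the job ends. A sketch with placeholder file and work space names:

<pre>
#!/bin/bash
#SBATCH --time=04:00:00

WORKDIR=$(ws_find mywork)             # work space holding the input data (placeholder name)
cp "$WORKDIR"/input.dat "$TMPDIR"/    # stage input onto the node-local SSD
cd "$TMPDIR"
./my_program input.dat > result.out   # I/O-heavy computation runs on the local disk
cp result.out "$WORKDIR"/             # copy results back; $TMPDIR is cleared after the job
</pre>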
=== SDS@hd === |

SDS@hd is mounted only on login03 at <tt>/sds_hd</tt>.

To access your Speichervorhaben, please see the [[SDS@hd/Access/NFS#access_your_data|SDS@hd documentation]].
If you can't see your Speichervorhaben, you can [[BinAC/Support|open a ticket]].
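
As a sketch, assuming your Speichervorhaben is already visible (the acronym <tt>sv_example</tt> is a placeholder for your own):

<pre>
# on login03, the only node where SDS@hd is mounted
ls /sds_hd/               # list the mounted Speichervorhaben
ls /sds_hd/sv_example     # browse your own Speichervorhaben (placeholder name)
</pre>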