= Hardware and Architecture =

The bwForCluster BinAC 2 supports researchers from the broader fields of Bioinformatics, Medical Informatics, Astrophysics, Geosciences and Pharmacy.

== Operating System and Software ==

* Operating System: Rocky Linux 9.5
* Queuing System: [https://slurm.schedmd.com/documentation.html Slurm] (see [[BinAC2/Slurm]] for help)
* (Scientific) Libraries and Software: [[Environment Modules]] (see the usage sketch below)
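A short usage sketch for the module system. The module name and version below are only an illustration, not necessarily installed on BinAC 2; check <code>module avail</code> for the actual catalogue:

<syntaxhighlight lang="bash">
# Show which modules are available on the cluster
module avail

# Load a module into the current shell environment
# (the name/version is an example only)
module load bio/samtools/1.19

# Show what is currently loaded, then unload everything again
module list
module purge
</syntaxhighlight>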
== Compute Nodes ==

BinAC 2 offers compute nodes, high-mem nodes, and three types of GPU nodes:

* 180 compute nodes
* 16 SMP nodes
* 32 GPU nodes (2xA30)
* 8 GPU nodes (4xA100)
* 4 GPU nodes (4xH200)
* plus several special purpose nodes for login, interactive jobs, etc.

Compute node specification:
{| class="wikitable"
|-
! style="width:10%"|
! style="width:10%"| Standard
! style="width:10%"| High-Mem
! style="width:10%"| GPU (A30)
! style="width:10%"| GPU (A100)
! style="width:10%"| GPU (H200)
|-
!scope="column"| Quantity
| 180
| 14 / 2
| 32
| 8
| 4
|-
!scope="column" | Processors
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7443.html AMD EPYC Milan 7443] / 2 x [https://www.amd.com/en/products/processors/server/epyc/7003-series/amd-epyc-75f3.html AMD EPYC Milan 75F3]
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]
| 2 x [https://www.amd.com/de/products/processors/server/epyc/7003-series/amd-epyc-7543.html AMD EPYC Milan 7543]
| 2 x [https://www.amd.com/de/products/processors/server/epyc/9005-series/amd-epyc-9555.html AMD EPYC 9555]
|-
!scope="column" | Processor Base Frequency (GHz)
| 2.80
| 2.85 / 2.95
| 2.80
| 2.80
| 3.20
|-
!scope="column" | Number of Physical Cores / Hyperthreads
| 64 / 128
| 48 / 96 // 64 / 128
| 64 / 128
| 64 / 128
| 128 / 256
|-
!scope="column" | Working Memory (GB)
| 512
| 2048
| 512
| 512
| 1536
|-
!scope="column" | Local Disk (GiB)
| 450 (NVMe-SSD)
| 14000 (NVMe-SSD)
| 450 (NVMe-SSD)
| 14000 (NVMe-SSD)
| 28000 (NVMe-SSD)
|-
!scope="column" | Interconnect
| HDR 100 IB (84 nodes) / 100GbE (96 nodes)
| 100GbE
| 100GbE
| 100GbE
| HDR 200 IB + 100GbE
|-
!scope="column" | Coprocessors
| -
| -
| 2 x [https://www.nvidia.com/de-de/data-center/products/a30-gpu/ NVIDIA A30 (24 GB ECC HBM2, NVLink)]
| 4 x [https://www.nvidia.com/de-de/data-center/a100/ NVIDIA A100 (80 GB ECC HBM2e)]
| 4 x [https://www.nvidia.com/de-de/data-center/h200/ NVIDIA H200 NVL (141 GB ECC HBM3e, NVLink)]
|}
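As an illustration, GPUs on these nodes are requested through Slurm's generic resources. The partition and GRES names below are placeholders; the actual names are documented in [[BinAC2/Slurm]]:

<syntaxhighlight lang="bash">
#!/bin/bash
#SBATCH --job-name=gpu-example
#SBATCH --partition=gpu              # placeholder partition name; see BinAC2/Slurm
#SBATCH --gres=gpu:2                 # request two of the node's GPUs
#SBATCH --cpus-per-task=16
#SBATCH --mem=64G
#SBATCH --time=01:00:00

nvidia-smi                           # list the GPUs assigned to this job
</syntaxhighlight>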
= Network =

The compute nodes and the parallel file system are connected via 100GbE Ethernet.

In contrast to BinAC 1, not all compute nodes are connected via InfiniBand: 84 standard compute nodes are connected via HDR 100 InfiniBand (100 Gbit/s). To get your jobs onto the InfiniBand nodes, submit them with <code>--constraint=ib</code>, as in the sketch below.
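A minimal batch script sketch for a multi-node MPI job on the InfiniBand nodes. The partition name and the executable are placeholders; only <code>--constraint=ib</code> is the relevant part:

<syntaxhighlight lang="bash">
#!/bin/bash
#SBATCH --job-name=mpi-ib-example
#SBATCH --partition=compute          # placeholder partition name; see BinAC2/Slurm
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=64         # the standard nodes have 64 physical cores
#SBATCH --time=02:00:00
#SBATCH --constraint=ib              # run only on the 84 InfiniBand-connected standard nodes

srun ./my_mpi_program                # placeholder executable
</syntaxhighlight>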
= File Systems =

The bwForCluster BinAC 2 consists of two separate storage systems: one for the user's home directory <code>$HOME</code> and one serving as a project/work space.

The home directory is limited in space and parallel access, but offers snapshots of your files and backup.

The project/work space is a parallel file system (PFS) which offers fast and parallel file access and a bigger capacity than the home directory. It is mounted at <code>/pfs/10</code> on the login and compute nodes. This storage is based on Lustre and can be accessed in parallel from many nodes. The PFS contains the project and the work directories. Each compute project has its own directory at <code>/pfs/10/project</code> that is accessible to all members of the compute project.

Each user can create workspaces under <code>/pfs/10/work</code> using the workspace tools. These directories are only accessible to the user who created the workspace.

Additionally, each compute node provides high-speed temporary storage on the node-local solid state disk (SSD) via the <code>$TMPDIR</code> environment variable.
{| class="wikitable"
|-
! style="width:10%"|
! style="width:10%"| <tt>$HOME</tt>
! style="width:10%"| project
! style="width:10%"| work
! style="width:10%"| <tt>$TMPDIR</tt>
|-
!scope="column" | Visibility
| global
| global
| global
| node local
|-
!scope="column" | Lifetime
| permanent
| permanent
| work space lifetime (max. 30 days, max. 5 extensions)
| batch job walltime
|-
!scope="column" | Capacity
| -
| 8.1 PB
| 1000 TB
| 480 GB (compute nodes); 7.7 TB (GPU-A30 nodes); 16 TB (GPU-A100 and SMP nodes); 31 TB (GPU-H200 nodes)
|-
!scope="column" | Speed (read)
| ≈ 1 GB/s, shared by all nodes
| max. 12 GB/s
| ≈ 145 GB/s peak, aggregated over 56 nodes, ideal striping
| ≈ 3 GB/s (compute) / ≈ 5 GB/s (GPU-A30) / ≈ 26 GB/s (GPU-A100 and SMP) / ≈ 42 GB/s (GPU-H200) per node
|-
!scope="column" | [https://en.wikipedia.org/wiki/Disk_quota#Quotas Quotas]
| 40 GB per user
| not yet, maybe in the future
| none
| none
|-
!scope="column" | Backup
| yes (nightly)
| '''no'''
| '''no'''
| '''no'''
|}
global : all nodes access the same file system
local : each node has its own file system
permanent : files are stored permanently
batch job walltime : files are removed at the end of the batch job

{| class="wikitable" style="color:red; background-color:#ffffcc;" cellpadding="10"
|
Please note that due to the large capacity of '''work''' and '''project''' and due to frequent file changes on these file systems, no backup can be provided.<br/>
Backing up these file systems would require a redundant storage facility with multiple times the capacity of '''project'''. Furthermore, regular backups would significantly degrade the performance.<br/>
Data is stored redundantly, i.e. it is immune to disk failures, but not to catastrophic incidents like cyber attacks or a fire in the server room.<br/>
Please consider using one of the remote storage facilities like [https://wiki.bwhpc.de/e/SDS@hd SDS@hd], [https://uni-tuebingen.de/einrichtungen/zentrum-fuer-datenverarbeitung/projekte/laufende-projekte/bwsfs bwSFS], [https://www.scc.kit.edu/en/services/lsdf.php LSDF Online Storage] or the [https://www.rda.kit.edu/english/ bwDataArchive] to back up your valuable data.
|}

=== Home ===

Home directories are meant for permanent storage of files that are kept in use, such as source code, configuration files, executable programs, etc. The content of home directories is backed up on a regular basis.

Because the backup space is limited, we enforce a quota of 40 GB on the home directories.

'''NOTE:'''
Compute jobs on nodes must not write temporary data to $HOME.
Instead they should use the local $TMPDIR directory for I/O-heavy use cases
and work spaces for less I/O-intensive multi-node jobs.
=== Project ===

Each compute project has its own project directory at <code>/pfs/10/project</code>.

<pre>
$ ls -lh /pfs/10/project/
drwxrwx---. 2 root bw16f003 33K Dec 12 16:46 bw16f003
[...]
</pre>

As you can see, the directory is owned by a group representing your compute project (here bw16f003) and it is accessible by all group members. It is up to your group to decide how to use the space inside this directory: shared data folders, personal directories for each project member, software containers, etc.

The data is stored on HDDs. The primary focus of <code>/pfs/10/project</code> is pure capacity, not speed.
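For example, a group could agree on a shared data folder inside its project directory. This is only a sketch, using the example group bw16f003 from the listing above; substitute your own project acronym and layout:

<syntaxhighlight lang="bash">
# Create a shared subdirectory inside the project directory
# (bw16f003 is the example group from the listing above; use your own acronym)
mkdir /pfs/10/project/bw16f003/shared_data

# Keep it group-writable and let new files inherit the project group (setgid bit)
chmod g+ws /pfs/10/project/bw16f003/shared_data
</syntaxhighlight>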
=== Work ===

The data at <code>/pfs/10/work</code> is stored on SSDs. The primary focus is speed, not capacity.

In contrast to BinAC 1, we will enforce work space lifetimes, as the capacity is limited. We ask you to store only data you actively use for computations on <code>/pfs/10/work</code>. Please move data to <code>/pfs/10/project</code> when you no longer need it on the fast storage.

Each user can create workspaces at <code>/pfs/10/work</code> through the workspace tools. To create a work space you'll need to supply a name for your work space area and a lifetime in days. For more information read the corresponding help, e.g. <code>ws_allocate -h</code>.
{| class="wikitable"
|-
!style="width:30%" | Command
!style="width:70%" | Action
|-
|<code>ws_allocate mywork 30</code>
|Allocate a work space named "mywork" for 30 days.
|-
|<code>ws_allocate myotherwork</code>
|Allocate a work space named "myotherwork" with maximum lifetime.
|-
|<code>ws_list -a</code>
|List all your work spaces.
|-
|<code>ws_find mywork</code>
|Get absolute path of work space "mywork".
|-
|<code>ws_extend mywork 30</code>
|Extend lifetime of work space "mywork" by 30 days from now.
|-
|<code>ws_release mywork</code>
|Manually erase your work space "mywork". Please remove directory content first.
|}
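A typical work space workflow combining the commands above (file and project names are placeholders):

<syntaxhighlight lang="bash">
# Allocate a work space for 30 days and look up its absolute path
ws_allocate mywork 30
WSDIR=$(ws_find mywork)

# Stage input data onto the fast SSD-backed work file system
cp ~/input.dat "$WSDIR"/

# ... run your computations in $WSDIR ...

# Move results to the project directory, empty the work space, then release it
mv "$WSDIR"/results /pfs/10/project/<your_project>/
rm -rf "${WSDIR:?}"/*
ws_release mywork
</syntaxhighlight>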
=== Scratch ===

Please use the fast local scratch space for storing temporary data during your jobs.

For each job a scratch directory is created on the compute nodes. It is available via the environment variable <code>$TMPDIR</code>, which points to <code>/scratch/<jobID></code>.

Especially the SMP nodes and the GPU nodes are equipped with large and fast local disks that should be used for temporary data, scratch data, or data staging for ML model training.

The Lustre file system (<code>WORK</code> and <code>PROJECT</code>) is unsuited for repetitive random I/O, I/O sizes smaller than the Lustre and ZFS block size (1M), or I/O patterns where files are opened and closed in rapid succession. The XFS file system of the local scratch drives is better suited for typical scratch workloads and access patterns. Moreover, the local scratch drives offer lower latency and higher bandwidth than <code>WORK</code>.
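A hedged sketch of the intended pattern: stage data into <code>$TMPDIR</code>, compute on the local disk, and copy the results back before the job ends (partition, file and program names are placeholders):

<syntaxhighlight lang="bash">
#!/bin/bash
#SBATCH --partition=compute          # placeholder partition name; see BinAC2/Slurm
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=16
#SBATCH --time=04:00:00

# Stage input from a work space onto the node-local scratch disk
WSDIR=$(ws_find mywork)
cp "$WSDIR"/input.dat "$TMPDIR"/

# Run with all temporary I/O on the local scratch file system
cd "$TMPDIR"
"$HOME"/bin/my_program input.dat > output.dat   # placeholder executable

# Copy results back before the job ends; $TMPDIR is removed afterwards
cp output.dat "$WSDIR"/
</syntaxhighlight>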
=== SDS@hd ===

SDS@hd is mounted via NFS on login and compute nodes at <syntaxhighlight inline>/mnt/sds-hd</syntaxhighlight>.

To access your Speichervorhaben, the export to BinAC 2 must first be enabled by the SDS@hd team. Please contact [mailto:sds-hd-support@urz.uni-heidelberg.de SDS@hd support] and provide the acronym of your Speichervorhaben, along with a request to enable the export to BinAC 2.

Once this has been done, you can access your Speichervorhaben as described in the [https://wiki.bwhpc.de/e/SDS@hd/Access/NFS#Access_your_data SDS@hd documentation]. Obtain a Kerberos ticket first:

<syntaxhighlight>
$ kinit $USER
Password for <user>@BWSERVICES.UNI-HEIDELBERG.DE:
</syntaxhighlight>

The Kerberos ticket store is shared across all nodes. Creating a single ticket is sufficient to access your Speichervorhaben on all nodes.
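To verify access, you can check the ticket and list the mount point:

<syntaxhighlight lang="bash">
# Show the active Kerberos ticket
klist

# Once the export is enabled, your Speichervorhaben appears under the mount point
ls /mnt/sds-hd/
</syntaxhighlight>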