NEMO2/Hardware
Operating System and Software
| Operating System | Rocky Linux 9 (similar to RHEL 9) |
|---|---|
| Queuing System | SLURM (see NEMO2/Slurm for help) |
| (Scientific) Libraries and Software | Environment Modules |
| Own Software Modules using EasyBuild and Spack | EasyBuild |
| Own (Python) Environments with Conda | Conda |
| Containers with Apptainer/Singularity (and enroot in the future) | Development/Containers |
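
Software provided through Environment Modules and EasyBuild is typically activated per shell session or batch job. A minimal sketch of the usual workflow; the module name `devel/python` is a placeholder, pick a real one from the `module avail` output:

```bash
# List the centrally provided software modules (Environment Modules / EasyBuild)
module avail

# Load a module into the current shell or batch job (placeholder name)
module load devel/python

# Inspect and reset the loaded modules
module list
module purge
```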
Compute and Special Purpose Nodes
For researchers from the scientific fields of Neuroscience, Elementary Particle Physics, Microsystems Engineering, and Materials Science, the bwForCluster NEMO offers about 240 compute nodes plus several special-purpose nodes for login, interactive jobs, visualization, machine learning, and AI.
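
To see how these partitions and their node counts appear in the scheduler, they can be queried from a login node. A short sketch using standard Slurm commands; the exact partition names are those configured in Slurm (see NEMO2/Slurm):

```bash
# Summary of partitions with node counts and states
sinfo -s

# Partition, node count, CPUs per node, and memory per node (in MiB)
sinfo -o "%P %D %c %m"
```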
Node specification:
| | Genoa Partition | Milan Partition | GPU Partition | APU Partition | Login Nodes |
|---|---|---|---|---|---|
| Quantity | 106 | 137 | 9 | 4 | 2 |
| Processors / APU/GPU | 2x AMD EPYC 9654 (Genoa) | 2x AMD EPYC 7763 (Milan) | 2x Intel Xeon Platinum 8562Y+ (5th Gen), 4x NVIDIA L40S | 4x AMD Instinct MI300A | 1x AMD EPYC 9354 (Genoa) |
| Base/Boost Frequency (GHz); APU/GPU Performance (TFLOPs/TOPs) | 2.4/3.55 | 2.45/3.5 | 2.8/3.8; GPU: 91.6 (FP32) / 733 (INT8) | -/3.7; APU: 61.3 (FP64) / 122.6 (FP32) / 1960 (INT8) | 3.25/3.75 |
| CPU Cores per Node | 192 (usable in Slurm: 190) | 128 (usable in Slurm: 126) | 64 (usable in Slurm: 62) | 4x 24 (usable in Slurm: 92) | 32 |
| Memory | 768 GiB, 4.8 GHz (DDR5) | 512 GiB, 3.2 GHz (DDR4) | 512 GiB, 5.6 GHz (DDR5); GPU: 4x 48 GB, 864 GB/s (GDDR6) | 4x 128 GB, 5300 GB/s (HBM3) | 384 GiB, 4.8 GHz (DDR5) |
| Local NVMe (GB) | 3840 | 1920 | 3840 | 3840 | 480 |
| Interconnect | 100 GbE (RoCEv2) | Omni-Path 100, 100 GbE (RoCEv2) | 100 GbE (RoCEv2) | 100 GbE (RoCEv2) | 100 GbE (RoCEv2) |

Maximum resources for a single node job (*):

| Partition | Job Example |
|---|---|
| Genoa | `--partition=genoa --ntasks=190 --mem 727GB` (or: `--mem=745000MB`, or: `--mem-per-cpu=3900MB`) |
| Milan | (not yet available) |
| GPU | (not yet available) |
| APU | (not yet available) |

(*) Slurm uses Mebibytes and Gibibytes; multiply or divide by 1024 to convert.
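
As a sketch of how the Genoa single-node maximum from the table translates into a batch script (walltime, job name, and application are placeholders; see NEMO2/Slurm for the authoritative options):

```bash
#!/bin/bash
#SBATCH --partition=genoa           # Genoa partition (see table above)
#SBATCH --nodes=1                   # single-node job
#SBATCH --ntasks=190                # all Slurm-usable cores of one Genoa node
#SBATCH --mem=745000MB              # ~727 GB; Slurm counts in MiB/GiB, see (*)
#SBATCH --time=02:00:00             # placeholder walltime
#SBATCH --job-name=genoa-full-node  # placeholder job name

srun ./my_application               # placeholder executable
```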
File Systems
NEMO2 offers a fast Weka parallel file system, which is limited only by the uplink to this storage (>90 GB/s). The storage is used for $HOME and workspaces. There will be no backups, but we plan to implement snapshots for the last 7 days in the coming months. Additionally, each compute node provides temporary storage on its node-local NVMe disk.
| | $HOME | Workspaces | NVMe |
|---|---|---|---|
| Visibility | global (100 GbE) | global (100 GbE) | node local |
| Lifetime | permanent | workspace lifetime (max. 100 days, extensions possible) | batch job walltime |
| Capacity | 1 PB (shared) | 1 PB (shared) | 1.9 TB or more (depends on node) |
| Quotas per $HOME/Workspace | 100 GB | 5 TB | --- |
| Snapshots | daily (7 snapshots) (not yet implemented) | --- | --- |
| Backups | There is NO storage backup! | There is NO storage backup! | There is NO storage backup! |
- global: all nodes access the same file system
- local: each node has its own file system
- permanent: files are stored permanently; however, if an account has lost access, the remaining data will be deleted after 6 months
- batch job walltime: files are removed at the end of the batch job
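
A sketch of typical workspace handling, assuming NEMO2 provides the standard HPC workspace tools (ws_allocate, ws_list, ws_extend) used on other bwForCluster systems; the workspace name and durations are placeholders within the 100-day maximum from the table:

```bash
# Allocate a workspace named "mydata" for 30 days (maximum lifetime: 100 days)
ws_allocate mydata 30

# List existing workspaces with their expiration dates
ws_list

# Extend the workspace "mydata" by another 30 days (extensions possible)
ws_extend mydata 30
```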