NEMO2/Hardware


Operating System and Software

Operating System                    : Rocky Linux 9 (similar to RHEL 9)
Queuing System                      : SLURM (see NEMO2/Slurm for help)
(Scientific) Libraries and Software : Environment Modules
Own Software Modules                : built with EasyBuild and Spack (see EasyBuild)
Own (Python) Environments           : Conda (see Conda)
Containers                          : Apptainer/Singularity, and enroot in the future (see Development/Containers)
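
In practice these mechanisms are combined on the command line. The following is a minimal sketch; the module name, environment name and container image are placeholders, check module avail and the linked help pages for what is actually installed on NEMO2:

  # List software installed as environment modules (EasyBuild/Spack builds)
  module avail

  # Load a module; the name "devel/python/3.11" is only a placeholder
  module load devel/python/3.11

  # Create and activate an own Python environment with Conda
  conda create -n myenv python=3.11
  conda activate myenv

  # Run a command inside an Apptainer/Singularity container pulled from Docker Hub
  apptainer exec docker://rockylinux:9 cat /etc/os-release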

Compute and Special Purpose Nodes

For researchers from the scientific fields of Neuroscience, Elementary Particle Physics, Microsystems Engineering, and Materials Science, the bwForCluster NEMO offers about 240 compute nodes plus several special-purpose nodes for login, interactive jobs, visualization, machine learning, and AI.

Node specification:

Genoa Partition
  Quantity                 : 106
  Processors               : 2x AMD EPYC 9654 (Genoa)
  Base/Boost Frequency     : 2.4 / 3.55 GHz
  CPU Cores per Node       : 192 (usable: 190)
  Memory                   : 768 GiB DDR5, 4.8 GHz
  Local NVMe               : 3820 GB
  Interconnect             : 100 GbE (RoCEv2)
  Max. Single Node Job (*) : --partition=genoa --ntasks=190 --mem=727GB
                             (alternatives: --mem=745000MB / --mem-per-cpu=3900MB)

Milan Partition
  Quantity                 : 137
  Processors               : 2x AMD EPYC 7763 (Milan)
  Base/Boost Frequency     : 2.45 / 3.5 GHz
  CPU Cores per Node       : 128 (usable: 126)
  Memory                   : 512 GiB DDR4, 3.2 GHz
  Local NVMe               : 1920 GB
  Interconnect             : Omni-Path 100 / 100 GbE (RoCEv2)
  Max. Single Node Job (*) : (not yet available)

GPU Partition
  Quantity                 : 9
  Processors / GPU         : 2x Intel Xeon Platinum 8562Y+ (5th Gen), 4x NVIDIA L40S
  Base/Boost Frequency     : 2.8 / 3.8 GHz
  GPU Performance          : 91.6 TFLOPs (FP32) / 733 TOPs (INT8)
  CPU Cores per Node       : 64 (usable: 62)
  Memory                   : 512 GiB DDR5, 5.6 GHz; GPU: 4x 48 GB GDDR6, 864 GB/s
  Local NVMe               : 3820 GB
  Interconnect             : 100 GbE (RoCEv2)
  Max. Single Node Job (*) : (not yet available)

APU Partition
  Quantity                 : 4
  APU                      : 4x AMD Instinct MI300A
  Base/Boost Frequency     : - / 3.7 GHz
  APU Performance          : 61.3 TFLOPs (FP64) / 122.6 TFLOPs (FP32) / 1960 TOPs (INT8)
  CPU Cores per Node       : 4x 24 (usable: 92)
  Memory                   : 4x 128 GB HBM3, 5300 GB/s
  Local NVMe               : 3820 GB
  Interconnect             : 100 GbE (RoCEv2)
  Max. Single Node Job (*) : (not yet available)

Login Nodes
  Quantity                 : 2
  Processors               : 1x AMD EPYC 7763 (Milan)
  Base/Boost Frequency     : 3.25 / 3.75 GHz
  CPU Cores per Node       : 32
  Memory                   : 384 GiB DDR5, 4.8 GHz
  Local NVMe               : 480 GB
  Interconnect             : 100 GbE (RoCEv2)
  Max. Single Node Job (*) : ---

(*) SLURM uses mebibytes and gibibytes; multiply or divide by 1024 to convert between MB and GB values.
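
As an illustration of the "Max. Single Node Job" values above, a batch script requesting one full Genoa node could look like this sketch (job name, walltime and the executable are placeholders):

  #!/bin/bash
  #SBATCH --partition=genoa
  #SBATCH --nodes=1
  #SBATCH --ntasks=190               # 190 usable cores on a Genoa node
  #SBATCH --mem=727GB                # SLURM counts in gibibytes: 745000 MB / 1024 = ~727 GB
  #SBATCH --time=01:00:00            # placeholder walltime
  #SBATCH --job-name=full-node-test  # placeholder job name

  srun ./my_program                  # placeholder executable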

Storage Architecture

NEMO2 offers a fast Weka parallel file system whose throughput is limited only by the uplink to the storage (>90 GB/s). It is used for $HOME and workspaces. There will be no backups, but we plan to implement snapshots covering the last 7 days in the coming months. Additionally, each compute node provides temporary storage on its node-local NVMe disk.

$HOME
  Visibility         : global (100 GbE)
  Lifetime           : permanent
  Capacity           : 1 PB (shared with workspaces)
  Quota              : 100 GB per $HOME
  Backup / Snapshots : --- / daily (7 snapshots; not yet implemented)

Workspaces
  Visibility         : global (100 GbE)
  Lifetime           : workspace lifetime (max. 100 days, extensions possible)
  Capacity           : 1 PB (shared with $HOME)
  Quota              : 5 TB per workspace
  Backup / Snapshots : --- / ---

NVMe
  Visibility         : node local
  Lifetime           : batch job walltime
  Capacity           : 1.9 TB or more (depends on node)
  Quota              : ---
  Backup / Snapshots : --- / ---
  global             : all nodes access the same file system
  local              : each node has its own file system
  permanent          : files are stored permanently
                       however, if an account has lost access, the remaining data will be deleted after 6 months
  batch job walltime : files are removed at the end of the batch job
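
Workspaces are usually managed with the HPC workspace tools; the commands below are a sketch that assumes the same ws_* tools and a job-local TMPDIR as on other bwForCluster systems (consult the NEMO2 documentation for the exact setup):

  # Allocate a workspace named "mydata" for 30 days (maximum lifetime is 100 days)
  ws_allocate mydata 30

  # List your workspaces and their remaining lifetime
  ws_list

  # Extend the workspace before it expires (extensions possible, see table above)
  ws_extend mydata 30

  # Inside a batch job, put temporary files on the node-local NVMe; they are
  # removed at the end of the job (assumes TMPDIR points to the local disk)
  cp input.dat "$TMPDIR"/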