NEMO2/Hardware

Operating System and Software

{| class="wikitable"
|-
!scope="column" | Operating System
| Rocky Linux 9 (similar to RHEL 9)
|-
!scope="column" | Queuing System
| SLURM (see [[NEMO2/Slurm]] for help)
|-
!scope="column" | (Scientific) Libraries and Software
| Environment Modules
|-
!scope="column" | Own Software Modules using EasyBuild and Spack
| EasyBuild
|-
!scope="column" | Own (Python) Environments with Conda
| Conda
|-
!scope="column" | Containers with Apptainer/Singularity (and enroot in the future)
| Development/Containers
|}
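A minimal usage sketch for this software stack on a login node; the module name, environment name and container image below are placeholders, not actual NEMO2 names:

 # list the available software modules (Environment Modules)
 module avail
 # load a module (name and version are placeholders)
 module load devel/python/3.12
 # create and activate a personal Conda environment (name is a placeholder)
 conda create -n myenv numpy
 conda activate myenv
 # run a command inside a container with Apptainer (image path is a placeholder)
 apptainer exec my_image.sif python3 --version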



Compute and Special Purpose Nodes

For researchers from the scientific fields '''N'''euroscience, '''E'''lementary Particle Physics, '''M'''icrosystems Engineering and '''M'''aterials Science, the bwForCluster '''NEMO''' offers about 240 compute nodes plus several special-purpose nodes for login, interactive jobs, visualization, machine learning and AI.

Node specification (see [[NEMO2/Slurm]] for Slurm partitions):

{| class="wikitable" style="text-align:center;"
|-
!
! Genoa Partition
! Milan Partition
! L40S Partition
! MI300A Partition
! Login Nodes
|-
!scope="column" | Quantity
| 106
| 137
| 9
| 4
| 2
|-
!scope="column" | Processors / APU/GPU
| 2x AMD EPYC 9654 (Genoa)
| 2x AMD EPYC 7763 (Milan)
| 2x Intel Xeon Platinum 8562Y+ (5th Gen)
4x NVIDIA L40S
| 4x AMD Instinct MI300A
| 1x AMD EPYC 9354 (Genoa)
|-
!scope="column" | Base Frequency/Boost Frequency (GHz) / APU/GPU Performance (TFLOPs/TOPs)
| 2.4/3.55
| 2.45/3.5
| 2.8/3.8
91.6 (FP32) / 733 (INT8)
| -/3.7
61.3 (FP64) / 122.6 (FP32) / 1960 (INT8)
| 3.25/3.75
|-
!scope="column" | CPU Cores per Node
| 192
'''Usable in Slurm: 190'''
| 128
'''Usable in Slurm: 126'''
| 64
'''Usable in Slurm: 62'''
| 4x 24
'''Usable in Slurm: 92'''
| 32
|-
!scope="column" | CPU Cores per APU/GPU
| ---
| ---
| 15
| 23
| ---
|-
!scope="column" | Memory
| 768 GiB, 4.8 GHz (DDR5)
'''Usable in Slurm: 727 GiB, 745000 MiB, per Core: 3900 MiB'''
| 512 GiB, 3.2 GHz (DDR4)
'''Usable in Slurm: 495 GiB, 507000 MiB, per Core: 4000 MiB'''
| 512 GiB, 5.6 GHz (DDR5)
4x 48 GB, 864 GB/s (GDDR6)
'''Usable in Slurm: 495 GiB, 507000 MiB, per Core: 8100 MiB'''
| 4x 128 GB, 5300 GB/s (HBM3)
'''Usable in Slurm: 495 GiB, 507000 MiB, per Core: 5300 MiB'''
| 384 GiB, 4.8 GHz (DDR5)
|-
!scope="column" | Local NVMe (GB)
| 3840
| 1920
| 3840
| 3840
| 480
|-
!scope="column" | Interconnect
| 100 GbE (RoCEv2)
| Omni-Path 100
100 GbE (RoCEv2)
| 100 GbE (RoCEv2)
| 100 GbE (RoCEv2)
| 100 GbE (RoCEv2)
|-
!scope="column" | Job Example Genoa Partition
|colspan="5" | Maximum resources for a single node job (*):
<code>--partition=genoa --ntasks=190 --mem=727GB # or: --mem=745000MB, or: --mem-per-cpu=3900MB</code>
|-
!scope="column" | Job Example Milan Partition
|colspan="5" | Maximum resources for a single node job (*):
<code>--partition=milan --ntasks=126 --mem=495GB # or: --mem=507000MB, or: --mem-per-cpu=4000MB</code>
|-
!scope="column" | Job Example L40S Partition
|colspan="5" | Maximum resources for a single node job (*):
<code>--partition=l40s --ntasks=62 --gres=gpu:4 --mem=495GB # or: --mem=507000MB, or: --mem-per-cpu=8100MB</code>
|-
!scope="column" | Job Example MI300A Partition
|colspan="5" | Maximum resources for a single node job (*):
<code>--partition=mi300a --ntasks=94 --gres=gpu:4 --mem=495GB # or: --mem=507000MB, or: --mem-per-cpu=5300MB</code>
|}

(*) Slurm internally uses Mebibyte (MiB) and Gibibyte (GiB); multiply or divide by 1024 to convert between M and G.
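For orientation, the job examples above translate into a batch script as in the following minimal sketch for the Genoa partition (walltime, job name and program are placeholders; see [[NEMO2/Slurm]] for details):

 #!/bin/bash
 #SBATCH --partition=genoa      # Genoa partition (see table above)
 #SBATCH --nodes=1              # single-node job
 #SBATCH --ntasks=190           # cores usable in Slurm on a Genoa node
 #SBATCH --mem=727GB            # usable memory per node (alternatively --mem-per-cpu=3900MB)
 #SBATCH --time=01:00:00        # walltime (placeholder)
 #SBATCH --job-name=example     # placeholder job name
 srun ./my_program              # placeholder executable

An interactive test on a GPU partition can be requested in the same way, e.g. <code>srun --partition=l40s --ntasks=15 --gres=gpu:1 --mem-per-cpu=8100MB --pty bash</code> (one GPU with its share of 15 cores on an L40S node).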

File Systems

NEMO2 offers a fast Weka parallel file system, whose throughput is limited only by the uplink to the storage (>90 GB/s). The storage is used for $HOME and workspaces. There will be no backups, but we plan to implement snapshots of the last 7 days in the coming months. Additionally, each compute node provides temporary storage on its node-local NVMe disk.

{| class="wikitable" style="text-align:center;"
|-
!
! $HOME
! Workspaces
! NVMe
|-
!scope="column" | Visibility
|colspan="2" | global (100 GbE)
| node local
|-
!scope="column" | Lifetime
| permanent
| workspace lifetime
(max. 100 days, extensions possible)
| batch job walltime
|-
!scope="column" | Capacity
|colspan="2" | 1 PB
| 1.9 TB or more (depends on node)
|-
!scope="column" | Quotas per $HOME/Workspace
| 100 GB
| 5 TB (per workspace)
| ---
|-
!scope="column" | Snapshots
| daily (7 snapshots) (not yet implemented)
| ---
| ---
|-
!scope="column" | Backups
|colspan="3" | There is NO storage backup!
|}
  global             : all nodes access the same file system
  local              : each node has its own file system
  permanent          : files are stored permanently
                       however, if an account has lost access,
                       the remaining data will be deleted after 6 months
  batch job walltime : files are removed at end of the batch job
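How workspaces and the node-local NVMe scratch are typically used is sketched below, assuming the common HPC workspace tools (<code>ws_allocate</code>, <code>ws_list</code>, <code>ws_find</code>, <code>ws_extend</code>) known from other bwHPC clusters and a job-provided scratch directory in <code>$TMPDIR</code>; both are assumptions, not confirmed NEMO2 specifics:

 # allocate a workspace named "mydata" for 100 days (maximum lifetime, see table above)
 ws_allocate mydata 100
 # list existing workspaces and their remaining lifetime
 ws_list
 # extend a workspace before it expires (if extensions are still available)
 ws_extend mydata 100
 # inside a batch job: stage data to the node-local NVMe scratch (assumed to be $TMPDIR)
 cp "$(ws_find mydata)/input.dat" "$TMPDIR/"
 ./my_program "$TMPDIR/input.dat"
 cp "$TMPDIR/result.dat" "$(ws_find mydata)/"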