BwUniCluster2.0/Maintenance/2023-03


The following changes were introduced during the maintenance interval from 20.03.2023 (Monday), 08:30 to 24.03.2023 (Friday), 15:00.

The host key of the system has not changed, so you should not receive any warnings from your SSH client(s). If a warning does appear, or if you simply want to check that you are connecting to the correct system, you can verify the key hashes against the following list:

Algorithm  Hash (SHA256)                                Hash (MD5)
RSA        p6Ion2YKZr5cnzf6L6DS1xGnIwnC1BhLbOEmDdp7FA0  59:2a:67:44:4a:d7:89:6c:c0:0d:74:ba:3c:c4:63:6d
ECDSA      k8l1JnfLf1y1Qi55IQmo11+/NZx06Rbze7akT5R7tE8  85:d4:d9:97:e0:f0:43:30:6e:66:8e:d0:b6:9b:85:d1
ED25519    yEe5nJ5hZZ1YbgieWr+phqRZKYbrV7zRe8OR3X03cn0  42:d2:0d:ab:87:48:fc:1d:5d:b3:7c:bf:22:c3:5f:b7
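
For example, the fingerprints can be checked with standard OpenSSH tools before connecting. In the sketch below, <login-host> is a placeholder for the login address you normally use:

  # Print the SHA256 fingerprint of the server's ED25519 host key
  ssh-keygen -lf <(ssh-keyscan -t ed25519 <login-host> 2>/dev/null)
  # Print the MD5 fingerprint for comparison with the second hash column
  ssh-keygen -l -E md5 -f <(ssh-keyscan -t ed25519 <login-host> 2>/dev/null)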

1 Hardware

  • All firmware versions on all components were upgraded.

2 Operating system

  • The operating system was upgraded from RHEL 8.4 EUS to RHEL 8.6 EUS. We recommend re-compiling all applications after the upgrade; a minimal rebuild sketch follows this list.
  • The Mellanox OFED InfiniBand Stack was updated.
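
A minimal rebuild sketch, assuming a make-based application; the module name and path below are placeholders for the toolchain and sources you normally use:

  # Re-load the compiler toolchain the application was built with
  module load compiler/gnu          # placeholder; use your usual toolchain module
  # Rebuild from a clean state so all objects are compiled against RHEL 8.6 libraries
  cd ~/src/my-application           # placeholder path
  make clean && make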

3 Compilers, Libraries and Runtime Environments

4 Userspace tools

  • pigz and pbzip are no longer supported. Please use pzstd instead (see the usage sketch below).
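
A short usage sketch for pzstd as a parallel compressor; the file name and thread count are examples:

  pzstd -p 8 data.tar           # compress with 8 threads, producing data.tar.zst
  pzstd -d -p 8 data.tar.zst    # decompress with 8 threads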

5 Software Modules

  • The Lmod module system was upgraded.
  • The old openmpi 4.0 modules were removed. Please use openmpi 4.1 (see the module commands below).
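
Scripts that still load an Open MPI 4.0 module need to switch to a 4.1 module. The exact module name below is an assumption; check what is installed with module avail:

  module avail mpi/openmpi        # list the Open MPI versions that are installed
  module load mpi/openmpi/4.1     # assumed module name; load 4.1 instead of the removed 4.0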

6 Batch system

  • The Slurm version was upgraded to version 23.02.0.
  • The Slurm partitions have changed (see the sbatch sketch after this list):
    • the maximum number of nodes in the multiple and multiple_il partitions is now 80
    • the number of available nodes in the single partition has been increased
    • the number of available nodes in the multiple partition has been decreased
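
A minimal sbatch sketch reflecting the new node limit; the tasks per node, walltime and program name are placeholders and depend on your job:

  #!/bin/bash
  #SBATCH --partition=multiple
  #SBATCH --nodes=80               # new maximum for the multiple / multiple_il partitions
  #SBATCH --ntasks-per-node=40     # placeholder; match the core count of the node type
  #SBATCH --time=01:00:00          # placeholder walltime
  srun ./my_mpi_program            # placeholder executable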

7 Storage

  • Lustre client, BeeGFS client and Spectrum Scale client were updated.

8 Graphics stack

  • The NVIDIA driver was upgraded.

9 Containers

  • Enroot was upgraded.
  • Singularity was replaced with its successor Apptainer (the command 'singularity' still works; see the example below).
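
Both command names can be used; the container image and program below are placeholders:

  apptainer exec my_image.sif ./my_app      # new command name
  singularity exec my_image.sif ./my_app    # still works after the maintenance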

10 JupyterHub

  • JupyterHub was upgraded to version 3.1.1.
  • Python 3.9 is now used as the default (see the check below).
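
A quick check from a JupyterHub terminal or notebook, assuming python3 points at the default interpreter:

  python3 --version                                # should report Python 3.9.x in new sessions
  python3 -c 'import sys; print(sys.executable)'   # shows which interpreter is actually used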

11 Resource Limits on Login Nodes

  • After the maintenance the following per-user limits apply (enforced via cgroups); a short sketch for inspecting them follows this list:
    • 48 GB physical memory
    • 400% CPU cycles (100% equals 1 thread)
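
A minimal sketch, assuming the limits are attached to the per-user systemd slice on the login node; depending on the actual configuration, some properties may be reported as 'infinity':

  # Show the memory and CPU limits of your own user slice
  systemctl show "user-$(id -u).slice" -p MemoryLimit -p MemoryMax -p CPUQuotaPerSecUSec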