BwUniCluster2.0/File System Migration Guide

From bwHPC Wiki
Revision as of 23:19, 7 October 2022 by R Barthel (talk | contribs) (R Barthel moved page BwUniCluster 2.0 File System Migration Guide to BwUniCluster2.0/File System Migration Guide)

This page describes the changes to the file systems between bwUniCluster 1 and bwUniCluster 2.0.

Lustre file systems

With bwUniCluster 2.0, new file systems for $HOME and for workspaces have been purchased, with a total capacity of 5 PB and an aggregate throughput of 72 GB/s. The user data of the corresponding old file systems has been copied by KIT/SCC to these new file systems, so you should see the same data in $HOME and in workspaces as on bwUniCluster 1. The content of $WORK has to be migrated to workspaces manually by the users during the next 4 weeks; for details see below.

Changes on $HOME and workspace directories

You should see the same data as before. If this is not the case, which is possible in rare cases since the data of expired users has been cleaned up, please contact the hotline or open a ticket. The virtual path to $HOME (/home/ORG/GROUP/USER) is unchanged (except for users of the University of Stuttgart, see below), but the physical path (shown by /bin/pwd) has changed. If one of your scripts contains a physical path (which is not recommended), you should modify that script. If an application complains about an old physical path (starting with /pfs/data1/ or /pfs/data2/), you should modify or recompile the application.

New user quota limits for $HOME

There are new user quota limits of 1 TiB and 10 million files on $HOME. Users who currently have more data than this have to reduce their usage during the next 4 weeks. You can show your current usage and limits with the following command:

$ lfs quota -uh $(whoami) $HOME

Note that in addition to the user limit there is a per-organization limit which depends on the financial share of your organization. This limit existed before, too, but it is now enforced with so-called Lustre project quotas, whereas previously Lustre group quotas were used. You can show the current usage and limits of your organization with the following command:

$ lfs quota -ph $(grep $(echo $HOME | sed -e "s|/[^/]*/[^/]*$||") /pfs/data5/project_ids.txt | cut -f 1 -d\ ) $HOME

With the new quota limits there is no longer a need to check the quotas of other users of your main group. Hence, the diskusage files will no longer be created.

New user quota limits for the workspace file system

There are new user quota limits of 40 TiB and 30 million files on the workspace file system. Users who currently have more data than this have to reduce their usage during the next 4 weeks. The limits include released and expired workspaces, which are internally deleted after a few weeks. You can show your current usage and limits with the following command:

$ lfs quota -uh $(whoami) /pfs/work7

If higher limits are really needed, users can request them by opening a ticket or sending an email to the hotline. However, they have to clearly describe why higher limits are needed, how long the extended limits are required, and why other storage options cannot be used.

New stripe count settings for the workspace file system

For the directories of the workspace file system, the new Lustre feature Progressive File Layout (PFL) is used to define the file striping parameters. This means that the stripe count grows with the file size. In most cases users no longer need to adjust striping parameters by hand, even for very large files or to achieve better performance.
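If you want to check which layout PFL has assigned, the lfs getstripe command shows the striping of a file or directory. The paths below are only placeholders for your own workspace files:

```shell
# Show the PFL striping components of a file (placeholder path)
$ lfs getstripe /pfs/work7/workspace/<your-workspace>/somefile
# Show the default layout inherited by new files in a directory
$ lfs getstripe -d /pfs/work7/workspace/<your-workspace>
```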

New group membership for users of University of Stuttgart

For users of the University of Stuttgart, the group membership has usually changed, and the owning group of all data below $HOME and below the old $WORK has been changed accordingly. This means that the virtual path to $HOME (/home/ORG/GROUP/USER) has changed, too. However, if your scripts use environment variables (for example $HOME), which is recommended, nothing needs to be done. Otherwise you have to adapt your scripts accordingly.

Hints for migration of $WORK data

Workspaces should be used instead of $WORK. Advantages of workspaces are a clear deletion policy and the ability to restore data from expired workspaces for a few weeks. The old $WORK will be mounted in read-only mode for 4 weeks. If data on $WORK is still needed, it has to be migrated by the users. Example for the migration to a workspace:

# create a new workspace with a lifetime of 60 days
$ ws_allocate newwork 60
# remember the location of the old $WORK
$ OLDWORK=$WORK
# point WORK at the new workspace
$ WORK=$(ws_find newwork)
# copy the data with rsync, preserving permissions (-a) and ACLs (-A)
$ rsync -a -A $OLDWORK/ $WORK/

Of course, if only a part of your old $WORK is needed, you should copy only that part. As shown above, you can redefine the WORK environment variable. For example, such a definition can be added to old scripts, and then no further changes to the scripts should be required.

The environment variable $WORK will still be set after login for the next 4 weeks, while the old $WORK file system remains mounted. Afterwards it will no longer exist or will be empty.
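For old scripts that still reference $WORK, a small guard at the top can switch over to the workspace once the old file system disappears. This is only a sketch; newwork is the example workspace name from above:

```shell
# If the old $WORK is unset, empty, or gone, point WORK at the workspace
if [ -z "${WORK:-}" ] || [ ! -d "$WORK" ]; then
    WORK=$(ws_find newwork)
    export WORK
fi
```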

LSDF Online Storage

The LSDF Online Storage is now directly mounted on the login and data mover nodes. Furthermore, it can be used on the compute nodes during job runtime with the constraint flag "LSDF" of the Slurm batch system.
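A job that needs the LSDF Online Storage on the compute nodes could request it as follows. The constraint flag is taken from the description above; the program name and the LSDF path are only placeholders:

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=01:00:00
#SBATCH --constraint=LSDF   # mount the LSDF Online Storage for this job

# Placeholder path: replace with your own LSDF project directory
./my_program --input /lsdf/<project>/input.dat
```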

$TMP

All nodes of bwUniCluster 2.0 now have fast SSDs as local storage devices. These are especially useful if you have lots of small files or if you are doing random I/O (which requires high IOPS rates), and if the data is only needed on the local node during the job runtime.
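A typical pattern is to stage small-file or random-I/O data to the node-local $TMP, work there, and copy the results back before the job ends. The following job script is only a sketch; the program name and the workspace name newwork are placeholders:

```shell
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=02:00:00

# Stage input with many small files to the fast local SSD ($TMP)
rsync -a $(ws_find newwork)/input/ $TMP/input/

# Run the application on the local copy (my_program is a placeholder)
cd $TMP
./my_program --input input/

# Copy results back to a global file system; $TMP is purged after the job
rsync -a $TMP/results/ $(ws_find newwork)/results/
```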

BeeOND (BeeGFS On-Demand)

Users now have the possibility to request a private BeeOND (BeeGFS) parallel file system for each job with the constraint flag "BEEOND" of the Slurm batch system. The file system is created during job startup and purged after the job completes. This is especially useful if you do huge amounts of I/O and/or if your I/O is not very efficient, since there is no impact on the global Lustre file systems. However, you have to copy input data to this file system after the job starts, and you have to copy results back to a globally visible file system before the job completes.
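The same staging pattern applies to BeeOND. The mount point of the on-demand file system is not specified here, so the BEEOND_MOUNT path below is only an assumption; check the cluster documentation for the actual location:

```shell
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --time=04:00:00
#SBATCH --constraint=BEEOND   # create a private BeeOND file system for this job

# BEEOND_MOUNT is a hypothetical path; the real mount point is cluster specific
BEEOND_MOUNT=/mnt/odfs/$SLURM_JOB_ID

# Stage input data after job start ...
rsync -a $(ws_find newwork)/input/ $BEEOND_MOUNT/input/

# ... run the I/O-intensive application (my_program is a placeholder) ...
srun ./my_program --workdir $BEEOND_MOUNT

# ... and save the results before the job completes, since BeeOND is purged
rsync -a $BEEOND_MOUNT/results/ $(ws_find newwork)/results/
```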