BwHPC BPG Data Management
1 Local File Systems
In addition to computing capacity, each bwHPC cluster is equipped with a parallel file system. For local data management, it is important to distinguish between data that is used frequently and must persist, and data for which fast access during a job's lifetime is decisive.
For each registered user, a $HOME directory is provided in the parallel file system. Files stored in this directory are secured by a regular backup, but fast access from the compute nodes is not possible. For data that is read or written during a job's lifetime, additional storage without backup is made available temporarily. Since the implementation varies between the bwHPC clusters, please visit the sites of bwUniCluster 2.0 or bwForCluster JUSTUS 2 for details.
|Directory||Characteristics||Kind of Data|
|$HOME||with backup, limited, global file system||software packages, configuration files, important results|
|Workspaces, $WORK||quick access, limited, temporary, global file system||input/output files|
|$TMPDIR, $TMP||local file system, limited to the batch job's lifetime||intermediate results|
As a rule of thumb: Do not compute in $HOME!
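The division of labour in the table above can be sketched as a small helper for a job script. It is a minimal sketch, not an official bwHPC tool: `stage_and_run` is a hypothetical function name, the `in`/`out` subdirectory layout is an assumption, and the fallback to `/tmp` only exists so the function can be tried outside a batch job where `$TMPDIR` may be unset.

```shell
# Sketch: compute in node-local scratch instead of $HOME.
# stage_and_run is a hypothetical helper, not part of any bwHPC tooling.
# Usage: stage_and_run <input_dir> <result_dir> <command> [args...]
stage_and_run() {
    local input_dir=$1 result_dir=$2
    shift 2
    local scratch status=0

    # Inside a batch job TMPDIR points to the node-local file system;
    # the /tmp fallback is an assumption for dry runs outside a job.
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX") || return 1
    mkdir -p "$scratch/in" "$scratch/out" "$result_dir" || return 1

    # 1. stage input, 2. run the command inside scratch, 3. save the output
    cp -r "$input_dir"/. "$scratch/in"/ &&
        ( cd "$scratch" && "$@" ) &&
        cp -r "$scratch/out"/. "$result_dir"/ || status=$?

    # Scratch data is temporary by design, so remove it in any case
    rm -rf "$scratch"
    return "$status"
}
```

Inside a job script this could be called as `stage_and_run "$WORK/input" "$WORK/results" my_solver in out`, where `my_solver` and both paths are placeholders. The command is expected to read from `in/` and write to `out/`; only `out/` is copied back.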
Disk space is a limited resource on all HPC systems. If disk space is not sufficient, external storage services such as SDS@hd should be used.
2 Data Transfer
Transferring a single large file usually achieves a higher throughput than transferring many small files. It is therefore recommended to collect files into a compressed archive with tools such as zip, tar, xz or others before transferring them to a target system.
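As an illustration of this recommendation, the following sketch packs a directory of small files into one gzip-compressed tar archive before it is copied. `pack_dir` is a hypothetical helper name, and the user and host in the commented transfer commands are placeholders.

```shell
# Sketch: bundle many small files into one compressed archive.
# pack_dir is a hypothetical helper name.
# Usage: pack_dir <source_dir> <archive.tar.gz>
pack_dir() {
    local src_dir=$1 archive=$2
    # -C changes into the parent directory first, so the archive
    # contains paths relative to it rather than absolute paths.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")"
}

# Possible transfer afterwards (user and host are placeholders):
#   pack_dir results results.tar.gz
#   scp results.tar.gz <user>@<host>:
#   ssh <user>@<host> 'tar -xzf results.tar.gz'
```

Whether compression pays off depends on the data: files that are already compressed gain little, in which case plain `tar -cf` avoids the extra CPU time.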
2.1 Transfer Tools
|Type||Software||Remarks||Executable on||Transfer from/to|
|Command-line tool||scp||Throughput < 150 MB/s (depending on cipher)||+||+||+||+|
|rdata||Throughput of 350-400 MB/s||+||+|
|Client||WinSCP||based on SCP/SFTP, Windows only||+||+||+|
|FileZilla||based on SFTP||+||+||+|
° Depending on the installed operating system (OS).
|bwForCluster JUSTUS 2||
|bwForCluster MLS&WISO Production||
2.3 Best practices
ssh and sftp offer many useful options. One of the most important is the encryption cipher, which can have a significant impact on transfer speed. The exact effect is difficult to predict because it depends, among other things, on the client's processor. In tests on Intel hardware (Intel(R) Xeon(R) CPU E5-2620 v4), the following ciphers turned out to be particularly fast:
With ssh/sshfs, a cipher can be selected with the -c option:
ssh -c <cipher> <user>@<host>
The ciphers supported by the local ssh client can be listed with the command
ssh -Q cipher
Attention: Not all encryption methods meet the same security requirements. Weigh performance against the security requirements of your individual use case.
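Because the effect of a cipher depends on the hardware involved, it may be worth measuring it on your own client. The following is a rough sketch under stated assumptions: `bench_ciphers` is a hypothetical helper, the host is supplied by the caller, key-based login is assumed so that no password prompt interrupts the loop, and timing has only one-second resolution, so the test size should be large enough for differences to show.

```shell
# Sketch: time a fixed-size stream through every cipher the local client knows.
# bench_ciphers is a hypothetical helper name.
# Usage: bench_ciphers <user>@<host> [size_in_MB]
bench_ciphers() {
    local host=$1 size_mb=${2:-256}
    local cipher start end

    for cipher in $(ssh -Q cipher); do
        start=$(date +%s)
        # Send zeros through the encrypted channel and discard them remotely.
        # The pipeline status is that of ssh, so a cipher rejected by the
        # server ends up in the else branch.
        if dd if=/dev/zero bs=1048576 count="$size_mb" 2>/dev/null |
               ssh -c "$cipher" "$host" 'cat > /dev/null'; then
            end=$(date +%s)
            echo "$cipher: ${size_mb} MB in $((end - start)) s"
        else
            echo "$cipher: not accepted by $host"
        fi
    done
}
```

The numbers include network and remote-side effects, so they only compare ciphers against each other for one particular client, server and route. A fast result says nothing about whether a cipher is appropriate from a security point of view.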