BwHPC BPG Data Management

1 Local File Systems

In addition to computing capacity, each bwHPC cluster is equipped with a parallel file system. For local data management, it is important to distinguish between data that is used frequently and must persist, and data for which fast access during a job's lifetime is decisive.

Each registered user is provided with a $HOME directory in the parallel file system. Files stored in this directory are secured by regular backups, but fast access from the compute nodes is not possible. For data that is read or written during a job's lifetime, additional storage without backup is made available temporarily. Since the implementation varies between the bwHPC clusters, please visit the sites of bwUniCluster 2.0 or bwForCluster JUSTUS 2 for details.

Directory         | Characteristics                                                   | Kind of Data
$HOME             | with backup, limited, global file system                          | software packages, configuration files, important results
Workspaces, $WORK | quick access, limited, temporary, global file system              | input/output files
$TMPDIR, $TMP     | local file system, temporarily limited to batch job's lifetime    | intermediate results

As a rule of thumb: Do not compute in $HOME!
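
The division of labour in the table above can be sketched as a helper that stages input into node-local scratch, runs the computation there, and copies the results back afterwards. This is a minimal sketch, not a cluster-specific template: the function name run_in_scratch, its arguments, and the fallback to /tmp when $TMPDIR is unset are assumptions made for illustration. On a real cluster the batch system is expected to set $TMPDIR, and input and results would typically live in a workspace.

```shell
#!/bin/bash
# Hypothetical helper: stage input into node-local scratch, run a command there,
# and copy the results back. Names and layout are illustrative assumptions.
run_in_scratch() {
    local input_dir=$1    # e.g. a directory inside a workspace
    local result_dir=$2   # where results should be kept after the job
    shift 2               # remaining arguments: the command to run

    # Create a private directory below $TMPDIR (fall back to /tmp if unset).
    local scratch
    scratch=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX") || return 1

    # Stage the input, run the command inside scratch, remember its exit status.
    cp -r "$input_dir/." "$scratch/" || { rm -rf "$scratch"; return 1; }
    ( cd "$scratch" && "$@" )
    local status=$?

    # Copy everything back (inputs included, for simplicity) and clean up.
    mkdir -p "$result_dir" && cp -r "$scratch/." "$result_dir/"
    rm -rf "$scratch"
    return $status
}
```

A call could look like run_in_scratch "$WORK/input" "$WORK/results" sh -c 'sort data.txt > sorted.txt', where the paths and the command are placeholders.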

Disk space is a limited resource on all HPC systems. If disk space is not sufficient, external storage services such as SDS@hd should be used.

2 Data Transfer

Transferring a single large file usually achieves higher throughput than transferring many small files. It is therefore recommended to collect files into a compressed archive with tools such as zip, tar, or xz before transferring them to the target system.
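
As a minimal sketch of this recommendation, the function below packs a directory of many small files into one compressed tar archive and prints how many files the archive contains. The function name pack_dir and all paths are hypothetical, and gzip compression via tar's -z option is only one of several possible choices.

```shell
#!/bin/bash
# Hypothetical helper: pack a directory into a single gzip-compressed tar archive.
pack_dir() {
    local src_dir=$1   # directory containing many small files
    local archive=$2   # target archive, e.g. results.tar.gz

    # -C changes into the parent directory so the archive stores relative paths.
    tar -czf "$archive" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" || return 1

    # List the archive and count entries that are not directories
    # (directory entries end with a slash in tar's listing).
    tar -tzf "$archive" | grep -vc '/$'
}
```

The resulting archive can then be moved with any of the transfer tools and unpacked on the target system with tar -xzf.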

2.1 Transfer Tools

Type, Software, Remarks, Executable on (Local°, bwUniCluster, bwForCluster), Transfer from/to (www, bwHPC cluster, SDS@hd)
Command-line tool scp Throughput < 150 MB/s (depending on cipher) + + + +
sftp + + + + +
rsync + + + +
rdata Throughput of 350-400 MB/s + +
wget Download only + + + +
Client WinSCP based on SCP/SFTP, Windows only + + +
FileZilla based on SFTP + + +

° Depending on the installed operating system (OS).

2.2 Hosts

System Host
bwUniCluster 2.0

bwForCluster JUSTUS 2

bwForCluster MLS&WISO Production

bwForCluster NEMO

bwForCluster binAC

2.3 Best practices

2.3.1 Ciphers

ssh and sftp offer many useful options. An important one is the encryption cipher, which can have a significant impact on transfer speed. The exact effect is difficult to predict because it depends, among other things, on the client's processor. In tests on Intel hardware (Intel(R) Xeon(R) CPU E5-2620 v4), the following ciphers turned out to be particularly fast:

Cipher     | Performance
(default)  | 100%
…          | ~200%
aes128-ctr | ~188%

With ssh/sshfs you can use different ciphers with the -c option:

ssh -c

The ciphers supported by your client can be listed with the command

ssh -Q cipher

Attention: Not all ciphers meet the same security requirements. Weigh performance against the security requirements of your individual use case.
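
Because the set of supported ciphers differs between client installations, it can help to check a preferred cipher against the local list before passing it to -c. The sketch below does this under stated assumptions: the function name first_supported_cipher is hypothetical, and the list of supported ciphers is read from standard input so that the output of ssh -Q cipher can be piped in.

```shell
#!/bin/bash
# Hypothetical helper: print the first candidate cipher that appears in the
# list of supported ciphers read from standard input (one name per line).
first_supported_cipher() {
    local supported candidate
    supported=$(cat)                      # e.g. the output of: ssh -Q cipher
    for candidate in "$@"; do
        # -x matches whole lines only, -F treats the name as a fixed string.
        if printf '%s\n' "$supported" | grep -qxF -- "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1                              # none of the candidates is supported
}
```

For example, ssh -Q cipher | first_supported_cipher aes128-ctr aes256-ctr prints the first of the two names that the local client supports and returns a non-zero status if neither is available. The order of the candidates expresses your own preference between speed and security.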