Data Transfer/Rsync

From bwHPC Wiki

Rsync is a command-line tool for single-threaded, one-directional synchronization. Because it operates at block level rather than file level, it transfers only the parts of a file that have changed instead of the whole file.

If you want to mount a folder or access data on a cloud storage system such as SDS@hd or Nextcloud, use Rclone. Because Rclone is more broadly usable than Rsync, it is the preferred transfer method. If you only want to synchronize your data with a machine that is accessible via SSH (for example the bwHPC clusters), you can use Rsync.

Caution: If several people work on the same data, synchronization can cause conflicts. Rsync does not merge changes: because it works in one direction, files on the target side that were changed by another party can be overwritten.

Usage

The parameter -P (short for --partial --progress) makes Rsync keep partially transferred files and show the transfer progress. An interrupted Rsync session can therefore be resumed where it left off by running the same command again.
The parameter --rsh=ssh tells Rsync to use ssh as the remote shell, so the data is transferred over an encrypted connection.

# Execute in the local folder where bigdata.tgz should be stored (or where a partial copy already exists).
rsync -P --rsh=ssh <username>@<remotehost>:bigdata.tgz ./bigdata.tgz
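
Because an interrupted transfer is resumed by repeating the command, the repetition can be automated. The following is a minimal sketch, not an official bwHPC script: the function name retry_rsync and the variables RSYNC and RETRY_DELAY are hypothetical names introduced only for this illustration.

```shell
#!/bin/sh
# Sketch: repeat an rsync transfer until it succeeds or MAX_TRIES is reached.
# retry_rsync, RSYNC and RETRY_DELAY are hypothetical names for this example.
# Usage: retry_rsync MAX_TRIES SOURCE DESTINATION
retry_rsync() {
    max_tries=$1
    shift
    try=1
    while [ "$try" -le "$max_tries" ]; do
        # -P keeps the partial file, so a repeated run can build on the
        # data that has already arrived instead of starting from scratch.
        if "${RSYNC:-rsync}" -P --rsh=ssh "$@"; then
            return 0
        fi
        echo "rsync attempt $try of $max_tries failed" >&2
        try=$((try + 1))
        # Wait before the next attempt, but not after the last one.
        [ "$try" -le "$max_tries" ] && sleep "${RETRY_DELAY:-10}"
    done
    return 1
}

# Example call with the placeholders from the command above:
# retry_rsync 5 <username>@<remotehost>:bigdata.tgz ./bigdata.tgz
```

The function returns 0 as soon as one attempt succeeds and 1 if all attempts fail, so it can be used in job scripts that need to check whether the data has arrived.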

Best Practices

If Rsync is not found on the remote host:
You can specify the path to Rsync on the remote host as an additional option:
--rsync-path=/usr/bin/rsync
You can find the path by running which rsync on the remote host.
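
The lookup and the transfer can be combined as sketched below. This is only an illustration under assumptions: the helper names find_remote_rsync and pull_with_rsync_path and the variables SSH and RSYNC are hypothetical, and the sketch assumes that bash is available on the remote host. A login shell is queried because it may have a fuller PATH than the non-interactive session in which Rsync was not found.

```shell
#!/bin/sh
# Sketch with hypothetical helper names; SSH and RSYNC can be overridden.

# Print the path of rsync on the remote host given as $1.
# Assumption: bash exists on the remote host. The login shell (-l) is used
# because its PATH may differ from that of a non-interactive session.
find_remote_rsync() {
    "${SSH:-ssh}" "$1" 'bash -lc "command -v rsync"'
}

# Download the remote file $2 from host $1 into the current directory,
# passing the discovered path to rsync via --rsync-path.
pull_with_rsync_path() {
    remote_rsync=$(find_remote_rsync "$1") || return 1
    "${RSYNC:-rsync}" -P --rsh=ssh --rsync-path="$remote_rsync" "$1:$2" .
}

# Example call:
# pull_with_rsync_path <username>@<remotehost> bigdata.tgz
```

If the lookup fails, the function stops with a non-zero exit status instead of starting a transfer with an empty path.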

When the connection is slow:
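Compress the data during the transfer with the -z option. This can speed up the transfer of compressible data such as text files; files that are already compressed (for example .tgz archives) benefit little.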
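
As a sketch, -z is simply added to the options used in the Usage section. The function name pull_compressed and the variable RSYNC are hypothetical and only serve this illustration.

```shell
#!/bin/sh
# Sketch: download the remote file $2 from host $1 with compression enabled.
# pull_compressed and RSYNC are hypothetical names for this example.
pull_compressed() {
    # -z compresses the data while it is in transit;
    # -P keeps partial files and shows the progress.
    "${RSYNC:-rsync}" -z -P --rsh=ssh "$1:$2" .
}

# Example call:
# pull_compressed <username>@<remotehost> results.txt
```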