Data Transfer/Rsync
Rsync is a command-line tool for single-threaded, one-directional synchronization. It transfers only the parts of files that have changed instead of the whole files, because it compares files at block level rather than at file level.
If you want to mount a folder or access data on a cloud storage system such as SDS@hd or Nextcloud, you should use Rclone. Because Rclone is more broadly usable than Rsync, it is the preferred transfer method. If you only want to synchronize your data with a machine that is accessible via ssh (bwHPC clusters), you can use Rsync.
Caution: If you work on the data collaboratively, synchronization can lead to merge errors when the data has been changed by multiple parties.
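To see what a synchronization would change before any data is touched, Rsync can be run in dry-run mode. The following is a minimal sketch, not a prescribed workflow. The options -a (archive mode, which recurses into directories), --dry-run and --itemize-changes are standard Rsync options that this page does not otherwise cover. The function name preview_sync, the RSYNC override variable and the paths are hypothetical and only serve as illustration.

```shell
#!/bin/sh
# RSYNC is a hypothetical override hook: set RSYNC=echo to print the
# command line instead of executing it.
RSYNC="${RSYNC:-rsync}"

# preview_sync SOURCE DESTINATION
# Lists the changes a synchronization would make without transferring anything.
preview_sync() {
    "$RSYNC" -a --dry-run --itemize-changes --rsh=ssh "$1" "$2"
}

# Example call with placeholder values:
# preview_sync ./project/ "<username>@<remotehost>:project/"
```

If the listed changes look as expected, the same command without --dry-run performs the actual transfer.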
Usage
The parameter -P (short for --partial --progress) keeps partially transferred files and shows the transfer progress. An interrupted Rsync session can be resumed where it left off by running the same command again.
The parameter --rsh=ssh tells Rsync to use ssh as the remote shell, so that the connection is secure.
# Run in the local folder where the file bigdata.tgz should be placed.
rsync -P --rsh=ssh <username>@<remotehost>:bigdata.tgz ./bigdata.tgz
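Because repeating the command resumes an interrupted transfer, the retry can be automated. The following is a minimal sketch under stated assumptions: the function name fetch_with_retry, the RSYNC override variable and the RETRY_DELAY setting are hypothetical helpers, not part of Rsync.

```shell
#!/bin/sh
# RSYNC is a hypothetical override hook, e.g. for testing without a network.
RSYNC="${RSYNC:-rsync}"

# fetch_with_retry SOURCE DESTINATION MAX_ATTEMPTS
# Repeats the same Rsync command until it succeeds or the attempts run out.
# Because of -P, each retry continues from the partially transferred file.
fetch_with_retry() {
    attempt=1
    while :; do
        if "$RSYNC" -P --rsh=ssh "$1" "$2"; then
            return 0
        fi
        echo "Attempt $attempt of $3 failed." >&2
        if [ "$attempt" -ge "$3" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        # RETRY_DELAY (seconds) is a hypothetical setting, default 10.
        sleep "${RETRY_DELAY:-10}"
    done
}

# Example call with placeholder values:
# fetch_with_retry "<username>@<remotehost>:bigdata.tgz" ./bigdata.tgz 5
```

The function returns 0 as soon as one attempt succeeds and 1 after the last failed attempt, so it can be used in scripts that check the exit status.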
Best Practices
If Rsync is not found on the remote host:
You can pass the Rsync path as an additional option:
--rsync-path=/usr/bin/rsync
You can find the path by running which rsync on the remote host.
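The lookup and the option can be combined in one step. This is a minimal sketch, assuming that which rsync succeeds in a non-interactive ssh session on the remote host. The function name fetch_with_remote_path and the RSYNC and SSH override variables are hypothetical and only used for illustration.

```shell
#!/bin/sh
# RSYNC and SSH are hypothetical override hooks, e.g. for testing.
RSYNC="${RSYNC:-rsync}"
SSH="${SSH:-ssh}"

# fetch_with_remote_path USER@HOST REMOTE_FILE LOCAL_TARGET
fetch_with_remote_path() {
    # Ask the remote host where its rsync binary is installed.
    remote_bin=$("$SSH" "$1" which rsync) || return 1
    # Pass that path to the local Rsync via --rsync-path.
    "$RSYNC" -P --rsh=ssh --rsync-path="$remote_bin" "$1:$2" "$3"
}

# Example call with placeholder values:
# fetch_with_remote_path "<username>@<remotehost>" bigdata.tgz ./bigdata.tgz
```

If the lookup fails, the function stops before starting the transfer. In that case the path has to be determined manually on the remote host.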
When the connection is slow:
Compress the data during the transfer with the -z option to make the transfer faster.
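A minimal sketch of a compressed download, combining -z with the options from the Usage section. The function name fetch_compressed, the RSYNC override variable and the file name results.csv are hypothetical placeholders.

```shell
#!/bin/sh
# RSYNC is a hypothetical override hook, e.g. for testing without a network.
RSYNC="${RSYNC:-rsync}"

# fetch_compressed USER@HOST REMOTE_FILE LOCAL_TARGET
fetch_compressed() {
    # -z compresses the data stream during the transfer only;
    # the files on disk are stored uncompressed as before.
    "$RSYNC" -P -z --rsh=ssh "$1:$2" "$3"
}

# Example call with placeholder values:
# fetch_compressed "<username>@<remotehost>" results.csv ./results.csv
```

Files that are already compressed, such as .tgz archives, usually gain little from -z, so the option is most useful for text-like data.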