Data Transfer/Rsync: Difference between revisions

From bwHPC Wiki
<!--
If you want to mount a folder or if you want to access data on a cloud storage system like [[SDS@hd|SDS@hd]] or Nextcloud, you should use [[Data_Transfer/Rclone|Rclone]]. As Rclone is more broadly usable than Rsync, it is the preferred transfer method. If you only want to synchronize your data with a machine that is accessible via ssh (bwHPC clusters) then you can use Rsync for this.

'''Caution:''' If you work on the data collaboratively, synchronization can lead to merge errors when data was changed by multiple parties.
-->
Revision as of 09:46, 31 March 2025

Rsync is a command-line tool for single-threaded, one-directional synchronization. It transfers only files that have changed or were newly created on the source side. Rsync can even transfer only the changed parts of a file instead of the whole file, because it operates at block level rather than file level.

Usage

The option -P (short for --partial --progress) makes Rsync keep partially transferred files and show the transfer progress. An interrupted Rsync session can therefore be resumed where it left off by running the same command again.
The option --rsh=ssh tells Rsync to use ssh as the remote shell, so that the data is sent over a secure connection.

# Run this in the local folder where the file bigdata.tgz is (or should be) stored.
$ rsync -P --rsh=ssh <username>@<remotehost>:bigdata.tgz ./bigdata.tgz

Best Practices

If Rsync is not found on the remote host:
Pass the path of the remote Rsync executable as an additional option:
--rsync-path=/usr/bin/rsync
You can find the correct path by running which rsync on the remote host.
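
Put together, the two steps above might look like the following sketch. The user name jdoe, the host cluster.example.org and the variable names are hypothetical placeholders, and the path /usr/bin/rsync is only assumed to be what which rsync prints on your remote host. The script prints the resulting command instead of running it, so it can be checked first.

```shell
# Hypothetical values; replace them with your own account and host.
remote_user="jdoe"
remote_host="cluster.example.org"

# Step 1 (on the remote host): find out where rsync is installed, e.g.
#   ssh "$remote_user@$remote_host" which rsync
# Assumption: this printed /usr/bin/rsync.
remote_rsync="/usr/bin/rsync"

# Step 2 (locally): pass that path via --rsync-path.
set -- rsync -P --rsh=ssh --rsync-path="$remote_rsync" \
    "$remote_user@$remote_host:bigdata.tgz" ./bigdata.tgz

echo "$@"    # print the command for review
# "$@"       # uncomment to actually start the transfer
```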

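When the connection is slow:
Use the -z option to compress the data during the transfer. This reduces the amount of data sent over the network. Files that are already compressed, such as .tgz archives, benefit little from it.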