Data Transfer/Rsync
Revision as of 09:46, 31 March 2025
Rsync is a command-line tool for single-threaded, one-directional synchronization. It transfers only files that have changed or were newly created on the source side. Rsync can even transfer only the changed parts of a file instead of the whole file, because it operates at block level rather than file level.
Usage
The parameter -P (short for --partial --progress) makes Rsync keep partially transferred files and show the transfer progress. An interrupted Rsync session can be resumed where it left off by running the same command again.
The parameter --rsh=ssh tells Rsync to use ssh as the remote shell, so that the connection is secure.
# Execute in your local folder, where the file bigdata.tgz is placed.
$ rsync -P --rsh=ssh <username>@<remotehost>:bigdata.tgz ./bigdata.tgz
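Because repeating the same command resumes an interrupted transfer, the repetition can be automated. The following is a minimal sketch and not part of the original instructions: the helper name retry, the attempt limit, and the RETRY_DELAY variable are assumptions chosen for illustration.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Runs COMMAND until it exits successfully or MAX_ATTEMPTS is reached.
# RETRY_DELAY (hypothetical variable, default 10) is the pause in seconds
# between two attempts.
retry() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Stop as soon as the command succeeds.
        "$@" && return 0
        echo "Attempt $attempt of $max_attempts failed." >&2
        attempt=$((attempt + 1))
        # Only wait if another attempt will follow.
        if [ "$attempt" -le "$max_attempts" ]; then
            sleep "${RETRY_DELAY:-10}"
        fi
    done
    return 1
}

# Example call, with the same placeholders as in the command above:
# retry 5 rsync -P --rsh=ssh '<username>@<remotehost>:bigdata.tgz' ./bigdata.tgz
```

With -P in the command, each new attempt continues from the partially transferred file instead of starting over.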
Best Practices
If Rsync is not found on the remote host:
Add the path of the remote Rsync executable as an additional option:
--rsync-path=/usr/bin/rsync
You can find the path by running which rsync on the remote host.
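As an illustration, the option can be combined with the command from the Usage section. This is a sketch under assumptions: the function name fetch_with_rsync_path and the variable REMOTE_RSYNC_PATH are hypothetical, and /usr/bin/rsync is only the example path from above, so replace it with whatever which rsync reports on your remote host.

```shell
#!/bin/sh
# Path of the rsync executable on the remote host (hypothetical variable).
# /usr/bin/rsync is the example value; use the output of `which rsync`
# from the remote host instead.
REMOTE_RSYNC_PATH=${REMOTE_RSYNC_PATH:-/usr/bin/rsync}

# fetch_with_rsync_path SOURCE DEST
# Same transfer as in the Usage section, plus an explicit remote rsync path.
fetch_with_rsync_path() {
    rsync -P --rsh=ssh --rsync-path="$REMOTE_RSYNC_PATH" "$1" "$2"
}

# Example call, with the same placeholders as in the Usage section:
# fetch_with_rsync_path '<username>@<remotehost>:bigdata.tgz' ./bigdata.tgz
```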
When the connection is slow:
Compress the data during the transfer with the -z option to speed it up.
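A compressed transfer could look like the following sketch. The function name fetch_compressed and the file name results.csv are hypothetical. Note that files which are already compressed, such as a .tgz archive, usually gain little from -z, so the option pays off mainly for text-like data.

```shell
#!/bin/sh
# fetch_compressed SOURCE DEST
# Same transfer as in the Usage section, with -z added so that the data is
# compressed while it travels over the network.
fetch_compressed() {
    rsync -P -z --rsh=ssh "$1" "$2"
}

# Example call (results.csv is a hypothetical file name):
# fetch_compressed '<username>@<remotehost>:results.csv' ./results.csv
```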