Data Transfer: Difference between revisions
K Siegmund (talk | contribs) |
H Schumacher (talk | contribs) m (Added links to protocols in table) |
== |
== Overview == |
||
Data transfer is the exchange of files between two systems. Before data transfer can happen, you need to go through the following steps: |
|||
{|{{Table|width=99%}} |
|||
|- |
|||
! rowspan="2" | Type |
|||
! rowspan="2" | Software |
|||
! rowspan="2" | Remarks |
|||
! colspan="4" style="text-align:center" | Executable on |
|||
! colspan="3" style="text-align:center" | Transfer from/to |
|||
|- |
|||
!Local° |
|||
!bwUniCluster |
|||
!bwForCluster |
|||
!www |
|||
!bwHPC cluster |
|||
![[Sds-hd|SDS@hd]] |
|||
|- |
|||
| rowspan="5" | Command-line |
|||
| scp |
|||
| rowspan="3" | Throughput < 150 MB/s (depending on cipher) |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | |
|||
|- |
|||
| sftp |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
|- |
|||
| rsync |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | |
|||
|- |
|||
| rdata |
|||
| Throughput of 350-400 MB/s |
|||
| |
|||
| style="text-align:center" | + |
|||
| |
|||
| |
|||
| |
|||
| style="text-align:center" | + |
|||
|- |
|||
| wget |
|||
| Download from http/ftp address only |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| |
|||
| style="text-align:center" | |
|||
|- |
|||
| rowspan="2" | Graphical |
|||
| [https://winscp.net/eng/download.php WinSCP] |
|||
| based on SCP/SFTP, Windows only |
|||
| style="text-align:center" | + |
|||
| |
|||
| |
|||
| |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
|- |
|||
| [https://filezilla-project.org/download.php?show_all=1 FileZilla] |
|||
| based on SFTP |
|||
| style="text-align:center" | + |
|||
| |
|||
| |
|||
| |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
|- |
|||
|} |
|||
# Choose the two [[#Data_Storage_Systems|data storage systems]] that shall exchange data. |
|||
° Depending on the installed operating system (OS). |
|||
# Choose the top-level way of transfer ([[#Ways of Transfer:_Copy,_Sync,_Mount|copy, sync or mount]]) by considering your specific use case. |
|||
# Choose a [[#Network_Protocols_&_Transfer_Tools|network protocol or transfer tool]] to use for the communication between the systems. |
|||
The recommended setup already includes these three steps. For a full overview, you can reference the tables that show [[Data_Transfer/All_Data_Transfer_Routes|all transfer routes]]. Those include all possible combinations between systems, top level way of transfer and network protocol / transfer tool. |
|||
=== Data Storage Systems === |
|||
== Linux/Unix/Mac commandline sftp/scp Usage Examples == |
|||
Data transfer can happen between a variety of systems. For example: |
|||
=== sftp=== |
|||
<pre> |
|||
> sftp ka_xy1234@bwfilestorage.lsdf.kit.edu |
|||
Connecting to bwfilestorage.lsdf.kit.edu
|||
ka_xy1234@bwfilestorage.lsdf.kit.edu's password: |
|||
sftp> ls |
|||
snapshots |
|||
temp test |
|||
sftp> help |
|||
... |
|||
sftp> put myfile |
|||
sftp> get myfile |
|||
</pre> |
|||
* [[File:Notebook.svg|x20px]] Local computer or VM (virtual machine) |
|||
=== scp === |
|||
* [[File:Microscope.svg|x20px]] <span style="margin-left:10px;">Data producing machine (sequencer, microscope, ...)</span>
|||
<code> |
|||
* [[File:Clusternodes.svg|x20px]] <span style="margin-left:8px;">HPC system</span>
|||
> scp mylocalfile ul_xy1234@justus.uni-ulm.de: # copies to home directory |
|||
* [[File:Storage_small.svg|x15px]] <span style="margin-left:8px;">Storage space (SDS@hd, institute server, ...)</span>
|||
</code> |
|||
* [[File:Cloud.svg|x15px]] <span style="margin-left:3px;">Cloud resource</span>
|||
=== Ways of Transfer: Copy, Sync, Mount === |
|||
== Using SFTP from Windows and Mac graphical clients == |
|||
The top level ways of transfer are: |
|||
Windows systems may not have an SCP/SFTP client installed by default; in that case, one needs to be installed before this protocol can be used.
|||
* '''Copy:''' A simple copy command is the most basic way to transfer data. It is most efficient for very large files that are retrieved from or moved to a remote location. It can also be the most convenient option if you prefer moving your files via the command line instead of a file browser. <br>Examples: [[Data_Transfer/SCP|scp]], [[Data_Transfer/SFTP|sftp]]
|||
'''Tools for example:''' |
|||
* '''Sync:''' If the data is intended to be kept on both systems and undergoes change on only one of the systems, it makes sense to use a synchronization command instead. This way, only the changed files in one location are updated in the other location. Good use cases are backups or data transfers that go mostly in one direction like moving data from a sequencer to a storage space. A disadvantage is that the data needs storage space on both systems. <br>Example: [[Data_Transfer/Rsync|rsync]] |
|||
* [https://www.openssh.com/ OpenSSH] |
|||
* '''Mount:''' If the data undergoes change on both systems or is too big to store locally, then mounting is the most convenient solution. This allows you to see and work with the data as if it were stored locally on your computer while it remains on the remote system. All changes that you make happen directly on the original data, so you don't need to copy or synchronize anything. Additionally, you'll see all changes that another party makes to the data with only a very short delay. Disadvantages are that you need defined edit sessions, each starting with a mount and ending with a clean unmount, and a stable network connection during the session. Also, file operations are much slower on an sshfs-mounted remote file system than on local storage. <br>Example: [[Data_Transfer/SSHFS|sshfs]]
|||
*[https://www.chiark.greenend.org.uk/~sgtatham/putty/download.html Putty suite] (for Windows and Unix) |
|||
<p style="text-align: left;">[[File:CopySyncMount.png|x250px]]</p> |
|||
*[https://winscp.net/eng/download.php WinSCP] (for Windows) |
|||
<p style="text-align: left; font-size: small; margin-top: 10px; margin-left: 255px;">Figure 1: Top level transfer routes</p> |
|||
*[https://filezilla-project.org/download.php?show_all=1 FileZilla] (for Windows, Mac and Linux) |
|||
*[https://cygwin.com/install.html Cygwin] (for Windows) |
|||
<br> |
|||
'''network drive over SFTP:''' |
|||
*[https://www.southrivertechnologies.com/download/downloadwd.html WebDrive] (for Windows and Mac) |
|||
*[https://www.eldos.com/sftp-net-drive/comparison.php SFTP Net Drive (ELDOS)] (for Windows) |
|||
*[https://www.netdrive.net/ NetDrive] (for Windows) |
|||
*[https://www.expandrive.com/expandrive ExpanDrive] (for Windows and Mac) |
|||
</br> |
|||
== Best practices == |
|||
=== Network Protocols & Transfer Tools === |
|||
=== Ciphers === |
|||
{| class="wikitable" style="vertical-align:middle;" |
|||
Encrypting all the transferred data via scp/sftp takes time, which can become significant for really large data transfers. |
|||
|- style="font-weight:bold;" |
|||
! Basic Network Protocol |
|||
In these cases, you can choose a faster encryption cipher to speed up that part of your data transfer via options to ssh/sftp. |
|||
! Used By Network Protocol |
|||
In our tests, these ciphers achieved the listed transfer speeds relative to the default. Whether the speedup is noticeable for you depends on the processor type, the network connection, and the hard disk used.
|||
{| class="wikitable" |
|||
!Cipher |
|||
!style="text-align:left;"| performance |
|||
|- |
|- |
||
| ssh |
|||
|chacha20-poly1305@openssh.com (default) |
|||
| [[Data_Transfer/SCP | scp]], [[Data_Transfer/SFTP | sftp]], [[Data_Transfer/Rsync | rsync]] |
|||
| 100% |
|||
|- |
|- |
||
| http(s) |
|||
|aes128-gcm@openssh.com |
|||
| [[Data_Transfer/WebDAV | WebDAV]] |
|||
|~200% |
|||
|- |
|- |
||
| [[SDS@hd/Access/SMB | smb]] |
|||
|aes128-ctr |
|||
| - |
|||
|~188% |
|||
|- |
|- |
||
| [[SDS@hd/Access/NFS | NFS]] |
|||
| - |
|||
|} |
|} |
||
For every data transfer, you must choose a network protocol for the communication between the systems. The basic network protocols and the network protocols that build directly upon them are shown in the table. These protocols can be used either directly or through tools that provide the protocol together with additional features. Such a tool can be a command line tool or a tool with a graphical user interface.
|||
A comprehensive overview of all transfer options (network protocols and tools) can be found on the page [[Data_Transfer/All_Data_Transfer_Routes|all data transfer routes]]. |
|||
== Recommended Setup == |
|||
The main tool/protocol for transferring data to a bwHPC cluster from your local machine is as follows: |
|||
* '''[[Data_Transfer/Graphical_Clients#MobaXterm|MobaXterm]]''' for Windows. It is a graphical user interface that allows logging in to the cluster with ssh as well as transferring data via a file browser. |
|||
* '''[[Data_Transfer/SSHFS|sshfs]]''' for MacOS and Linux. It is the quickest solution for mounting a folder. If you want to use copy and sync as well, it is more convenient to use '''[[Data_Transfer/Rclone|Rclone]]''' instead. Rclone provides copy, sync and mount functionality for various types of infrastructure. |
|||
<p style="text-align: center; margin-top: 10px">[[File:Bwhpc diagram simplenobox.jpg|x150px]]</p> |
|||
With ssh/sshfs you can use different ciphers with the -c option: |
|||
<p style="text-align: center; font-size: small; margin-top: 10px">Figure 2: bwHPC main transfer routes</p> |
|||
For SDS@hd you can find the main access options at the [[SDS@hd/Access|SDS@hd Access]] page. |
|||
== Best Practices == |
|||
<pre>ssh -c aes128-gcm@openssh.com</pre> |
|||
* '''Strong firewall restrictions'''<br /> -> Use ssh or http(s) based protocols, for example [[Data_Transfer/WebDAV|'''WebDav''']] and [[Data_Transfer/SFTP|sftp]]. For very strict facilities, ssh based protocols might not be allowed. |
|||
The available ciphers can be listed with the command
|||
* '''Share data with collaborators...''' |
|||
** ...outside of Baden-Württemberg<br /> -> Use the [[SDS@hd|SDS@hd]] storage. |
|||
** ...that are less comfortable with the command line<br /> -> Let them mount the folder. |
|||
* '''Transfer many small files'''<br /> -> Compress the files to one. |
|||
For advanced topics see [[Data_Transfer/Advanced_Data_Transfer|Advanced Data Transfer]]. |
|||
<pre>ssh -Q cipher</pre> |
Latest revision as of 17:37, 12 March 2025
Overview
Data transfer is the exchange of files between two systems. Before data transfer can happen, you need to go through the following steps:
- Choose the two data storage systems that shall exchange data.
- Choose the top-level way of transfer (copy, sync or mount) by considering your specific use case.
- Choose a network protocol or transfer tool to use for the communication between the systems.
The recommended setup already includes these three steps. For a full overview, refer to the tables that show all transfer routes. They cover all possible combinations of systems, top-level way of transfer, and network protocol or transfer tool.
Data Storage Systems
Data transfer can happen between a variety of systems. For example:
- Local computer or VM (virtual machine)
- Data producing machine (sequencer, microscope, ...)
- HPC system
- Storage space (SDS@hd, institute server, ...)
- Cloud resource
Ways of Transfer: Copy, Sync, Mount
The top-level ways of transfer are:
- Copy: A simple copy command is the most basic way to transfer data. It is most efficient for very large files that are retrieved from or moved to a remote location. It can also be the most convenient option if you prefer moving your files via the command line instead of a file browser.
  Examples: scp, sftp
- Sync: If the data is intended to be kept on both systems and undergoes change on only one of the systems, it makes sense to use a synchronization command instead. This way, only the changed files in one location are updated in the other location. Good use cases are backups or data transfers that go mostly in one direction, like moving data from a sequencer to a storage space. A disadvantage is that the data needs storage space on both systems.
  Example: rsync
- Mount: If the data undergoes change on both systems or is too big to store locally, then mounting is the most convenient solution. This allows you to see and work with the data as if it were stored locally on your computer while it remains on the remote system. All changes that you make happen directly on the original data, so you don't need to copy or synchronize anything. Additionally, you'll see all changes that another party makes to the data with only a very short delay. Disadvantages are that you need defined edit sessions, each starting with a mount and ending with a clean unmount, and a stable network connection during the session. Also, file operations are much slower on an sshfs-mounted remote file system than on local storage.
  Example: sshfs
Figure 1: Top level transfer routes
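As a rough illustration of the three ways of transfer, the following sketch builds one typical command line for each and prints them instead of running them, so no remote system is contacted. The account name, host name and paths are hypothetical placeholders and must be replaced with your own.

```shell
#!/bin/sh
# Hypothetical account, host and paths; replace them with your own.
remote="xy_ab1234@cluster.example.org"
local_dir="$HOME/project"
mount_point="$HOME/mnt/project"

# Copy: one-off transfer of a single large file to the remote home directory.
copy_cmd="scp $local_dir/results.tar.gz $remote:"

# Sync: on repeated runs, only files that changed locally are sent again.
sync_cmd="rsync -av $local_dir/ $remote:project/"

# Mount: work on the remote directory in place, then unmount cleanly.
mount_cmd="sshfs $remote:project $mount_point"
unmount_cmd="fusermount -u $mount_point"   # Linux; on macOS use umount instead

# Print the commands for review instead of executing them.
printf '%s\n' "$copy_cmd" "$sync_cmd" "$mount_cmd" "$unmount_cmd"
```

Printing the commands first makes it easy to check the source and target paths before anything is transferred; once they look right, the commands can be run directly in a terminal.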
Network Protocols & Transfer Tools
Basic Network Protocol | Used By Network Protocol
---|---
ssh | scp, sftp, rsync
http(s) | WebDAV
smb | -
NFS | -
For every data transfer, you must choose a network protocol for the communication between the systems. The basic network protocols and the network protocols that build directly upon them are shown in the table above. These protocols can be used either directly or through tools that provide the protocol together with additional features. Such a tool can be a command line tool or a tool with a graphical user interface.
A comprehensive overview of all transfer options (network protocols and tools) can be found on the page all data transfer routes.
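The relation between the basic protocols and the protocols built on them can be written down as a small lookup, which may help when a firewall only permits certain basic protocols. This is only a sketch of the table in this section; the function name `base_protocol` is made up for illustration.

```shell
#!/bin/sh
# Map a protocol or tool name to the basic network protocol it builds on,
# following the table in this section. The function name is hypothetical.
base_protocol() {
    case "$1" in
        scp|sftp|rsync) echo "ssh" ;;
        webdav)         echo "http(s)" ;;
        smb)            echo "smb" ;;
        nfs)            echo "NFS" ;;
        *)              echo "unknown"; return 1 ;;
    esac
}

# Example lookups.
rsync_base=$(base_protocol rsync)
webdav_base=$(base_protocol webdav)
echo "rsync builds on $rsync_base, webdav builds on $webdav_base"
```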
Recommended Setup
The main tools for transferring data from your local machine to a bwHPC cluster are:
- MobaXterm for Windows. It is a graphical user interface that allows logging in to the cluster with ssh as well as transferring data via a file browser.
- sshfs for macOS and Linux. It is the quickest solution for mounting a folder. If you want to use copy and sync as well, it is more convenient to use Rclone instead. Rclone provides copy, sync and mount functionality for various types of infrastructure.
Figure 2: bwHPC main transfer routes
For SDS@hd you can find the main access options at the SDS@hd Access page.
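Before settling on one of the command line tools mentioned on this page, it can help to check which of them are already installed on a macOS or Linux machine. The following sketch only inspects the local system; the list of tool names is an assumption and can be adapted.

```shell
#!/bin/sh
# Report which of the command line transfer tools are installed locally.
report=""
for tool in sshfs rclone rsync scp sftp; do
    if command -v "$tool" >/dev/null 2>&1; then
        status="available"
    else
        status="not installed"
    fi
    # Collect one line per tool.
    report="${report}${tool}: ${status}
"
done

# Print the collected report.
printf '%s' "$report"
```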
Best Practices
- Strong firewall restrictions
  -> Use ssh or http(s) based protocols, for example WebDAV and sftp. For very strict facilities, ssh based protocols might not be allowed.
- Share data with collaborators...
  - ...outside of Baden-Württemberg
    -> Use the SDS@hd storage.
  - ...that are less comfortable with the command line
    -> Let them mount the folder.
- Transfer many small files
  -> Compress the files into a single archive before transferring.
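The advice for many small files can be sketched with tar. The example creates 100 small sample files in a scratch directory, packs them into one compressed archive and counts the files inside it. The directory and file names are made up for illustration, and the final transfer is shown only as a comment with a hypothetical host.

```shell
#!/bin/sh
# Create a scratch directory with many small files (a stand-in for real data).
workdir=$(mktemp -d)
mkdir "$workdir/small_files"
i=1
while [ "$i" -le 100 ]; do
    echo "sample $i" > "$workdir/small_files/file_$i.txt"
    i=$((i + 1))
done

# Pack and compress the whole directory into a single archive.
tar -czf "$workdir/small_files.tar.gz" -C "$workdir" small_files

# Count the sample files stored in the archive.
file_count=$(tar -tzf "$workdir/small_files.tar.gz" | grep -c '\.txt$')
echo "archive contains $file_count files"

# Transfer the single archive instead of 100 separate files, for example:
#   scp "$workdir/small_files.tar.gz" xy_ab1234@cluster.example.org:
# Remove the scratch directory when it is no longer needed:
#   rm -r "$workdir"
```

On the target system, the archive can be unpacked again with `tar -xzf small_files.tar.gz`.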
For advanced topics see Advanced Data Transfer.