Data Transfer
Latest revision as of 11:25, 25 March 2025
Overview
Data transfer is the exchange of files between two systems. Before transferring data, go through the following steps:
- Choose the two data storage systems that will exchange data.
- Choose the top-level way of transfer (copy, sync or mount) that fits your use case.
- Choose a network protocol or transfer tool for the communication between the systems.
The recommended setup already covers these three steps. For a full overview, refer to the tables that show all transfer routes. They list all possible combinations of systems, top-level way of transfer and network protocol or transfer tool.
Data Storage Systems
Data transfer can happen between a variety of systems. For example:
- Local computer or VM (virtual machine)
- Data producing machine (sequencer, microscope, ...)
- HPC system
- Storage space (SDS@hd, institute server, ...)
- Cloud resource
Ways of Transfer: Copy, Sync, Mount
The top-level ways of transfer are:
- Copy: A simple copy command is the most basic way to transfer data. It is most efficient for very large files that are retrieved from or moved to a remote location. It is also convenient if you prefer moving files on the command line instead of using a file browser.
  Examples: scp, sftp
- Sync: If the data is kept on both systems and changes on only one of them, a synchronization command makes more sense. Only the files that changed in one location are updated in the other. Good use cases are backups or transfers that go mostly in one direction, such as moving data from a sequencer to a storage space. A disadvantage is that the data needs storage space on both systems.
  Example: rsync
- Mount: If the data changes on both systems or is too big to store locally, mounting is the most convenient solution. You can see and work with the data as if it were stored locally while it remains on the remote system. All changes happen directly on the original data, so you do not need to copy or synchronize anything, and you see changes made by another party after only a very short delay. The disadvantages are that you need defined edit sessions that start with a mount and end with a clean unmount, and a stable network connection throughout the session. Also, file operations on the remote data are much slower through an sshfs mount.
  Example: sshfs
Figure 1: Top level transfer routes
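As a rough illustration of the three ways of transfer, the following sketch prints one typical command per way instead of executing it, so it runs without any network access. The account, host name and paths are hypothetical placeholders, and the options shown (`rsync -av`, `fusermount -u`) are common choices on Linux rather than requirements stated on this page.

```shell
#!/bin/sh
# Hypothetical placeholders: replace with your own account, host and paths.
REMOTE="xy1234@cluster.example.org"
REMOTE_DIR="project/data"
LOCAL_DIR="data"
MOUNT_POINT="mnt/cluster"

# Print a typical command for the chosen way of transfer.
# Nothing is executed; the commands are only shown.
transfer_cmd() {
    case "$1" in
        # Copy: one-off transfer of a single (large) file.
        copy)    echo "scp $LOCAL_DIR/results.tar.gz $REMOTE:$REMOTE_DIR/" ;;
        # Sync: only files that changed locally are sent again.
        sync)    echo "rsync -av $LOCAL_DIR/ $REMOTE:$REMOTE_DIR/" ;;
        # Mount: work on the remote data as if it were local.
        mount)   echo "sshfs $REMOTE:$REMOTE_DIR $MOUNT_POINT" ;;
        # Clean unmount at the end of the session (Linux; macOS uses umount).
        unmount) echo "fusermount -u $MOUNT_POINT" ;;
        *)       echo "usage: transfer_cmd copy|sync|mount|unmount" >&2
                 return 1 ;;
    esac
}

for way in copy sync mount unmount; do
    transfer_cmd "$way"
done
```

To actually run a transfer, copy the printed line into your terminal after replacing the placeholders with real values.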
Network Protocols & Transfer Tools
Basic Network Protocol | Used By Network Protocol |
---|---|
ssh | scp, sftp, rsync |
http(s) | WebDAV |
smb | - |
NFS | - |
Every data transfer needs a network protocol for the communication between the systems. The table above shows the basic network protocols and the protocols that build directly upon them. These protocols can be used either directly or through tools that provide the protocol together with additional features. A tool can be a command line tool or a tool with a graphical user interface.
A comprehensive overview of all transfer options (network protocols and tools) can be found on the page all data transfer routes.
Recommended Setup
If you work with a development environment that allows remote connections, that is the first choice. Otherwise, the main tools and protocols for transferring data are:
- Windows: MobaXterm is a graphical user interface that lets you log in to the cluster with ssh and transfer data either via a file browser or with command line tools.
- macOS and Linux:
  - sshfs is the quickest way to mount under stable connections.
  - Rclone for mount, copy and sync (not usable with 2FA).
- SDS@hd: See the SDS@hd Access page.
Best Practices
- Strong firewall restrictions
  -> Use ssh or http(s) based protocols, for example WebDAV and sftp. In very strict facilities, ssh based protocols might not be allowed.
- Share data with collaborators...
  - ...outside of Baden-Württemberg
    -> Use the SDS@hd storage.
  - ...who are less comfortable with the command line
    -> Let them mount the folder.
- Transfer many small files
  -> Compress the files into one archive.
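The recommendation for many small files can be sketched as follows. This is a minimal example, assuming that `tar` with gzip compression is an acceptable way to bundle the files; the directory and file names are made up for illustration, and the script works in a temporary directory so that it runs without touching real data.

```shell
#!/bin/sh
set -eu

# Create a throwaway directory with many small files (stand-in for real data).
WORK_DIR=$(mktemp -d)
mkdir "$WORK_DIR/small_files"
i=1
while [ "$i" -le 100 ]; do
    echo "sample $i" > "$WORK_DIR/small_files/file_$i.txt"
    i=$((i + 1))
done

# Pack and compress the whole directory into a single archive.
# -C makes the paths inside the archive relative to WORK_DIR.
ARCHIVE="$WORK_DIR/small_files.tar.gz"
tar -czf "$ARCHIVE" -C "$WORK_DIR" small_files

# Count the .txt entries in the archive to check that nothing was missed.
FILE_COUNT=$(tar -tzf "$ARCHIVE" | grep -c '\.txt$')
echo "Packed $FILE_COUNT files into $ARCHIVE"

# The single archive can now be transferred (for example with scp) and
# unpacked on the target system with: tar -xzf small_files.tar.gz
```

Transferring one archive avoids the per-file overhead that otherwise dominates when thousands of tiny files are sent individually.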
For advanced topics see Advanced Data Transfer.