Data Transfer: Difference between revisions
H Schumacher (talk | contribs) m (formatting) |
|||
{| style=" background:#FEF4AB; width:100%;" |
|||
== Transfer Tools == |
|||
| style="padding:6px; font-size:100%;text-align:left" | This page is work in progress. To discover more relevant subpages, please come back at a later timepoint. |
|||
{|class="wikitable" |
|||
|- |
|||
! rowspan="2" | Type |
|||
! rowspan="2" | Software |
|||
! rowspan="2" | Remarks |
|||
! colspan="4" style="text-align:center" | Executable on |
|||
! colspan="3" style="text-align:center" | Transfer from/to |
|||
|- |
|||
!Local° |
|||
!bwUniCluster |
|||
!bwForCluster |
|||
!www |
|||
!bwHPC cluster |
|||
![[SDS@hd]] |
|||
|- |
|||
| rowspan="5" | Command-line |
|||
! scp |
|||
| rowspan="3" | Throughput < 150 MB/s (depending on cipher) |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | |
|||
|- |
|||
! sftp |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
|- |
|||
! rsync |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | |
|||
|- |
|||
! rdata |
|||
| Throughput of 350-400 MB/s |
|||
| |
|||
| style="text-align:center" | + |
|||
| |
|||
| |
|||
| |
|||
| style="text-align:center" | + |
|||
|- |
|||
! wget |
|||
| Download from http/ftp address only |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
| |
|||
| style="text-align:center" | |
|||
|- |
|||
| rowspan="2" | Graphical |
|||
! [https://winscp.net/eng/download.php WinSCP] |
|||
| based on SCP/SFTP, Windows only |
|||
| style="text-align:center" | + |
|||
| |
|||
| |
|||
| |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
|- |
|||
! [https://filezilla-project.org/download.php?show_all=1 FileZilla] |
|||
| based on SFTP |
|||
| style="text-align:center" | + |
|||
| |
|||
| |
|||
| |
|||
| style="text-align:center" | + |
|||
| style="text-align:center" | + |
|||
|- |
|- |
||
|} |
|} |
||
== Overview == |
|||
Data transfer is the exchange of files between two systems. Before data transfer can happen, you need to go through the following steps: |
|||
° Depending on the installed operating system (OS). |
|||
# Choose the two [[#Data_Storage_Systems|data storage systems]] that shall exchange data. |
|||
== Linux/Unix/Mac commandline sftp/scp Usage Examples == |
|||
# Choose the top-level way of transfer ([[#Ways of Transfer:_Copy,_Sync,_Mount|copy, sync or mount]]) by considering your specific use case.
|||
# Choose a [[#Network_Protocols_&_Transfer_Tools|network protocol or transfer tool]] to use for the communication between the systems. |
|||
The recommended setup already includes these three steps. For a full overview, you can reference the tables that show [[All_Transfer_Routes|all transfer routes]]. Those include all possible combinations between systems, top level way of transfer and network protocol / transfer tool. |
|||
=== sftp=== |
|||
<pre> |
|||
> sftp ka_xy1234@bwfilestorage.lsdf.kit.edu |
|||
Connecting to bwfilestorage.lsdf.kit.edu
|||
ka_xy1234@bwfilestorage.lsdf.kit.edu's password: |
|||
sftp> ls |
|||
snapshots |
|||
temp test |
|||
sftp> help |
|||
... |
|||
sftp> put myfile |
|||
sftp> get myfile |
|||
</pre> |
|||
=== |
=== Data Storage Systems === |
||
<code> |
|||
> scp mylocalfile ul_xy1234@justus2.uni-ulm.de: # copies to home directory |
|||
</code> |
|||
Data transfer can happen between a variety of systems. For example: |
|||
== Using SFTP from Windows and Mac graphical clients == |
|||
* [[File:Notebook.svg|x20px]] Local computer or VM (virtual machine) |
|||
Windows clients do not have an SCP/SFTP client installed by default, so one needs to be installed before this protocol can be used.
|||
* [[File:Microscope.svg|x20px]] <span style="margin-left:10px;">Data producing machine (sequencer, microscope, ...)</span>
|||
* [[File:Clusternodes.svg|x20px]] <span style="margin-left:8px;">HPC system</span>
|||
* [[File:Storage_small.svg|x15px]] <span style="margin-left:8px;">Storage space (SDS@hd, institute server, ...)</span>
|||
* [[File:Cloud.svg|x15px]] <span style="margin-left:3px;">Cloud resource</span>
|||
=== Ways of Transfer: Copy, Sync, Mount === |
|||
'''Tools:''' |
|||
*[https://winscp.net/eng/download.php WinSCP] (for Windows) |
|||
*[https://filezilla-project.org/download.php?show_all=1 FileZilla] (for Windows, Mac and Linux) |
|||
<br> |
|||
'''network drive over SFTP:''' |
|||
*[https://www.southrivertechnologies.com/download/downloadwd.html WebDrive] (for Windows and Mac) |
|||
*[https://www.eldos.com/sftp-net-drive/comparison.php SFTP Net Drive (ELDOS)] (for Windows) |
|||
*[https://www.netdrive.net/ NetDrive] (for Windows) |
|||
*[https://www.expandrive.com/expandrive ExpanDrive] (for Windows and Mac) |
|||
The top level ways of transfer are: |
|||
=== FileZilla ===
|||
* '''Copy:''' A simple copy command is the most basic way to transfer data. It is most efficient for very large files that are retrieved from or moved to a remote location, and it is convenient if you prefer moving files on the command line rather than in a file browser.
|||
Start FileZilla, Select "File -> Site Manager..." from the main menu and set up a new connection with the following settings: |
|||
* '''Sync:''' If the data is intended to be kept on both systems and undergoes change on only one of the systems, it makes sense to use a synchronization command instead. This way, only the changed files in one location are updated in the other location. Good use cases are backups or data transfers that go mostly in one direction like moving data from a sequencer to a storage space. A disadvantage is that the data needs storage space on both systems. |
|||
* '''Mount:''' If the data undergoes change on both systems or is too big to store locally, then mounting is the most convenient solution. This allows you to see and work with the data as if it were stored locally on your computer while it still resides on the remote system. All changes that you make happen directly on the original data, so you don't need to copy or synchronize anything. Additionally, you see changes that another party makes to the data with only a very short delay.
|||
<p style="text-align: left;">[[File:CopySyncMount.png|x250px]]</p> |
|||
<p style="text-align: left; font-size: small; margin-top: 10px; margin-left: 255px;">Figure 1: Top level transfer routes</p> |
|||
<pre>
</br> |
||
Protocol: SFTP - SSH File Transfer Protocol |
|||
Host: <hostname> |
|||
Logon Type: Interactive
|||
User: <username> |
|||
</pre> |
|||
=== Network Protocols & Transfer Tools === |
|||
'''Note:''' By default, FileZilla closes the connection after 20 seconds of inactivity. To increase or disable this timeout, select "Edit -> Settings ... -> Connections" and either raise "Timeout in seconds" to a reasonable value or set it to 0 to disable the connection timeout.
|||
{| class="wikitable" style="vertical-align:middle;" |
|||
== Best practices == |
|||
|- style="font-weight:bold;" |
|||
! Basic Network Protocol |
|||
=== Ciphers === |
|||
! Used By Network Protocol |
|||
Encrypting all the transferred data via scp/sftp takes time, which can become significant for really large data transfers. |
|||
In these cases, you can choose a faster encryption cipher to speed up that part of your data transfer via options to ssh/sftp. |
|||
In our tests, these ciphers showed the listed transfer speeds relative to the default. Whether a speedup is noticeable for you depends on the processor type, the network connection and the hard disk in use.
|||
{| class="wikitable" |
|||
!Cipher |
|||
!style="text-align:left;"| performance |
|||
|- |
|- |
||
| ssh |
|||
|chacha20-poly1305@openssh.com (default) |
|||
| scp, sftp, rsync |
|||
| 100% |
|||
|- |
|- |
||
| http(s) |
|||
|aes128-gcm@openssh.com |
|||
| WebDAV |
|||
|~200% |
|||
|- |
|- |
||
| smb |
|||
|aes128-ctr |
|||
| - |
|||
|~188% |
|||
|- |
|- |
||
| NFS |
|||
| - |
|||
|} |
|} |
||
For every data transfer, a network protocol must be chosen for the communication between the systems. The basic network protocols and the protocols that build directly upon them are shown in the table on the right. These protocols can be used either directly or through tools that provide the protocol together with additional features. A tool can be either a command-line tool or one with a graphical user interface.
|||
A comprehensive overview of all transfer options (network protocols and tools) can be found on the page [[Data_Transfer/All_Transfer_Routes|all transfer routes]]. |
|||
== Recommended Setup == |
|||
The main tools for transferring data from your local machine to a bwHPC cluster are:
|||
* '''[[Data_Transfer/MobaXterm|MobaXterm]]''' for Windows. It is a graphical user interface that allows logging in to the cluster with ssh as well as transferring data via a file browser. |
|||
* '''[[Data_Transfer/SSHFS|sshfs]]''' for macOS and Linux. It is the quickest solution for mounting a folder. If you want to use copy and sync as well, it is more convenient to use '''[[Data_Transfer/Rclone|Rclone]]''' instead. Rclone provides copy, sync and mount functionality for various types of infrastructure.
|||
<p style="text-align: center; margin-top: 10px">[[File:Bwhpc diagram simplenobox.jpg|x150px]]</p> |
|||
With ssh/sshfs you can use different ciphers with the -c option: |
|||
<p style="text-align: center; font-size: small; margin-top: 10px">Figure 2: bwHPC main transfer routes</p> |
|||
For SDS@hd you can find the main access options at the [[SDS@hd/Access|SDS@hd Access]] page. |
|||
== Best Practices == |
|||
<pre>ssh -c aes128-gcm@openssh.com</pre> |
|||
* '''Strong firewall restrictions'''<br /> -> Use ssh or http(s) based protocols, for example [[Data_Transfer/WebDAV|'''WebDAV''']] and [[Data_Transfer/SFTP|sftp]]. For very strict facilities, ssh based protocols might not be allowed.
|||
The ciphers supported by your ssh client can be listed with the command
|||
* '''Share data with collaborators...''' |
|||
** ...outside of Baden-Württemberg<br /> -> Use the [[SDS@hd|SDS@hd]] storage. |
|||
** ...that are less comfortable with the command line<br /> -> Let them mount the folder. |
|||
* '''Transfer many small files'''<br /> -> Combine the files into a single (compressed) archive.
|||
For advanced topics see [[Advanced_Data_Transfer|advanced-data-transfer]]. |
|||
<pre>ssh -Q cipher</pre> |
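As a minimal sketch of how the cipher list and the `-c` option can be combined, the shell function below picks the first cipher from a preference list that the local client reports as supported. The function name `pick_cipher` is hypothetical and not part of OpenSSH; the cipher names are the ones from the table above, and the host in the usage comment is the placeholder already used on this page.

```shell
#!/bin/sh
# pick_cipher CIPHER... (hypothetical helper, not part of OpenSSH)
# Reads a list of supported ciphers on stdin, one per line (the format
# printed by `ssh -Q cipher`), and prints the first argument that
# appears in that list. Returns 1 if none of the arguments is supported.
pick_cipher() {
    available=$(cat)
    for candidate in "$@"; do
        # -F: fixed string, -x: whole-line match, -q: no output
        if printf '%s\n' "$available" | grep -Fqx -e "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Possible usage (not executed here; requires an ssh client and an account):
#   cipher=$(ssh -Q cipher | pick_cipher aes128-gcm@openssh.com aes128-ctr) &&
#       scp -c "$cipher" mylocalfile ul_xy1234@justus2.uni-ulm.de:
```

Checking the list first avoids a failed connection when an older client does not offer the preferred cipher.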
Latest revision as of 00:05, 22 November 2024
This page is a work in progress. To discover more relevant subpages, please check back later.
Overview
Data transfer is the exchange of files between two systems. Before data transfer can happen, you need to go through the following steps:
- Choose the two data storage systems that shall exchange data.
- Choose the top-level way of transfer (copy, sync or mount) by considering your specific use case.
- Choose a network protocol or transfer tool to use for the communication between the systems.
The recommended setup already includes these three steps. For a full overview, see the tables that show all transfer routes. They cover all possible combinations of systems, top-level way of transfer and network protocol / transfer tool.
Data Storage Systems
Data transfer can happen between a variety of systems. For example:
- Local computer or VM (virtual machine)
- Data producing machine (sequencer, microscope, ...)
- HPC system
- Storage space (SDS@hd, institute server, ...)
- Cloud resource
Ways of Transfer: Copy, Sync, Mount
The top level ways of transfer are:
- Copy: A simple copy command is the most basic way to transfer data. It is most efficient for very large files that are retrieved from or moved to a remote location, and it is convenient if you prefer moving files on the command line rather than in a file browser.
- Sync: If the data is intended to be kept on both systems and undergoes change on only one of the systems, it makes sense to use a synchronization command instead. This way, only the changed files in one location are updated in the other location. Good use cases are backups or data transfers that go mostly in one direction like moving data from a sequencer to a storage space. A disadvantage is that the data needs storage space on both systems.
- Mount: If the data undergoes change on both systems or is too big to store locally, then mounting is the most convenient solution. This allows you to see and work with the data as if it were stored locally on your computer while it still resides on the remote system. All changes that you make happen directly on the original data, so you don't need to copy or synchronize anything. Additionally, you see changes that another party makes to the data with only a very short delay.
Figure 1: Top level transfer routes
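To make the three ways of transfer concrete, here is a minimal sketch that prints (without running) one typical command per way. The mapping to `scp`, `rsync` and `sshfs` is one common choice, not the only one; the function name `transfer_cmd` and the paths in the usage comments are hypothetical.

```shell
#!/bin/sh
# transfer_cmd MODE SOURCE TARGET (hypothetical helper)
# Prints a typical command for the chosen way of transfer.
# Nothing is executed, so it is safe to try without a remote account.
transfer_cmd() {
    mode=$1 src=$2 target=$3
    case "$mode" in
        # copy: one-off transfer of files or directories
        copy)  printf 'scp -r %s %s\n' "$src" "$target" ;;
        # sync: archive mode; files that are already up to date are skipped
        sync)  printf 'rsync -a %s %s\n' "$src" "$target" ;;
        # mount: remote directory appears under a local mount point
        mount) printf 'sshfs %s %s\n' "$src" "$target" ;;
        *)     printf 'unknown way of transfer: %s\n' "$mode" >&2
               return 1 ;;
    esac
}

# Example calls (host and user are the placeholders used on this page):
#   transfer_cmd copy  results/ ul_xy1234@justus2.uni-ulm.de:results/
#   transfer_cmd sync  results/ ul_xy1234@justus2.uni-ulm.de:results/
#   transfer_cmd mount ul_xy1234@justus2.uni-ulm.de:results "$HOME/results"
```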
Network Protocols & Transfer Tools
| Basic Network Protocol | Used By Network Protocol |
|---|---|
| ssh | scp, sftp, rsync |
| http(s) | WebDAV |
| smb | - |
| NFS | - |
For every data transfer, a network protocol must be chosen for the communication between the systems. The basic network protocols and the protocols that build directly upon them are shown in the table above. These protocols can be used either directly or through tools that provide the protocol together with additional features. A tool can be either a command-line tool or one with a graphical user interface.
A comprehensive overview of all transfer options (network protocols and tools) can be found on the page all transfer routes.
Recommended Setup
The main tools for transferring data from your local machine to a bwHPC cluster are:
- MobaXterm for Windows. It is a graphical user interface that allows logging in to the cluster with ssh as well as transferring data via a file browser.
- sshfs for macOS and Linux. It is the quickest solution for mounting a folder. If you want to use copy and sync as well, it is more convenient to use Rclone instead. Rclone provides copy, sync and mount functionality for various types of infrastructure.
Figure 2: bwHPC main transfer routes
For SDS@hd you can find the main access options at the SDS@hd Access page.
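As a rough sketch of the sshfs route on Linux and macOS: mounting has the form `sshfs user@host:remote_dir mountpoint`, while unmounting differs between the two systems. The helper below only prints the unmount command for a given `uname -s` value; the name `unmount_cmd` and the mount point are hypothetical, and newer Linux distributions may ship `fusermount3` instead of `fusermount`.

```shell
#!/bin/sh
# unmount_cmd OS MOUNTPOINT (hypothetical helper)
# OS is the output of `uname -s`. Prints the command that is typically
# used to unmount an sshfs mount point on that system.
unmount_cmd() {
    os=$1 mountpoint=$2
    case "$os" in
        Linux)  printf 'fusermount -u %s\n' "$mountpoint" ;;
        Darwin) printf 'umount %s\n' "$mountpoint" ;;
        *)      printf 'unsupported system: %s\n' "$os" >&2
                return 1 ;;
    esac
}

# Typical session (not executed here; host and user are placeholders):
#   mkdir -p "$HOME/cluster"
#   sshfs ul_xy1234@justus2.uni-ulm.de: "$HOME/cluster"   # mount remote home
#   ls "$HOME/cluster"                                    # work with the files
#   unmount_cmd "$(uname -s)" "$HOME/cluster"             # shows how to unmount
```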
Best Practices
- Strong firewall restrictions
  -> Use ssh or http(s) based protocols, for example WebDAV and sftp. For very strict facilities, ssh based protocols might not be allowed.
- Share data with collaborators...
  - ...outside of Baden-Württemberg
    -> Use the SDS@hd storage.
  - ...that are less comfortable with the command line
    -> Let them mount the folder.
- Transfer many small files
  -> Combine the files into a single (compressed) archive.
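A minimal sketch of the advice on many small files, assuming `tar` with gzip support is available on both systems: pack the directory into one archive, transfer that single file, and unpack it on the target. The helper name `pack_dir`, the directory name `many_small_files` and the host are placeholders.

```shell
#!/bin/sh
# pack_dir DIR ARCHIVE (hypothetical helper)
# Packs the directory DIR into a single gzip-compressed tar archive.
pack_dir() {
    dir=$1 archive=$2
    # -C switches to the parent directory first, so the archive contains
    # only the directory name and not its full parent path.
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")"
}

# Possible usage (the transfer itself is not executed here):
#   pack_dir ./many_small_files many_small_files.tar.gz
#   scp many_small_files.tar.gz ul_xy1234@justus2.uni-ulm.de:
#   ssh ul_xy1234@justus2.uni-ulm.de 'tar -xzf many_small_files.tar.gz'
```

Transferring one archive avoids the per-file overhead that dominates when thousands of small files are copied individually.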
For advanced topics see advanced-data-transfer.