Data Transfer/Rclone

From bwHPC Wiki

Rclone is a command-line tool for managing files on remote systems (e.g. cloud storage systems). It can either synchronize data in one direction, or a remote can be mounted locally with rclone mount. Data can be transferred between two remote locations, in some cases without a local download. One advantage is that transfers are multithreaded and operate on a per-file basis. Caution: Rclone cannot be used with two-factor authentication (2FA).

Installation

Rclone is a Go program and comes as a single binary file.

  1. Download the binary for your operating system.
  2. Extract the rclone executable (rclone.exe on Windows) from the archive.
  3. The executable can be used without further installation. For convenience, it is recommended to add its directory to your PATH environment variable.
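On Linux, the three steps above could be scripted roughly as follows. This is a minimal sketch, not an official installer: it assumes a 64-bit x86 machine (archive suffix linux-amd64), that curl and unzip are available, and that ~/bin is an acceptable install location. The function names install_rclone and path_prepend are hypothetical; the download URL follows the naming scheme of downloads.rclone.org.

```shell
#!/usr/bin/env bash
# Sketch only: assumes Linux on amd64 with curl and unzip installed.
# path_prepend and install_rclone are hypothetical helper names.

# Prepend a directory to PATH unless it is already listed there.
path_prepend() {
  case ":$PATH:" in
    *":$1:"*) ;;                      # already present, nothing to do
    *) PATH="$1${PATH:+:$PATH}" ;;    # avoid a trailing ":" if PATH was empty
  esac
  export PATH
}

# Install the current rclone release into $1 (default: ~/bin).
install_rclone() {
  local dest="${1:-$HOME/bin}"
  local url="https://downloads.rclone.org/rclone-current-linux-amd64.zip"
  local tmp rc
  tmp="$(mktemp -d)" || return 1
  # Steps 1 and 2: download the archive and extract the rclone executable.
  if curl -fsSL -o "$tmp/rclone.zip" "$url" &&
     unzip -q "$tmp/rclone.zip" -d "$tmp" &&
     mkdir -p "$dest" &&
     cp "$tmp"/rclone-*-linux-amd64/rclone "$dest/" &&
     chmod 755 "$dest/rclone"; then
    rc=0
  else
    rc=1
  fi
  rm -rf "$tmp"                       # clean up on success and on failure
  # Step 3: make the binary available in the current shell session.
  [ "$rc" -eq 0 ] && path_prepend "$dest"
  return "$rc"
}
```

Running install_rclone only changes PATH for the current shell session. To make the change permanent, a line such as export PATH="$HOME/bin:$PATH" can be added to ~/.bashrc.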

Detailed installation information for the different operating systems can be found in the official Rclone documentation.

Using Rclone

To use Rclone, you first have to define your connections in a configuration file. Afterwards, you can access each connection by its configured name.

Configure Remote

Before you can start using Rclone, you need to set up a remote. A remote is a named connection configuration: it specifies the network protocol you want to use, your authentication information, and a name by which you can refer to the connection later on.

To configure a remote for a specific service, you need the following information:

  • <remotehost>
  • <username>
  • <servicePassword>

Furthermore, you have to decide on:

  • the network protocol (for example webdav, smb, sftp)
  • the remote name (for example, the name of the service you want to connect to)

There are three ways to set up a new remote, which are explained in the following sections.

Interactive Setup

Execute:

rclone config

This will guide you through an interactive setup process. Detailed instructions can be found in the official Rclone documentation.
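After finishing the interactive setup, you can check that the remote was saved with rclone listremotes, which prints one configured remote per line in the form <name>:. The sketch below wraps this in two hypothetical bash helpers, remote_exists and ensure_remote; ensure_remote only starts rclone config if the named remote does not exist yet.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the "rclone listremotes" command.

# Succeed if a remote with the given name is configured.
remote_exists() {
  # -F: fixed string, -x: match the whole line, -q: no output
  rclone listremotes | grep -Fxq -- "$1:"
}

# Start the interactive setup only if the remote is still missing.
ensure_remote() {
  if ! remote_exists "$1"; then
    echo "Remote '$1' not found, starting rclone config ..." >&2
    rclone config
  fi
}
```

For example, ensure_remote bwsync does nothing if a remote called bwsync is already configured.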

Oneliner

Define all parameters in a single command. For example, the following creates an SFTP remote named test:

rclone config create test sftp host=<remotehost> user=<username> pass=<password> --obscure
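Typing pass=<password> directly into the command leaves the password in your shell history. The following minimal sketch, assuming bash, reads the password from a hidden prompt instead; create_sftp_remote is a hypothetical helper that only wraps the rclone config create call shown above. Note that the password is still passed to rclone as a command-line argument while the command runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: create an SFTP remote non-interactively.
# Arguments: remote name, host, user. The password is read from stdin
# (not echoed on a terminal) instead of being typed into the command line.
create_sftp_remote() {
  local name="$1" host="$2" user="$3" pass
  IFS= read -rs -p "Password for $user@$host: " pass || return 1
  echo >&2   # move to a new line after the hidden input
  # --obscure makes rclone obscure the password before storing it.
  rclone config create "$name" sftp host="$host" user="$user" \
    pass="$pass" --obscure
}
```

Usage: create_sftp_remote test <remotehost> <username>, then type the password at the prompt.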

Adjust Config File

To see where the configuration file is located, run: rclone config file

You can use the following snippets as templates for WebDAV and SFTP connections:

[<remote-name>]
type = webdav
url = <hostURL>
vendor = other
user = <userID>

[<remote-name>]
type = sftp
host = <hostname>
user = <userID>
key_use_agent = false
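Instead of editing the file by hand, a section following the SFTP template above can also be appended by a script. This sketch assumes that the last line printed by rclone config file is the path of the configuration file (the line before it is a short message); rclone_conf_path and add_sftp_section are hypothetical helper names. The password is intentionally not written here; it can be added afterwards with rclone config update as described next.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for editing the config file directly.

# Print the path of the config file (last line of "rclone config file").
rclone_conf_path() {
  rclone config file | tail -n 1
}

# Append an SFTP remote section to the config file.
# Arguments: remote name, host name, user ID.
add_sftp_section() {
  local conf
  conf="$(rclone_conf_path)" || return 1
  mkdir -p "$(dirname "$conf")" || return 1
  cat >> "$conf" <<EOF

[$1]
type = sftp
host = $2
user = $3
key_use_agent = false
EOF
}
```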

To add the password, run:

rclone config update <remote-name> pass=<password> --obscure

Use Remote

The general syntax of Rclone is:

rclone [options] subcommand <parameters> <parameters...>

List all directories/containers/buckets in the path XX:

rclone lsd <remote-name>:XX

Copy the contents of </local/path> to the remote path (files that are identical on both sides are skipped):

rclone copy </local/path> <remote-name>:<remote/path>

Copy the contents of the remote path to </local/path>:

rclone copy <remote-name>:<remote/path> </local/path>

Move the contents of the source directory to the destination directory:

rclone move <remote-name>:<source/path> <remote-name>:<destination/path>
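The copy subcommand can be combined with two further Rclone features: the global --dry-run flag, which only reports what would be transferred, and rclone check --one-way, which verifies that the source files exist at the destination and match in size and, where available, hash. The wrapper upload_dir and the DRY_RUN variable are hypothetical names used only in this sketch.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: upload a local directory to a remote path.
# DRY_RUN=1: only show what would be copied (rclone's --dry-run flag).
# Otherwise: copy, then verify the result with "rclone check --one-way".
upload_dir() {
  local src="$1" dst="$2"
  if [ "${DRY_RUN:-0}" = 1 ]; then
    rclone copy --dry-run "$src" "$dst"
  else
    rclone copy "$src" "$dst" &&
      rclone check --one-way "$src" "$dst"
  fi
}
```

For example, DRY_RUN=1 upload_dir ./results <remote-name>:project/results first shows the planned transfer; running it again without DRY_RUN performs the copy and the check.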

More subcommands are listed in the official Rclone documentation.

Using Rclone Mount

Before you can follow the instructions in this chapter, you need to have set up a remote. Detailed information on how to use rclone mount can be found in the official Rclone documentation.

Windows

To run rclone mount on Windows, you will need to download and install WinFsp. To mount on drive letter X or a nonexistent subdirectory, use:

rclone mount <remote-name>:path/to/files X:
rclone mount <remote-name>:path/to/files C:\path\parent\mount

Unlike on Linux and macOS, there is no background (daemon) mode on Windows.

macOS & Linux

You can run mount in either foreground or background (aka daemon) mode. Mount runs in foreground mode by default. Use the --daemon flag to force background mode.

Create an empty directory on your local machine and then run:

# to mount the root folder:
rclone mount --vfs-cache-mode full <remote-name>: /path/to/empty/folder 
# to mount a subfolder:
rclone mount --vfs-cache-mode full <remote-name>:folderX/folderY /path/to/empty/folder 
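For the background mode described above, the mount can be combined with a check that the mount point is empty. A sketch assuming bash on Linux or macOS; mount_remote and unmount_remote are hypothetical helpers. A background mount has to be stopped manually; the Rclone documentation uses fusermount -u on Linux and umount on macOS for this.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for mounting a remote in background (daemon) mode.

# Mount remote path $1 (e.g. "<remote-name>:folderX") on local directory $2.
# The directory is created if needed and must be empty.
mount_remote() {
  local src="$1" dir="$2"
  mkdir -p "$dir" || return 1
  if [ -n "$(ls -A "$dir")" ]; then
    echo "mount point $dir is not empty" >&2
    return 1
  fi
  rclone mount --daemon --vfs-cache-mode full "$src" "$dir"
}

# Stop a background mount: umount on macOS, fusermount -u on Linux.
unmount_remote() {
  if [ "$(uname -s)" = Darwin ]; then
    umount "$1"
  else
    fusermount -u "$1"
  fi
}
```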

Best Practices

Rclone has many useful options; the following ones are particularly helpful for performance and debugging.

Performance

To make better use of the available bandwidth, the following options can improve performance:

--transfers <int>

Number of file transfers to run in parallel (default: 4). Depending on the local network, the read and write speeds of the file system, and the current load, different values may work best. For large transfers, we advise testing performance with different values beforehand.

  • In our tests, we observed the best results between 8 and 32.
  • For regular use cases, we recommend 16 as the default.
  • Values above 64 are not recommended and degrade performance.
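One way to test different --transfers values before a large transfer is to copy the same representative test data several times and time each run. The following is a sketch assuming bash (it uses the SECONDS variable for timing); benchmark_transfers is a hypothetical helper, and each run writes into its own bench-transfers-<n> subdirectory so that later runs cannot skip files already copied by earlier ones.

```shell
#!/usr/bin/env bash
# Hypothetical benchmark helper.
# Arguments: source, destination ("remote:" or "remote:path"), then one or
# more --transfers values to try.
benchmark_transfers() {
  local src="$1" dst="$2" sep="/" n start
  shift 2
  # No extra "/" after a bare "remote:" (for SFTP, "remote:/x" is absolute).
  case "$dst" in *: | */) sep="" ;; esac
  for n in "$@"; do
    start=$SECONDS
    rclone copy --transfers "$n" "$src" "$dst${sep}bench-transfers-$n" || return 1
    echo "transfers=$n: $((SECONDS - start))s"
  done
}
```

For example, benchmark_transfers ./testdata <remote-name>:tmp 8 16 32 prints one timing line per value. The test data can be removed afterwards, e.g. with rclone purge <remote-name>:tmp, which deletes the whole tmp directory.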

--multi-thread-streams <int>

Number of streams to use for multithreaded downloads (default: 4). This is only relevant for very large files, which are then up- or downloaded in chunk-sized parts over several parallel streams.

The optimal value depends strongly on the local network and the hardware used. For regular use cases, we recommend 4 as the default.
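To apply the recommended values from this section without retyping them, they can be put into a small wrapper. A sketch assuming bash; rclone_fast is a hypothetical name, and the values are simply the recommendations above (16 transfers, 4 streams), not tuned results for your system.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run an rclone subcommand with the recommended
# values from this section (16 parallel transfers, 4 streams per file).
# Usage: rclone_fast <subcommand> <source> <destination> [more options]
rclone_fast() {
  local sub="$1"
  shift
  rclone "$sub" --transfers 16 --multi-thread-streams 4 "$@"
}
```

Alternatively, rclone reads default values for its options from environment variables named after the flag, for example RCLONE_TRANSFERS=16 and RCLONE_MULTI_THREAD_STREAMS=4.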

Debugging and Statistics

To get updates on current progress, use:

--stats <duration>

Interval between printing stats, e.g. 500ms, 60s, 5m (0 to disable) (default 1m0s).

To get debug information, use:

--log-level=DEBUG 
--stats-log-level=DEBUG
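Debug output is verbose, so it is often convenient to write it to a file with the --log-file option. A sketch assuming bash; rclone_debug is a hypothetical wrapper, and the 30-second stats interval is an arbitrary example value.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run rclone with periodic statistics and full debug
# output written to a log file instead of the terminal.
# Usage: rclone_debug <logfile> <subcommand> [arguments...]
rclone_debug() {
  local log="$1"
  shift
  rclone "$@" \
    --stats 30s \
    --log-level=DEBUG \
    --stats-log-level=DEBUG \
    --log-file="$log"
}
```

For example, rclone_debug rclone-debug.log copy </local/path> <remote-name>:<remote/path> writes both the progress statistics and the debug messages to rclone-debug.log.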