Zuar Runner and Rclone - Custom Jobs

Rclone is a program that can be used to transfer files to and from more than forty different storage backends (e.g., Amazon S3, Box, Dropbox, FTP, Google Cloud Storage, Google Drive, Microsoft Azure Blob Storage, Microsoft OneDrive, Microsoft SharePoint, SFTP, etc.).

The Zuar Runner rclone plugin provides an rclone job type and a wizard for creating the configurations that control rclone jobs. Runner's rclone job uses the rclone program to transfer files to and from the Runner instance it runs on, or between two remote systems.

Out of the box (as of Runner 2.8), the Runner rclone plugin wizard supports FTP and SFTP connections.

Other rclone connection types, such as Amazon S3, Box, Dropbox, OneDrive, Google Cloud Storage, Egnyte, SharePoint, and SFTP with a key file, can be configured as custom jobs.

Testing Rclone on a Local Machine

Generally speaking, setting up any Runner rclone job involves first configuring and testing rclone on a local machine (preferably not a headless one, since remotes that use token-based authentication typically open a web browser during setup).

Download and install rclone on a local machine, then run rclone config to set up and test the connection.

Once you've successfully set up the connection, translate the resulting rclone config's key/value pairs into global rclone flags.

Run rclone config show to display your local rclone remotes' details.
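The translation from config key/value pairs to flags follows a naming pattern used throughout rclone's backend documentation: a backend option such as key_file corresponds to a flag such as --sftp-key-file (backend prefix, then the option name with underscores turned into hyphens). Below is a minimal Node.js sketch of that mapping, assuming you paste one remote's section from rclone config show and that it includes a `type = ...` line. The function name and the prefix-override table are hypothetical helpers, not part of Runner or rclone, and the override table is not exhaustive.

```javascript
// Hypothetical helper: convert one remote's section of `rclone config show`
// output into rclone backend flags (--<prefix>-<option>, underscores -> hyphens).
// The flag prefix usually equals the backend type; this table holds known
// exceptions and is an assumption, not an exhaustive list.
const FLAG_PREFIX_OVERRIDES = { "google cloud storage": "gcs" };

function confSectionToFlags(sectionText) {
  const options = {};
  for (const rawLine of sectionText.split(/\r?\n/)) {
    const line = rawLine.trim();
    // Skip blank lines, comments and the [remote-name] header.
    if (line === "" || line.startsWith("#") || line.startsWith(";") || line.startsWith("[")) {
      continue;
    }
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    // Split on the first "=" only, so values that contain "=" stay intact.
    options[line.slice(0, eq).trim()] = line.slice(eq + 1).trim();
  }

  const type = options.type;
  if (!type) throw new Error("section has no 'type' option");
  const prefix = FLAG_PREFIX_OVERRIDES[type] ?? type;

  const flags = [];
  for (const [key, value] of Object.entries(options)) {
    // `type` selects the backend rather than being a flag; empty values
    // (e.g. a blank app_key) mean "use the backend default", so skip them.
    if (key === "type" || value === "") continue;
    flags.push(`--${prefix}-${key.replace(/_/g, "-")}`, value);
  }
  return flags;
}
```

For the SFTP remote shown later in this article, this yields --sftp-host, --sftp-user and --sftp-key-file with their values. The result is an argument array rather than a single string, so values such as JSON tokens do not need shell quoting when passed to a process launcher.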

A note about token-based configs

For token-based authentication (e.g., Box or Dropbox), a token only works on one machine: if you create and use a remote on your local machine and then move that token to Runner, it will not work. In such cases, create two remotes locally. Use the first remote to troubleshoot and confirm that rclone is working, then create a second remote solely for use on Runner.

Using the Runner Generic Plugin

In all of the examples below, create the custom Rclone job using the Runner Generic plugin.

In the Runner UI, click the orange Add Job button in the bottom left-hand corner of the screen, then select Generic Job from the wizard.

On the following screen (below), select rclone as the job type.

Manual rclone S3 job

Use the examples below as templates for your job's JSON config.

Runner Rclone Job Examples

Below are a few simple examples of custom Runner Rclone job configurations:

Amazon S3

To connect to AWS S3, you will need the Access Key ID and Secret Access Key of a user with programmatic access to the buckets you want to use. You will also need the correct AWS region.

Example job config

<pre id="show-json-from-git"></pre>

<script>
var url = 'https://raw.githubusercontent.com/zuarbase/runner-job-templates/master/rclone/s3.hjson';
fetch(url)
.then(res => res.text())
.then((out) => {
  document.getElementById("show-json-from-git").innerText = out
})
.catch(err => { throw err });
</script>

This would be equivalent to the local rclone command:

rclone copy /var/runner/data/{local-file-name} s3:{bucket-name}/{path/to/file/}

In the rclone_flags block, replace {secret-access-key} with your AWS Secret Access Key, {access-key-id} with your AWS Access Key ID, and {region} with your AWS region.

In the targets block, this would copy a file from Runner (source) to Amazon S3 (destination). Replace {bucket-name} with your bucket name and {path/to/file/} with any additional folder-like path inside it (do not escape spaces with \). Also replace {local-file-name} with the name of the file you want to copy.

See rclone's Amazon S3 documentation for all the available flags.
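The reason spaces need no backslash escaping is that each argument reaches rclone as a separate string; only a shell would split on spaces. The Node.js sketch below illustrates this by building the S3 copy command above as an argument array and launching rclone without a shell. It assumes rclone is on the PATH and that a remote named s3 exists, as in the command above; buildS3CopyArgs and runRclone are hypothetical helpers, and the three --s3-* flags are the ones this section's rclone_flags block sets.

```javascript
const { spawn } = require("child_process");

// Hypothetical helper mirroring the S3 copy command above. Each argument is a
// separate array element, so a space inside the bucket path needs no
// backslash escaping: no shell ever splits the string.
function buildS3CopyArgs({ localFile, bucket, destPath, accessKeyId, secretAccessKey, region }) {
  // Join bucket and folder path with exactly one "/".
  const dest = `s3:${bucket}/${destPath.replace(/^\/+/, "")}`;
  return [
    "copy",
    `/var/runner/data/${localFile}`,
    dest,
    "--s3-access-key-id", accessKeyId,
    "--s3-secret-access-key", secretAccessKey,
    "--s3-region", region,
  ];
}

// Run rclone with an argument array (no shell) and resolve with its exit code.
function runRclone(args) {
  return new Promise((resolve, reject) => {
    const child = spawn("rclone", args, { stdio: "inherit" });
    child.on("error", reject); // e.g. rclone not installed or not on PATH
    child.on("close", resolve);
  });
}
```

For example, a destPath of "Reports/Q1 2024/" becomes the single argument s3:my-bucket/Reports/Q1 2024/, which rclone receives unchanged.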

Box

To use rclone with Box you will need to create an access token.

Box differs from most other rclone remotes: its access tokens are short-lived, and rclone uses the stored refresh token to obtain new ones, writing the updated token back to the config file. Because of this, you cannot use a standard rclone job type in Runner. Instead, create an rclone.conf file containing your remote's details and point rclone at it, so the token can be refreshed and saved each time the job runs.

At the end of the rclone config process when creating the remote you should see something similar to this:

[box]
type = box
box_sub_type = user
token = {"access_token":"xxxxx","token_type":"bearer","refresh_token":"xxxx","expiry":"xxxx"}

Box Job setup and configuration

  1. Create a file on your local computer with a text editor like Notepad++, Sublime Text, etc.
  2. Paste the config generated by the remote (like above) into the file and save it as something like rclone.conf, to be referenced in a command job in Runner.
  3. Drag and drop or upload the rclone.conf file into Runner's Files page.
  4. Create a new Command Job and use the following command: rclone copy box:/'Database Folder'/'Flat Files'/csv/ /var/runner/data/ --config /var/runner/data/rclone.conf

In the above command there are a few attributes to expand upon:

  1. In plain terms, the command tells rclone to copy files from the /'Database Folder'/'Flat Files'/csv/ directory on the remote called box to the /var/runner/data directory on Runner, using the configuration (--config) in the file /var/runner/data/rclone.conf.
  2. The box: after the copy command references the name of the remote.
  3. If a folder/directory name contains spaces or other characters the shell treats specially, wrap that name in single quotes.

With this rclone.conf file in place, rclone refreshes the token automatically: when the access token expires, it uses the refresh token to obtain a new one and saves the result back to /var/runner/data/rclone.conf.

See rclone's Box documentation for all the available flags.
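The single-quote rule for the Box command is ordinary POSIX shell quoting: inside single quotes every character is literal except the single quote itself, which has to be written as '\''. A minimal Node.js sketch that applies this rule when assembling the Command job string is below; shellQuote and buildBoxCommand are hypothetical helpers, and quoting the whole box:... argument is equivalent to quoting each folder name separately, as the command above does.

```javascript
// Sketch: POSIX single-quote a string for use on a shell command line.
// Inside single quotes nothing is special except the single quote itself,
// which is written as '\'' (close quote, escaped quote, reopen quote).
function shellQuote(arg) {
  // Leave obviously safe arguments (letters, digits, / . : = @ % + - _) as-is.
  if (/^[A-Za-z0-9_\/.:=@%+-]+$/.test(arg)) return arg;
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Hypothetical helper that assembles the Box Command job shown above.
function buildBoxCommand(remotePath, localDir, configPath) {
  return ["rclone", "copy", `box:${remotePath}`, localDir, "--config", configPath]
    .map(shellQuote)
    .join(" ");
}
```

Calling buildBoxCommand("/Database Folder/Flat Files/csv/", "/var/runner/data/", "/var/runner/data/rclone.conf") produces rclone copy 'box:/Database Folder/Flat Files/csv/' /var/runner/data/ --config /var/runner/data/rclone.conf, which the shell passes to rclone as the same arguments as the original command.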

Dropbox

To use rclone with Dropbox you will need to create an access token.

At the end of the rclone config process you should see something similar to this:

[dropbox]
app_key =
app_secret =
token = XXXXXXXXXXXXXXXXXXXXXXXXXXXXX_XXXX_XXXXXXXXXXXXXXXXXXXXXXXXXXXXX

Example job config

<pre id="show-dbox-from-git"></pre>

<script>
var dbox_url = 'https://raw.githubusercontent.com/zuarbase/runner-job-templates/master/rclone/dropbox.hjson';
fetch(dbox_url)
.then(res => res.text())
.then((out) => {
  document.getElementById("show-dbox-from-git").innerText = out
})
.catch(err => { throw err });
</script>

This would be equivalent to the local rclone command:

rclone copy /var/runner/data/{local-file-name} dropbox:{/path/to/file/}

In the rclone_flags block replace the {token} with the token returned after configuring locally.

In the targets block, this would copy a file from Runner (source) to Dropbox (destination). Replace the destination {/path/to/file/} with the correct path in Dropbox, and replace {local-file-name} with the file you want to copy.

See rclone's Dropbox documentation for all the available flags.

OneDrive

To use rclone with OneDrive, you will need to create an access token.

At the end of the rclone config process you should see something similar to this:

[onedrive]
type = onedrive
region = global 
token = {"access_token":"eyJ0eXAiOiJKV1QiLCJ..."}
drive_id = ID
drive_type = business 

Example job config

<pre id="show-od-from-git"></pre>

<script>
var od_url = 'https://raw.githubusercontent.com/zuarbase/runner-job-templates/master/rclone/onedrive.hjson';
fetch(od_url)
.then(res => res.text())
.then((out) => {
  document.getElementById("show-od-from-git").innerText = out
})
.catch(err => { throw err });
</script>

This would be equivalent to the local rclone command:

rclone copy onedrive:{/path/to/file/} /var/runner/data/

In the rclone_flags block replace the value of --onedrive-token with the token returned after configuring locally. Be sure to escape the quotes in the JSON blob string as in the example.
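Escaping the token by hand is error-prone, because the whole JSON object has to become a single string value with every inner quote written as \". A minimal Node.js sketch that does this mechanically is below, assuming the job config accepts a standard JSON string literal (HJSON does). escapeTokenForJson is a hypothetical helper; its input is the text after token = in your local rclone config.

```javascript
// Sketch: turn the token JSON from the rclone config into an escaped JSON
// string literal, ready to paste as the value of --onedrive-token.
function escapeTokenForJson(tokenText) {
  // Parse first so a truncated or mangled copy/paste fails here, not at run time.
  const token = JSON.parse(tokenText);
  // Serialize to compact JSON, then encode that text as a JSON string literal;
  // the second JSON.stringify adds the surrounding quotes and the \" escapes.
  return JSON.stringify(JSON.stringify(token));
}
```

For example, escapeTokenForJson('{"access_token":"abc","token_type":"Bearer"}') returns "{\"access_token\":\"abc\",\"token_type\":\"Bearer\"}", including the outer quotes.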

In the targets block, this would copy a file from OneDrive (source) to /var/runner/data (destination). Replace {/path/to/file/} with the file you want to copy.

See rclone's OneDrive documentation for all the available flags.

Google Cloud Storage

There are several ways to configure rclone with Google Cloud Storage. In the example below we chose the service account route.

Learn more from Google on creating and managing service account keys.

Example job config

<pre id="show-gcs-from-git"></pre>

<script>
var gcs_url = 'https://raw.githubusercontent.com/zuarbase/runner-job-templates/master/rclone/gcs.hjson';
fetch(gcs_url)
.then(res => res.text())
.then((out) => {
  document.getElementById("show-gcs-from-git").innerText = out
})
.catch(err => { throw err });
</script>

This would be equivalent to the local rclone command:

rclone copy gcs:{bucket}/{path/to/file} /var/runner/data/{file} 

In the rclone_flags block, replace {project_number} with your GCP project's number. Drop the GCP service account JSON file into Runner's file manager and replace {service_account_json_file} with the name of your service account JSON file.

In the targets block, this would copy a file from Google Cloud Storage (source) to Runner (destination). Replace the source's {bucket} and {path/to/file} with the correct bucket and file path in Google Cloud Storage, and replace {file} with the name of the file you want to create in Runner.

See rclone's Google Cloud Storage documentation for all the available flags.
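Because the job depends on the uploaded service account file, it can save a failed run to check that file before dropping it into Runner. The Node.js sketch below looks for the standard fields of a Google service account key file (type, project_id, client_email, private_key). checkServiceAccountKey is a hypothetical convenience check, not something Runner or rclone performs or requires.

```javascript
// Hypothetical pre-upload check for a Google service account key file.
// Returns a list of problems; an empty list means the basic fields are present.
function checkServiceAccountKey(jsonText) {
  const key = JSON.parse(jsonText);
  const problems = [];
  if (key.type !== "service_account") {
    problems.push(`type is "${key.type}", expected "service_account"`);
  }
  for (const field of ["project_id", "client_email", "private_key"]) {
    if (typeof key[field] !== "string" || key[field] === "") {
      problems.push(`missing ${field}`);
    }
  }
  // Key files downloaded from Google Cloud embed a PEM-encoded private key.
  if (typeof key.private_key === "string" && !key.private_key.includes("BEGIN PRIVATE KEY")) {
    problems.push("private_key does not look like a PEM key");
  }
  return problems;
}
```

Locally, you might run it as checkServiceAccountKey(require("fs").readFileSync("my-key.json", "utf8")), where my-key.json is a placeholder for your downloaded key file.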

WebDAV

WebDAV (Web Distributed Authoring and Versioning) is an extension of HTTP that allows clients to create, change, and move documents on a server, filling a role similar to FTP and SFTP. Apache, Nginx, and many other servers have WebDAV modules, and it is supported by many sites, services, and software packages. Below are examples of rclone jobs for two WebDAV services: Egnyte and SharePoint.

See rclone's WebDAV documentation for all the available flags.
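To make "extends HTTP" concrete: a WebDAV directory listing is an HTTP request with the PROPFIND method and a Depth header, answered with 207 Multi-Status and an XML body. The Node.js sketch below (global fetch needs Node 18 or later) builds and sends such a request. It assumes a server that accepts HTTP Basic auth, such as the Egnyte WebDAV URL below; SharePoint Online generally does not, which is why rclone has a dedicated sharepoint vendor setting. The function names and the acme slug are hypothetical.

```javascript
// Hypothetical helper: describe a WebDAV directory listing (PROPFIND) request.
// Assumes the server accepts HTTP Basic auth.
function buildPropfindRequest(baseUrl, folderPath, user, password) {
  // Resolve the folder relative to the WebDAV root; spaces become %20.
  const root = baseUrl.endsWith("/") ? baseUrl : baseUrl + "/";
  const url = new URL(folderPath.replace(/^\/+/, ""), root).href;
  return {
    url,
    method: "PROPFIND", // WebDAV method for reading resource properties
    headers: {
      Depth: "1", // the folder itself plus its immediate children
      // Basic auth uses the *plain* password, not rclone's obscured form.
      Authorization: "Basic " + Buffer.from(`${user}:${password}`).toString("base64"),
    },
    // No request body: RFC 4918 treats an empty PROPFIND as "all properties".
  };
}

// Send the request. A successful PROPFIND answers 207 Multi-Status with XML.
async function listWebdavFolder(baseUrl, folderPath, user, password) {
  const req = buildPropfindRequest(baseUrl, folderPath, user, password);
  const res = await fetch(req.url, { method: req.method, headers: req.headers });
  if (res.status !== 207) throw new Error(`PROPFIND failed: HTTP ${res.status}`);
  return res.text();
}
```

rclone's webdav backend issues requests of this kind on your behalf; the sketch is only meant to show what travels over the wire, for example when checking credentials outside rclone.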

Egnyte


To connect to Egnyte with rclone you will need your WebDAV URL https://{yourcompany}.egnyte.com/webdav, and your Egnyte username and password.

Configure a WebDAV rclone remote locally with rclone config; when prompted for the vendor, select other. At the end of the process you should see something like:

[webdav]
type = webdav
url = https://{yourcompany}.egnyte.com/webdav
vendor = other
user = {user@email.com}
pass = ftwmLfDxzj6D1TcYFxKfbh40SMsoyIEsjhRTYA

Example Job Config

<pre id="show-egnyte-from-git"></pre>

<script>
var egnyte_url = 'https://raw.githubusercontent.com/zuarbase/runner-job-templates/master/rclone/webdav_egnyte.hjson';
fetch(egnyte_url)
.then(res => res.text())
.then((out) => {
  document.getElementById("show-egnyte-from-git").innerText = out
})
.catch(err => { throw err });
</script>

This would be equivalent to the local rclone command:

rclone copy /var/runner/data/{file} webdav:{path/to/folder}

In the rclone_flags block, replace {yourcompany} with your company's Egnyte slug. Also replace the username {user@email.com} and the password with your Egnyte username and the obscured password from your local rclone config (the pass value shown by rclone config show). You can also generate an obscured password with rclone obscure.

In the targets block, this would copy a file from Runner (source) to Egnyte (destination). Replace the source file with any file uploaded to Runner, and replace the destination path {path/to/folder}.

SharePoint

To connect to SharePoint using WebDAV, you will need your SharePoint site URL and a username and password with access to your site.

Configure a WebDAV rclone remote locally with rclone config; when prompted for the vendor, select sharepoint. At the end of the process you should see something like:

[webdav]
type = webdav
url = https://{yourcompany}.sharepoint.com
vendor = sharepoint
user = {user@email.com}
pass = _eUx3MtnUtvXQtvfPdSkyGhHM-fxu6qK5sA

Example Job Config

<pre id="show-sp-from-git"></pre>

<script>
var sp_url = 'https://raw.githubusercontent.com/zuarbase/runner-job-templates/master/rclone/webdav_sharepoint.hjson';
fetch(sp_url)
.then(res => res.text())
.then((out) => {
  document.getElementById("show-sp-from-git").innerText = out
})
.catch(err => { throw err });
</script>

This would be equivalent to the local rclone command:

rclone copy webdav:{path/to/file} /var/runner/data/

In the rclone_flags block, replace {yourcompany} with your company's SharePoint slug. Also replace the username {user@email.com} and the password with your SharePoint username and the obscured password from your local rclone config (the pass value shown by rclone config show). You can also generate an obscured password with rclone obscure.

In the targets block, this would copy a file from SharePoint (source) to Runner (destination /var/runner/data). Replace the source path {path/to/file} with the path to the SharePoint file you want to download.

SFTP (with SSH key file)

To connect to an SFTP server using an SSH key file, you will need an SFTP username, a hostname for the SFTP server, and the private key file.

Configure an SFTP rclone remote locally with rclone config; when prompted for the key file, enter the path to your SSH private key. At the end of the process you should see something like:

[sftp]
type = sftp
host = sftp.hostname.com
user = USERNAME
key_file = /path/to/private_key

Before you create an Rclone job on Runner, you first need to upload your private key to the Runner server using the Runner UI Files page.

In order to do this safely, we suggest encrypting the private key before uploading it using the Runner UI:

Encrypt a file on Mac/Linux using OpenSSL

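The following command uses OpenSSL to encrypt the file private_key, writing the encrypted output to encrypted.txt. Replace the example passphrase 3ncrypt with your own, and use the same passphrase when decrypting.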

openssl enc -k 3ncrypt -aes-256-cbc -md sha512 -pbkdf2 -iter 100000 -salt -in private_key -out encrypted.txt

Once the encrypted file is on your Runner, create the following two Command jobs to move the file and decrypt it.

Click + Add to create a new "Command" job and in the wizard add a job title, and then use the following commands, respectively. These jobs need to be run in order, and should only be run once.

Job #1

Make a hidden .ssh directory (if it doesn't already exist) and move the encrypted file into it:

mkdir -p /var/runner/etc/.ssh && mv /var/runner/data/encrypted.txt /var/runner/etc/.ssh

Job #2

Decrypt the file, creating a file named private_key in /var/runner/etc/.ssh/:

openssl enc -d -k 3ncrypt -aes-256-cbc -md sha512 -pbkdf2 -iter 100000 -salt -in /var/runner/etc/.ssh/encrypted.txt -out /var/runner/etc/.ssh/private_key
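The encrypt and decrypt commands above agree because openssl writes a simple, self-describing layout: the 8-byte marker Salted__, an 8-byte random salt, then AES-256-CBC ciphertext, with the key and IV derived from the passphrase and salt via PBKDF2-HMAC-SHA512 (100000 iterations). The Node.js sketch below reproduces our understanding of that layout for these exact flags, for example to check a passphrase locally. It is an illustration, not a replacement for the two Command jobs; confirm compatibility by decrypting one of its outputs with the openssl command before relying on it. The function names are hypothetical.

```javascript
const crypto = require("crypto");

// Assumed layout for: openssl enc -aes-256-cbc -md sha512 -pbkdf2 -iter 100000 -salt
//   "Salted__" (8 bytes) | salt (8 bytes) | AES-256-CBC ciphertext
const MAGIC = Buffer.from("Salted__", "latin1");
const ITERATIONS = 100000;

// Key (32 bytes) and IV (16 bytes) are the first 48 bytes of PBKDF2-HMAC-SHA512.
function deriveKeyIv(passphrase, salt) {
  const material = crypto.pbkdf2Sync(passphrase, salt, ITERATIONS, 48, "sha512");
  return { key: material.subarray(0, 32), iv: material.subarray(32, 48) };
}

function opensslStyleEncrypt(plaintext, passphrase) {
  const salt = crypto.randomBytes(8);
  const { key, iv } = deriveKeyIv(passphrase, salt);
  // Node's cipher applies PKCS#7 padding by default, as openssl enc does.
  const cipher = crypto.createCipheriv("aes-256-cbc", key, iv);
  return Buffer.concat([MAGIC, salt, cipher.update(plaintext), cipher.final()]);
}

function opensslStyleDecrypt(data, passphrase) {
  if (!data.subarray(0, 8).equals(MAGIC)) throw new Error("missing Salted__ header");
  const salt = data.subarray(8, 16);
  const { key, iv } = deriveKeyIv(passphrase, salt);
  const decipher = crypto.createDecipheriv("aes-256-cbc", key, iv);
  // final() usually throws on a wrong passphrase because the padding check fails.
  return Buffer.concat([decipher.update(data.subarray(16)), decipher.final()]);
}
```

For example, opensslStyleDecrypt(fs.readFileSync("encrypted.txt"), "your-passphrase") should return the original private key bytes if the passphrase matches.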

Example Job Config

<pre id="show-sftp-from-git"></pre>

<script>
var sftp_url = 'https://raw.githubusercontent.com/zuarbase/runner-job-templates/master/rclone/sftp_with_ssh_key.hjson';
fetch(sftp_url)
.then(res => res.text())
.then((out) => {
  document.getElementById("show-sftp-from-git").innerText = out
})
.catch(err => { throw err });
</script>

This would be equivalent to the local rclone command:

rclone copy /var/runner/data/text.csv sftp:path/to/folder/

In the rclone_flags block change the name of the key file in the value of --sftp-key-file, replace USERNAME with your SFTP username in --sftp-user, and replace the value of --sftp-host with your SFTP hostname.
