It's inevitable that if you deal with enough data, especially data exported from source systems, you'll eventually need to deal with .zip, .gz, or tar.gz files.
To get that data into a database, you will likely need to decompress the file and use the files inside. The following steps will guide you through unzipping a compressed file in Mitto, saving the decompressed files in Mitto, and preparing them to be loaded into a database.
- Make sure that the unzip utility is installed on the Mitto box. Over SSH, install it with sudo apt-get install unzip. If you are unsure how to SSH to your Mitto instance, or your instance is hosted by Zuar, please file a ticket here!
- Create a command line job, unzip, with the following code:
{
"cmd": "unzip /var/mitto/data/filename.zip -d /var/mitto/data",
"cmd_env": {},
"exec": false,
"shell": true
}
The command is unzip, followed by the location of the zip file in Mitto (/var/mitto/data/filename.zip), and ending with -d plus the destination for the extracted files (/var/mitto/data, Mitto's data folder).
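The command can be adjusted to target .gz and tar.gz files by using gunzip filename.gz or tar -xf filename.tar.gz, respectively. Note that neither supports a destination like unzip's -d: for gunzip, -d just means "decompress", so the output is written next to the original (filename.gz becomes filename), and tar extracts into the current working directory unless you add -C followed by a destination folder.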
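As a sketch, a command line job for a tar.gz archive could reuse the same job settings as the unzip example. Here filename.tar.gz is a placeholder, and -C /var/mitto/data tells tar to extract into Mitto's data folder instead of the current working directory. The z flag requests gzip decompression; GNU tar (the tar shipped with Ubuntu) can also detect the compression on its own when extracting.

```json
{
  "cmd": "tar -xzf /var/mitto/data/filename.tar.gz -C /var/mitto/data",
  "cmd_env": {},
  "exec": false,
  "shell": true
}
```

For a plain .gz file, the cmd would instead be gunzip /var/mitto/data/filename.gz, which leaves the decompressed file in /var/mitto/data and removes the .gz file.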
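What if you want to unzip multiple files with a wildcard match? Here is how to match files named something like data_2018.zip, data_2019.zip, data_2020.zip, and data_2021.zip. The pattern is wrapped in single quotes so that the shell passes it to unzip unexpanded; unzip then expands the wildcard itself and extracts every matching archive. Without the quotes, the shell would expand the pattern first, and unzip would treat every name after the first as a file to extract from the first archive: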
{
"cmd": "unzip /var/mitto/data/'data_*.zip' -d /var/mitto/data",
"cmd_env": {},
"exec": false,
"shell": true
}
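unzip can expand a quoted wildcard itself, but tar reads only one archive per -f, so matching several tar.gz files needs a shell loop instead. The sketch below assumes hypothetical archives named like data_2018.tar.gz and assumes that "shell": true runs the cmd through a POSIX shell that supports for loops.

```json
{
  "cmd": "for f in /var/mitto/data/data_*.tar.gz; do tar -xzf \"$f\" -C /var/mitto/data; done",
  "cmd_env": {},
  "exec": false,
  "shell": true
}
```

The double quotes around $f are escaped as \" because the command lives inside a JSON string. If no file matches, the shell leaves the pattern as literal text and tar reports that the file does not exist.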
Lastly, what if you need to automate this job as the data keeps changing, so each run must overwrite files decompressed by earlier runs? Add the overwrite option -o right after unzip. Without it, unzip asks before replacing an existing file, and a scheduled job cannot answer that prompt. The cmd line will then look like:
"cmd": "unzip -o /var/mitto/data/'data_*.zip' -d /var/mitto/data",
Altogether, the final solution will look like:
{
"cmd": "unzip -o /var/mitto/data/'data_*.zip' -d /var/mitto/data",
"cmd_env": {},
"exec": false,
"shell": true
}
This command unzips every archive whose name starts with data_ and ends with .zip, places the extracted files in Mitto's data folder, and overwrites existing files with the same names.
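The same automation can be sketched for .gz files, assuming hypothetical exports named like data_2021.gz. gunzip accepts several files at once, so the wildcard can be left unquoted for the shell to expand, and -f makes gunzip overwrite an existing decompressed file instead of skipping it with an "already exists" error.

```json
{
  "cmd": "gunzip -f /var/mitto/data/data_*.gz",
  "cmd_env": {},
  "exec": false,
  "shell": true
}
```

Unlike unzip, gunzip deletes each .gz file after decompressing it, so a later run only sees newly exported files; GNU gzip 1.6 and later can keep the originals with -k. Also note that data_*.gz matches data_*.tar.gz as well, and gunzip would turn those archives into .tar files.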