Data staging combines the first four activities of the data value chain:
In ETL (extract, transform, and load) terminology, data is extracted from a source, transformed, and then loaded (or piped) into a storage layer. In ELT (extract, load, and transform) terminology, data is extracted from a source, loaded into a storage layer, and then transformed.
Choosing between these two methodologies (ETL and ELT) is just one decision within a larger data staging strategy.
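The difference in ordering can be sketched in a few lines of Python. This is an illustrative sketch, not Mitto's implementation; `extract`, `transform`, and `load` are hypothetical stand-ins for the corresponding stages.

```python
def extract():
    """Pull raw rows from a source (e.g. a flat file or API)."""
    return [{"name": " Ada ", "signups": "3"}, {"name": "Grace", "signups": "5"}]

def transform(rows):
    """Clean and type the rows."""
    return [{"name": r["name"].strip(), "signups": int(r["signups"])} for r in rows]

def load(rows, store):
    """Write rows into a storage layer (here, an in-memory list)."""
    store.extend(rows)
    return store

# ETL: transform happens before the data reaches the store.
etl_store = load(transform(extract()), [])

# ELT: raw data lands in the store first; the transformation runs afterward,
# typically inside the storage layer itself (e.g. via SQL).
elt_store = load(extract(), [])
elt_store[:] = transform(elt_store)

assert etl_store == elt_store  # same result, different ordering
```

Either ordering yields the same transformed data; the difference is where and when the transformation work happens.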
The end result of data staging is that data is available for the last two activities of the data value chain:
Data sources are generally split into three major categories:
- Flat files
Mitto uses IO (input/output) jobs to connect to these various source categories.
In Mitto IO job terms, those same source categories are handled by various inputs:
Pipelines transfer data from sources to stores.
Mitto pipes data from sources (inputs) to stores (outputs) using IO jobs.
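The pattern an IO job automates can be sketched with standard-library Python: read a flat-file source and pipe its rows into a relational store. The file contents, table, and column names below are illustrative assumptions, not Mitto's API.

```python
import csv
import io
import sqlite3

# Source: a flat file (inlined here as a string for a self-contained example).
raw = "id,amount\n1,10\n2,25\n"
rows = list(csv.DictReader(io.StringIO(raw)))

# Store: a relational database (an in-memory SQLite database here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(int(r["id"]), int(r["amount"])) for r in rows],
)

# Once piped, the data is queryable with SQL in the store.
total = conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]
```

An IO job wraps this source-to-store movement so it can be configured, scheduled, and monitored rather than hand-coded.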
Data from sources can be piped into various storage layers:
- Flat files
Mitto has multiple internal stores:
Additionally, Mitto can store data in external stores (e.g. external databases).
In Mitto IO job terms, storage layers are handled by various outputs:
In many analytics use cases, the storage layer is a relational database, since downstream applications typically use SQL to analyze and present data.
Depending on the structure of the data being piped from a source, the structure the store requires, and the structure needed for analysis and presentation, the data may need to be transformed.
This data transformation can happen before, during, or after data is piped.
Mitto IO job inputs and steps handle data transformation.
Additionally, when data is stored in a relational database, Mitto SQL jobs can be used for data transformation.
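A post-load (ELT-style) transformation of the kind a SQL job performs can be sketched against an in-memory SQLite database. The table and column names are illustrative assumptions; the point is that the transformation is expressed as SQL and runs inside the store after the data is piped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw data as it might land in the store: untyped text, untrimmed names.
conn.execute("CREATE TABLE raw_events (user_name TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?)",
    [(" ada ", "10"), ("grace", "25"), ("ada", "15")],
)

# Transformation after loading: clean the text column, cast the numeric
# column, and aggregate into an analysis-ready table.
conn.execute("""
    CREATE TABLE user_totals AS
    SELECT TRIM(user_name) AS user_name,
           SUM(CAST(amount AS INTEGER)) AS total
    FROM raw_events
    GROUP BY TRIM(user_name)
""")

totals = dict(conn.execute("SELECT user_name, total FROM user_totals"))
```

Because the transformation is plain SQL run against the store, it can be re-run, versioned, and scheduled independently of the piping step.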