Data staging combines the first four activities of the data value chain:

  • Source
  • Pipe
  • Store
  • Transform

In ETL (extract, transform, and load) terminology, data is extracted from a source, transformed, and loaded (or piped) into a storage layer. In ELT (extract, load, and transform) terminology, data is extracted from a source, loaded (or piped) into a storage layer, and then transformed.

Choosing between these two methodologies (ETL and ELT) is just one decision to be made as part of a larger data staging strategy.
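The difference in ordering can be sketched in a few lines of Python. This is an illustrative example only (not Mitto's API); the `extract`, `transform`, and `load` functions and the hardcoded rows are hypothetical stand-ins.

```python
# Illustrative sketch (not Mitto's API): hypothetical extract/transform/load
# functions showing the ordering difference between ETL and ELT.

def extract():
    # Pull raw rows from a source (hardcoded here for illustration).
    return [{"name": " Alice ", "amount": "10"}, {"name": "Bob", "amount": "20"}]

def transform(rows):
    # Normalize names and cast amounts to integers.
    return [{"name": r["name"].strip(), "amount": int(r["amount"])} for r in rows]

def load(rows, store):
    # Append rows to the storage layer (a list standing in for a database).
    store.extend(rows)
    return store

# ETL: transform happens before loading.
etl_store = load(transform(extract()), [])

# ELT: raw data is loaded first, then transformed inside the store.
elt_store = load(extract(), [])
elt_store = transform(elt_store)
```

Either way, the same cleaned data ends up in the store; the strategic question is where the transformation work happens.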

The end result of data staging is that data is available for the last two activities of the data value chain:

  • Analyze
  • Present

Source

Data sources are generally split into three major categories:

  • APIs
  • Databases
  • Flat files

Mitto uses IO (input/output) jobs to connect to these various source categories.

In Mitto IO job terms, those same source categories are handled by various inputs.

Pipe

Pipelines transfer data from sources to stores.

Mitto pipes data from sources (inputs) to stores (outputs) using IO jobs.
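The pipe step can be illustrated generically in Python. This is not Mitto's API; it is a minimal sketch in which a CSV string stands in for a flat-file source and an in-memory SQLite database stands in for the store.

```python
# Illustrative sketch (not Mitto's API): piping rows from a flat-file source
# (a CSV, held in a string here) into a database store (SQLite in memory).
import csv
import io
import sqlite3

csv_source = io.StringIO("id,name\n1,widget\n2,gadget\n")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER, name TEXT)")

# The "pipe": read each row from the input and write it to the output.
for row in csv.DictReader(csv_source):
    conn.execute("INSERT INTO products VALUES (?, ?)", (row["id"], row["name"]))
conn.commit()

rows = conn.execute("SELECT id, name FROM products ORDER BY id").fetchall()
```

The same input/output shape applies regardless of source category: read records from the input, write them to the output.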

Store

Data from sources can be piped into various storage layers:

  • APIs
  • Databases
  • Flat files

Mitto has multiple internal stores.

Additionally, Mitto can store data in external stores (e.g., external databases).

In Mitto IO job terms, storage layers are handled by various outputs.

In many analytics use cases, the storage layer is a relational database, as downstream applications typically use SQL to analyze and present data.
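Once staged data is in a relational store, downstream analysis is plain SQL. A minimal sketch, with an in-memory SQLite database and a hypothetical `sales` table standing in for the storage layer:

```python
# Illustrative sketch: once staged data lives in a relational database,
# downstream analysis is plain SQL. SQLite stands in for the storage layer.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100), ("east", 50), ("west", 75)],
)

# The kind of aggregate query an analysis/presentation tool might issue.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

This is why a relational store is a common end point for staging: the analyze and present activities can run entirely on SQL.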

Transform

Depending on the structure of the source data, the structure required by the store, and the structure needed for analysis and presentation, data may need to be transformed.

This data transformation can happen before, during, or after data is piped.

Mitto IO job inputs and steps handle data transformation.

Additionally, when data is stored in a relational database, Mitto SQL jobs can be used for data transformation.
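A post-load (ELT-style) transformation expressed in SQL can be sketched as follows. This is in the spirit of a SQL job run against the relational store, but the table names and the SQLite database are hypothetical stand-ins, not Mitto specifics.

```python
# Illustrative sketch: a post-load transformation expressed in SQL, run
# against the relational store (SQLite here). Raw text amounts are cast
# to integers into a clean table.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (id INTEGER, amount TEXT)")
conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", [(1, "10"), (2, "25")])

# Transform inside the store: derive a typed table from the raw one.
conn.execute(
    "CREATE TABLE orders AS "
    "SELECT id, CAST(amount AS INTEGER) AS amount FROM raw_orders"
)
clean = conn.execute("SELECT id, amount FROM orders ORDER BY id").fetchall()
```

Running the transformation inside the database keeps the raw table intact, so the clean table can be rebuilt at any time.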