Once raw source data is stored in a Mitto store, the Mitto store itself can become a source (input) in an IO job.

The primary use case for this is to create several related relational database tables from a single nested data source (API data, JSON, etc).

Example Source Data

Let's use this accounting JSON data as an example:

[
  {
    "amount": 500.00,
    "customer": {
      "id": 1,
      "name": "Customer 1"
    },
    "id": 1,
    "invoice_lines": [
      {
        "amount": 200.00,
        "id": 1,
        "product": "A"
      },
      {
        "amount": 300.00,
        "id": 2,
        "product": "B"
      }
    ],
    "invoice_number": 1001
  },
  {
    "amount": 240.00,
    "customer": {
      "id": 1,
      "name": "Customer 2"
    },
    "id": 2,
    "invoice_lines": [
      {
        "amount": 200.00,
        "id": 1,
        "product": "A"
      },
      {
        "amount": 40.00,
        "id": 2,
        "product": "C"
      }
    ],
    "invoice_number": 1002
  }
]

This dataset has two arrays [ ]:

  1. The data set as a whole is an array
  2. The invoice_lines data contains an array

So, Mitto needs to split this data into two relational database tables.

Table #1 (invoice header)

id invoice_number amount cusomer_id customer_name
1 1001 500.00 1 Customer 1
2 1002 240.00 2 Customer 2

Table 1, can be created using the Mitto JSON plugin.

Job #1 name: json_invoices

Table #2 (invoice lines)

invoice_id id line product amount
1001 1 1 A 200.00
1001 2 2 B 300.00
1002 3 1 A 200.00
1002 4 2 C 40.00

If Mitto job #1 uses a Mitto store, then Table 2, can be created using a Mitto Store job.

Mitto Store Job Config

Mitto store jobs can be created using the Generic plugin.

Here's an example job config that pipes data from a Mitto store.

{
    "input": {
        "name": "json_invoices",
        "use": "mitto.iov2.input#StoreInput",
        "jpath": "$.invoice_lines[*]",
        "members": [
            {
                "name": "invoice_id",
                "value": "$.id"
            }
        ]
    },
    "output": {
        "tablename": "invoice_lines",
        "use": "call:mitto.iov2.db#todb",
        "schema": "json",
        "dbo": "postgresql://localhost/analytics"
    },
    "steps": [
        {
            "use": "mitto.iov2.steps#Input",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#ExtraColumnsTransform"
                },
                {
                    "use": "mitto.iov2.transform#ColumnsTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CreateTable"
        },
        {
            "use": "mitto.iov2.steps#Output",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#FlattenTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CollectMeta"
        }
    ]
}

The input section is the important part of this job config.

Name

The name parameter is the name of the "parent" job that is configured to use a store.

Jpath

The jpath parameter let's you define exactly what section of the store's JSON data to pull from. This is necessary for Table 2 because of the nested data in the invoice_lines array. Learn more about JSONPath (jpath) expressions below.

Query a record from the "parent" job's Mitto store API to determine the correct jpath.

Members

The members parameter let's you add additional columns from any section of the store's JSON data to the resulting database table. In our case, we need to add the id of each invoice to the invoice_lines table in order for us to join the two tables together. members also uses jpath in it's value key, and you can name the resulting column by adjusting the name key.

JSONPath Expressions

To understand how to use JSONPath to pick specific sections of data out of a JSON object you can visit: https://goessner.net/articles/JsonPath/

To interactively learn how to use JSONPath syntax you can visit: https://jsonpath.com/

Best Practices

Always run the store job after the "parent" job because the store job requires the store from the "parent" job.