# Dynamic Job Configurations

In general, Zuar Runner job configurations are static: they are defined in
JSON, and their behavior never changes.  In some situations, such as
examples, experimentation, and debugging, it can be useful for a job's
behavior to be defined dynamically, via Python code.  Dynamic job
configuration was introduced to allow this.

At its simplest, dynamic job configuration allows the user to embed
Python code in a job configuration to modify the behavior of:
* the input data
* steps the job executes
* transforms applied to the data

.. WARNING::

   Dynamic job configuration is an extremely advanced feature; anything
   beyond the simplest of uses will likely require an in-depth
   understanding of Zuar Runner internals.  Please proceed with caution.

The following Zuar Runner callables allow Python code to be injected
into a running job:

* **inputter** - `ExampleInput`
* **transform** - `PythonTransform`
* **step** - `PythonStep`

.. TODO::

   add links to relevant docs

Some aspects of the injected Python code are common to all three
callables; those aspects are described in the remainder of this page.

## Python Code

Python code is introduced to a job via the `python_code` parameter.  The
value of the parameter can be one of the following:

1. a path, relative to `$MITTO_DATA`, to a file containing Python code
2. a fully qualified path to a file containing Python code
3. a list of strings containing Python code

Regardless of its source, the code is passed as the first argument to the
[Python `exec` function](
https://docs.python.org/3/library/functions.html#exec).  In the case of
`PythonTransform` and `PythonStep`, the built-in functions `globals` and
`locals` are called, and their return values are used as the second and
third arguments, respectively, to `exec`.

Generally, the code should:

1. define a function to perform the desired action, and
2. assign that function to an attribute of the callable via `self`
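The mechanism can be illustrated with a small, self-contained sketch. `FakeTransform` below is a hypothetical stand-in, not Runner's actual `PythonTransform`; it assumes the callable runs `exec(python_code, globals(), locals())` from inside one of its own methods, which would explain how top-level injected code can refer to `self`. Note that a function assigned to an instance attribute is not bound to the instance, so in this sketch the caller passes `self` explicitly; how Runner itself invokes the function is not described on this page.

```python
class FakeTransform:
    """Hypothetical stand-in for a Runner callable such as PythonTransform."""

    def __init__(self, python_code: str):
        # Assumption: the real callable execs the code like this. Because
        # locals() is taken inside a method, `self` is visible to the code,
        # which can therefore attach its function via `self.<attr> = ...`.
        exec(python_code, globals(), locals())

    def __call__(self, record):
        # The function was stored as a plain instance attribute, so it is
        # not bound to the instance and `self` must be passed explicitly.
        return self.transform_func(self, record)


# The file-based example from this page, as a Python string.
CODE = '''
def transform_func(self, record):
    print("record1=%s" % record)
    return record
self.transform_func = transform_func
'''

transform = FakeTransform(CODE)
print(transform({"a": 1}))  # prints record1={'a': 1}, then {'a': 1}
```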

When the code is contained in a file, no special formatting is
required.

Example code suitable for use with `PythonTransform` via file:
```python
def transform_func(self, record):
    print("record1=%s" % record)
    return record
self.transform_func = transform_func
```

When code is provided directly in the job config,
quirks in HJSON's handling of indented strings require the use of
special formatting; a `.` should be used to indicate the leftmost
column when it would otherwise contain a space.

The same code, formatted for use with `PythonTransform` directly in the
job configuration:
```
python_code: [
	def transform_func(self, record):
	.   print("record1=%s" % record)
	.   return record
	self.transform_func = transform_func
],
```

### Callable-specific Details

#### `ExampleInput`

* The function's call signature must match `(self)` because the
  function will be treated as a method of an `ExampleInput`
  instance.
* The function must be assigned to `self.inputter_func`.
* The purpose of the function is to provide data.
* The function's return value will be iterated over, so the function
  is typically written as a generator that yields one record at a
  time.

Example job configuration fragment:
```
{
  input: {
    use: mitto.iov2.input#ExampleInput
    python_code: [
      def inputter_func(self):
      .   cols = "abcdefghijk"
      .   for i in range(0, 10):
      .       yield {col: i for col in cols}
      self.inputter_func = inputter_func
    ]
  },
  ...
}
```
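Since `inputter_func` is a generator, each `yield` produces one record. Run on its own, outside Runner, the function in the fragment above produces ten records, each with the eleven columns `a` through `k` set to the row index. `self` is unused by this particular function, so `None` stands in for the `ExampleInput` instance:

```python
def inputter_func(self):
    cols = "abcdefghijk"
    for i in range(0, 10):
        yield {col: i for col in cols}


# `self` is unused, so None can stand in for the ExampleInput instance.
rows = list(inputter_func(None))
print(len(rows))  # 10
print(rows[0])
```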

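#### `PythonStep` and `PythonTransform`

* The function will be treated as a method of a `PythonStep` or
  `PythonTransform` instance, so its first parameter must be `self`.
* For `PythonStep`, the call signature must match `(self)`; for
  `PythonTransform`, it must match `(self, record)`.
* Unlike the `ExampleInput` function, these functions do not provide
  data: a `PythonStep` function is called once during the job, and a
  `PythonTransform` function is called once for each record.
* The attribute to which the function must be assigned is described in
  "Execution Context and Other Requirements", below.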


# General

The value of `python_code` can be any of the following:

1. A string containing the name of a file located in `/var/mitto/data`
   containing valid Python code.
   
2. A string containing the fully-qualified path to a file containing
   valid Python code.
   
3. A list of one or more strings, with each string being a line of
   valid Python.  The individual strings are joined into a single
   string that is passed to the Python `exec` function.
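One way the three forms of `python_code` could be resolved into a single source string is sketched below. `load_python_code` is a hypothetical helper, not part of Runner; it assumes list entries are joined with newlines and that a bare file name is resolved against the data directory, and it omits the `.` convention described under "Formatting the List of Strings".

```python
import os
from typing import List, Union

# Hypothetical default taken from the list above; the "Python Code" section
# refers to $MITTO_DATA, presumably the same location.
DATA_DIR = "/var/mitto/data"


def load_python_code(value: Union[str, List[str]], data_dir: str = DATA_DIR) -> str:
    """Hypothetical helper: turn a python_code value into source for exec()."""
    if isinstance(value, list):
        # Form 3: one string per line (assumed to be joined with newlines).
        return "\n".join(value)
    # Forms 1 and 2: a file name relative to data_dir, or an absolute path.
    path = value if os.path.isabs(value) else os.path.join(data_dir, value)
    with open(path, encoding="utf-8") as f:
        return f.read()


source = load_python_code(["x = 1", "y = x + 1"])
namespace = {}
exec(source, namespace)
print(namespace["y"])  # 2
```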
   
Depending upon where the `python_code` is used, additional constraints
may be placed on the code.

## Formatting the List of Strings

When `python_code` is a list of strings, a non-standard formatting
convention is used due to inconsistent handling of indentation by
HJSON.  This is best explained by example:

```
{
	use: mitto.iov2.steps.builtin#PythonStep
	python_code: [
	    # Executed in the context of an instance of the PythonStep class
        # Because this uses the store as input, the job must be configured
	    # with a store.
        def _dynamic_step(self):
	    .    logging.info("start")
        .    from mitto.iov2.input import StoreInput
	    .    from mitto.io.db.utils import (DEFAULT_ENCODE_ERRORS, to_copyfrom_line)
        .    from mitto.io.db.redshift import StreamIter
        .    streamer = StreamIter(
        .        to_copyfrom_line(record, DEFAULT_ENCODE_ERRORS).encode("utf-8")
        .        for record in self.environ[STORE].list()
        .    )
        .    data = streamer.read()
	    .    logging.info("stop")
        # Function must be assigned to `step`
        self.step = _dynamic_step
	]
}
```

Things to note:
* Leading whitespace is ignored; the first non-space character on the
  line is considered to be "column 1".
* If the first non-space character is a `.`, it is converted to a space.
* Python comments can be used.
* The variables available for use depend upon the context of execution;
  see "Execution Context and Other Requirements".
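The two rules above can be expressed as a short normalization function. `normalize_lines` is a hypothetical illustration of the convention, assuming the processed lines are joined with newlines before being passed to `exec`; it is not Runner's actual parsing code.

```python
from typing import List


def normalize_lines(lines: List[str]) -> str:
    """Hypothetical sketch of the `.` convention for python_code lists."""
    out = []
    for line in lines:
        # Leading whitespace is ignored: the first non-space character
        # is column 1.
        line = line.lstrip()
        # A leading "." stands in for a space in column 1.
        if line.startswith("."):
            line = " " + line[1:]
        out.append(line)
    # Assumption: lines are joined with newlines before reaching exec().
    return "\n".join(out)


source = normalize_lines([
    "def transform_func(self, record):",
    '.   print("record1=%s" % record)',
    ".   return record",
    "self.transform_func = transform_func",
])
print(source)
compile(source, "<python_code>", "exec")  # raises SyntaxError on bad indentation
```

Each `.` becomes one space, so `.   return record` yields four spaces of indentation; Python only requires that the lines of a block be indented consistently.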

# Execution Context and Other Requirements

## `PythonStep`

When using the `PythonStep` step, `python_code` must define a function
that will be valid as a method of the `PythonStep` class.  The
function must:
* Accept a single argument: `self`
* Expect to be called once during the execution of the job
* Not return a value
* Be assigned to the `step` attribute of the class instance
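The sketch below models these requirements with a hypothetical `FakeStep` class (not Runner's implementation; the `environ` dict and the `run` method are assumptions). It also shows a consequence of passing separate `globals` and `locals` dictionaries to `exec`: names defined at the top level of `python_code`, including imports, end up in the locals dictionary, which a function body cannot see. That is likely why the example under "Formatting the List of Strings" performs its imports inside `_dynamic_step`.

```python
class FakeStep:
    """Hypothetical stand-in for PythonStep; not Runner's implementation."""

    def __init__(self, python_code: str, environ: dict):
        self.environ = environ  # assumption: modeled on `self.environ` above
        self.step = None
        # Separate globals and locals dicts, as described under "Python Code".
        exec(python_code, globals(), locals())

    def run(self):
        # Called exactly once; any return value is ignored.
        self.step(self)


CODE = '''
def _dynamic_step(self):
    # Imported inside the function: a top-level import in python_code
    # would land in the locals dict, which this body cannot see.
    import json
    self.environ["snapshot"] = json.dumps(self.environ, sort_keys=True)
self.step = _dynamic_step
'''

step = FakeStep(CODE, {"job": "demo"})
step.run()
print(step.environ["snapshot"])  # {"job": "demo"}
```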

## `PythonTransform`

When using the `PythonTransform` transform, `python_code` must define
a function that will be valid as a method of the `PythonTransform`
class. The function must:
* Accept two arguments: `self` and `record`
* Expect to be called once for each row of data
* Return `record` or a modified version of `record`
* Be assigned to the `transform_` attribute of the class instance
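A `PythonTransform` function can return a modified record rather than the original. The sketch below assumes records are Python `dict`s (as yielded by the `ExampleInput` example) and that `self` is not needed, so it can be exercised outside Runner by passing `None`. It is written as plain Python; in a job configuration it would use the `.` convention and be followed by `self.transform_ = transform_`.

```python
def transform_(self, record):
    """Return a modified copy of `record`: lower-case keys, trimmed strings."""
    return {
        key.lower(): value.strip() if isinstance(value, str) else value
        for key, value in record.items()
    }


records = [{"ID": 1, "Name": " Ada "}, {"ID": 2, "Name": "Grace"}]
# `self` is unused here, so None stands in for the PythonTransform instance.
print([transform_(None, r) for r in records])
# [{'id': 1, 'name': 'Ada'}, {'id': 2, 'name': 'Grace'}]
```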

# Tips and Tricks

1. If you are running the job manually from the CLI via `job_io.py
   config.json`, you can invoke the Python debugger; for example:
   ```
   {
       use: mitto.iov2.steps.builtin#PythonStep
       python_code: [
           import pdb; pdb.set_trace()
       ]
   } 
   ```
   Note: this is not possible when the job is being run from the UI,
   the scheduler, a sequence, or via `mitto run`.

2. You can easily add logging statements.

   To log every row at a certain point in a set of transforms:
   ```
   {
       use: mitto.iov2.transform.builtin#PythonTransform
       python_code: [
           def transform_(self, record):
           .   logging.info("record=%s", record)
           .   return record
           self.transform_ = transform_
       ]
   }
   ```

   To log the job execution environment at a certain point in the
   steps:
   ```
   {
       use: mitto.iov2.steps.builtin#PythonStep
       python_code: [
           logging.info("environ=%s", self.environ)
       ]
   }
   ```