Table Of Contents

jdbc.iov2.input.Input

Parameters

class jdbc.iov2.input.Input(driver_class, connection_url, query, table_name, reflect_sdl=False, credentials=None, jar_files=None, libraries=None, last_modified_column=None, last_modified_timezone=’UTC’, last_modified_delta=0)[source]

Bases: mitto.iov2.input.BaseInputV2

Input from any database engine with a JDBC interface.

If this inputter is used with mitto.iov2.steps.builtin.CreateTable, the types of the output columns will be determined either by the SDL included in the job configuration or by Mitto’s internal sampling algorithm.

If this inputter is used with jdbc.iov2.steps.builtin.CreateTable and reflect_sdl is true, the types of the output columns will be determined by the types returned by the query.

Note: JDBC always returns naive timestamps, even if the input column type is timzone aware. See additional discussion below in the Timestamps section.

Parameters
  • driver_class (str) – The fully-qualified path to the driver’s Java class.

  • connection_url (str) – The connection URL to be used by the driver when connecting to the database. The format of the string is driver-specific.

  • query (Union[str, List[str]]) – A string or a list of strings that define the SQL query to be used. If a list of strings is provided, they will be concatenated into a single string. The query must end with: “FROM {table_name}”. {table_name} will be replaced with the value of the table_name parameter.

  • table_name (str) – The table name from which to select the data.

  • reflect_sdl (bool) – Controls how the jdbc.iov2.steps.builtin.CreateTable step will determine output column types.

  • credentials (Union[str, Dict[str, Any], None]) – A dict of properties to pass as the driver_args parameter when jaydebeapi.connect() is called to establish a connection with the database. These properties are driver-specific and may include properties beyond traditional “credentials” (e.g., the postgres JDBC driver allows configuration of, the schema name, read-only mode, timeouts, logging, ssl, etc.). The use of Mitto’s named credentials is supported.

  • jar_files (Optional[str]) – String of fully-qualified paths to jar file(s) containing the driver to be used. If the string contains multiple paths, the paths must be separated by a colon: “:”. When locating the driver, the files in jar_files are consulted first, then CLASSPATH, if it is present in the environment.

  • libraries (Optional[str]) – String containing one or more DLL or .so shared libraries used by the driver. If present, this string is passed as an option when Java is invoked (i.e.: -Djava.library.path=<value>).

  • last_modified_column (Optional[str]) – The name of the column containing the last modified time of the data. If this has a value is not null, an upsert is performed; if it is null a load is performed.

  • last_modified_timezone (str) – The timezone to use for data saved in the last_modified_column of the output database for upserts. The timezone is applied to the value while leaving the value unchanged. The value is a string containing the name of an Olsen timezone database. Values can be taken from the TZ database name column of the databases described here. For example, if the value is “America/Chicago” and the input data is 2021-01-14 00:00:00, the data will be saved as: 2021-01-14 00:00:00-06:00.

  • last_modifed_delta – An adjustment in minutes to apply to values when writing output data to last_modified_column.

Yields

dict – A row returned by query.

Notes

CreateTable

Generally, jobs should use mitto.iov2.steps.builtin.CreateTable with this inputter.

jdbc.iov2.steps.builtin.CreateTable should be used with this inputter only when one wishes for the column types of the output table to be determined by the JDBC types returned by execution of the query. For further explanation, see the documentation for the CreateTable step in use.

JDBC Timestamps

JDBC always returns “naive” values for timestamp columns, even if the actual column type is timezone “aware” (e.g., Postgres’s TIMESTAMP WITH TIME ZONE). The actual value is affected by both the timezone for which the database server is configured well as the timezone of the Mitto server running the job. Thus, in some situations, it may be necessary to apply a correction (delta) to values obtained from the last_modified_column input database. This correction is provided via the last_modified_delta parameter.

See this page for additional discussion of aware and naive values.

The last_modified_timezone determines the timezone used when saving data to the last_modified_column of the output database.

The mitto.iov2.steps.builtin.MaxTimestamp step provides a delta parameter that can be used to apply a similar correction to the maximum timestamp value obtained from the output database during an upsert.

Java and JDBC drivers

Use of this connector requires that:

  • Java must be installed. Java is installed by default on Mitto 2.9 or greater.

  • The desired JDBC driver must be installed.

If one or more parameter values are incorrect, the job using this inputter will likely fail with a Java error/traceback. If the Java error message is not helpful, a good fist step is to carefully compare parameter values with those described by the driver’s documentation.

This connector uses the JayDeBeApi Python package for JDBC database access.

Postgres JDBC driver

The Postgres JDBC driver is installed with this connector which allows testing and experimentation with this inputter on a default Mitto install.

As mentioned earlier, the JDBC driver in use determines the values of driver_class, connection_url, and credentials and the formats of those values. For example, when using other inputters, “postgresql://zuar:zuarpass@db.zuar.com/analytics” is a valid dbo; however it is not a valid connection_url for the Postgres JDBC driver. To input data from that database using the JDBC connector, the following configuration fragment would be necessary:

"input": {
   ...
   "connection_url": "jdbc:postgresql://db.zuar.com/analytics",
   "credentials": {
      "user": "zuar",
      "password": "zuarpass"
   },
   ...
}

Examples

  1. Input section of job configuration using JDBC and Postgres.

    "input": {
       "use": "jdbc.iov2.input#Input",
       "driver_class": "org.postgresql.Driver",
       "jar_files":  "/usr/share/java/postgresql.jar",
       "connection_url": "jdbc:postgresql://localhost:5432/analytics",
       "credentials": {
           "user": "zuar",
           "password": "zuarpass"
       },
       "query": "SELECT * FROM public.jdbc_test"
    },
    
  2. Input section of job configuration using JDBC and Db2 on an AS400.

    "input": {
        "use": "jdbc.iov2.input#Input",
        "jar_files":  "/opt/jtopen_10_4/lib/jt400.jar",
        "driver_class": "com.ibm.as400.access.AS400JDBCDriver",
        "connection_url": "jdbc:as400:192.168.1.11",
        "credentials": {
            "user": "ZUAR",
            "password": "ZUARPASS",
            "database_name": "AS400DB",
            "translate binary": "true",
            "naming": "system"
        },
        "query": [
           "SELECT * ",
           "FROM EXAMPLE_TABLE"
        ],
        "reflect_sdl": true
    },
    
  3. The same configuration using named credentials:

    "input": {
        "use": "jdbc.iov2.input#Input",
        "jar_files":  "/opt/jtopen_10_4/lib/jt400.jar",
        "driver_class": "com.ibm.as400.access.AS400JDBCDriver",
        "connection_url": "jdbc:as400:192.168.1.11",
        "credentials": "steve_postgres_named_creds",
        "query": [
           "SELECT * ",
           "FROM EXAMPLE_TABLE"
        ],
        "reflect_sdl": true
    },
    
type_mapping = {DBAPITypeObject(‘CHAR’, ‘NCHAR’, ‘NVARCHAR’, ‘VARCHAR’, ‘OTHER’): ‘String’, DBAPITypeObject(‘CLOB’, ‘LONGVARCHAR’, ‘LONGNVARCHAR’, ‘NCLOB’, ‘SQLXML’): ‘Text’, DBAPITypeObject(‘BINARY’, ‘BLOB’, ‘LONGVARBINARY’, ‘VARBINARY’): ‘LargeBinary’, DBAPITypeObject(‘BOOLEAN’, ‘BIGINT’, ‘BIT’, ‘INTEGER’, ‘SMALLINT’, ‘TINYINT’): ‘Integer’, DBAPITypeObject(‘FLOAT’, ‘REAL’, ‘DOUBLE’): ‘Float’, DBAPITypeObject(‘DECIMAL’, ‘NUMERIC’): ‘Numeric’, DBAPITypeObject(‘DATE’): ‘Date’, DBAPITypeObject(‘TIME’): ‘Time’, DBAPITypeObject(‘TIMESTAMP’): ‘DateTime’, DBAPITypeObject(‘ROWID’): ‘String’}

Map JDBC driver type to sqlalchemy type.

Key is JayDeBeApi type. Value is java.sql.Types constant.

JayDeBeApi driver type documentation: https://github.com/baztian/jaydebeapi/blob/cd2fd4c6c047803916fc92a534f02f1213e252d6/jaydebeapi/__init__.py#L301

Java type documentation: http://download.oracle.com/javase/8/docs/api/java/sql/Types.html

embed_table_name_in_query()[source]

perform template expansion

property data_tzinfo

Separate from _data_tzinfo to allow cached property

Return type

Optional[tzinfo]

localize_value(value)[source]

Apply a timezone to naive datetime if timezone set.

JDBC always returns a naive datetime. Add the specified TZ to the value, if one is configured.

Return type

datetime

jdbc_connect()[source]

Establish JDBC connection

Return type

Connection

jdbc_columns_types()[source]

Returns (name, type) for each column in query results

Return type

Iterator[Tuple[str, DBAPITypeObject]]

updated_at(timestamp)[source]

Set the timestamp of the last run so that incremental updates may be used

Return type

None

apply_delta(obj, delta)[source]

Apply a timedelta to a datetime object

record(row)[source]

Create Record from a given row.

add_sdl_to_environ(environ)[source]

Update environ with SDL for JDBC query results