jdbc.iov2.input.Input
¶
Parameters¶
-
class
jdbc.iov2.input.
Input
(driver_class, connection_url, query, table_name, reflect_sdl=False, credentials=None, jar_files=None, libraries=None, last_modified_column=None, last_modified_timezone=’UTC’, last_modified_delta=0)[source]¶ Bases:
mitto.iov2.input.BaseInputV2
Input from any database engine with a JDBC interface.
If this inputter is used with
mitto.iov2.steps.builtin.CreateTable
, the types of the output columns will be determined either by the SDL included in the job configuration or by Mitto’s internal sampling algorithm.If this inputter is used with
jdbc.iov2.steps.builtin.CreateTable
and reflect_sdl is true, the types of the output columns will be determined by the types returned by the query.Note: JDBC always returns naive timestamps, even if the input column type is timzone aware. See additional discussion below in the Timestamps section.
- Parameters
driver_class (
str
) – The fully-qualified path to the driver’s Java class.connection_url (
str
) – The connection URL to be used by the driver when connecting to the database. The format of the string is driver-specific.query (
Union
[str
,List
[str
]]) – A string or a list of strings that define the SQL query to be used. If a list of strings is provided, they will be concatenated into a single string. The query must end with: “FROM {table_name}”. {table_name} will be replaced with the value of the table_name parameter.table_name (
str
) – The table name from which to select the data.reflect_sdl (
bool
) – Controls how thejdbc.iov2.steps.builtin.CreateTable
step will determine output column types.credentials (
Union
[str
,Dict
[str
,Any
],None
]) – A dict of properties to pass as the driver_args parameter when jaydebeapi.connect() is called to establish a connection with the database. These properties are driver-specific and may include properties beyond traditional “credentials” (e.g., the postgres JDBC driver allows configuration of, the schema name, read-only mode, timeouts, logging, ssl, etc.). The use of Mitto’s named credentials is supported.jar_files (
Optional
[str
]) – String of fully-qualified paths to jar file(s) containing the driver to be used. If the string contains multiple paths, the paths must be separated by a colon: “:”. When locating the driver, the files in jar_files are consulted first, then CLASSPATH, if it is present in the environment.libraries (
Optional
[str
]) – String containing one or more DLL or .so shared libraries used by the driver. If present, this string is passed as an option when Java is invoked (i.e.: -Djava.library.path=<value>).last_modified_column (
Optional
[str
]) – The name of the column containing the last modified time of the data. If this has a value is not null, an upsert is performed; if it is null a load is performed.last_modified_timezone (
str
) – The timezone to use for data saved in the last_modified_column of the output database for upserts. The timezone is applied to the value while leaving the value unchanged. The value is a string containing the name of an Olsen timezone database. Values can be taken from the TZ database name column of the databases described here. For example, if the value is “America/Chicago” and the input data is 2021-01-14 00:00:00, the data will be saved as: 2021-01-14 00:00:00-06:00.last_modifed_delta – An adjustment in minutes to apply to values when writing output data to last_modified_column.
- Yields
dict – A row returned by query.
Notes
- CreateTable
Generally, jobs should use
mitto.iov2.steps.builtin.CreateTable
with this inputter.jdbc.iov2.steps.builtin.CreateTable
should be used with this inputter only when one wishes for the column types of the output table to be determined by the JDBC types returned by execution of the query. For further explanation, see the documentation for the CreateTable step in use.
- JDBC Timestamps
JDBC always returns “naive” values for timestamp columns, even if the actual column type is timezone “aware” (e.g., Postgres’s TIMESTAMP WITH TIME ZONE). The actual value is affected by both the timezone for which the database server is configured well as the timezone of the Mitto server running the job. Thus, in some situations, it may be necessary to apply a correction (delta) to values obtained from the last_modified_column input database. This correction is provided via the last_modified_delta parameter.
See this page for additional discussion of aware and naive values.
The last_modified_timezone determines the timezone used when saving data to the last_modified_column of the output database.
The
mitto.iov2.steps.builtin.MaxTimestamp
step provides a delta parameter that can be used to apply a similar correction to the maximum timestamp value obtained from the output database during an upsert.- Java and JDBC drivers
Use of this connector requires that:
Java must be installed. Java is installed by default on Mitto 2.9 or greater.
The desired JDBC driver must be installed.
If one or more parameter values are incorrect, the job using this inputter will likely fail with a Java error/traceback. If the Java error message is not helpful, a good fist step is to carefully compare parameter values with those described by the driver’s documentation.
This connector uses the JayDeBeApi Python package for JDBC database access.
- Postgres JDBC driver
The Postgres JDBC driver is installed with this connector which allows testing and experimentation with this inputter on a default Mitto install.
As mentioned earlier, the JDBC driver in use determines the values of driver_class, connection_url, and credentials and the formats of those values. For example, when using other inputters, “postgresql://zuar:zuarpass@db.zuar.com/analytics” is a valid dbo; however it is not a valid connection_url for the Postgres JDBC driver. To input data from that database using the JDBC connector, the following configuration fragment would be necessary:
"input": { ... "connection_url": "jdbc:postgresql://db.zuar.com/analytics", "credentials": { "user": "zuar", "password": "zuarpass" }, ... }
Examples
Input section of job configuration using JDBC and Postgres.
"input": { "use": "jdbc.iov2.input#Input", "driver_class": "org.postgresql.Driver", "jar_files": "/usr/share/java/postgresql.jar", "connection_url": "jdbc:postgresql://localhost:5432/analytics", "credentials": { "user": "zuar", "password": "zuarpass" }, "query": "SELECT * FROM public.jdbc_test" },
Input section of job configuration using JDBC and Db2 on an AS400.
"input": { "use": "jdbc.iov2.input#Input", "jar_files": "/opt/jtopen_10_4/lib/jt400.jar", "driver_class": "com.ibm.as400.access.AS400JDBCDriver", "connection_url": "jdbc:as400:192.168.1.11", "credentials": { "user": "ZUAR", "password": "ZUARPASS", "database_name": "AS400DB", "translate binary": "true", "naming": "system" }, "query": [ "SELECT * ", "FROM EXAMPLE_TABLE" ], "reflect_sdl": true },
The same configuration using named credentials:
"input": { "use": "jdbc.iov2.input#Input", "jar_files": "/opt/jtopen_10_4/lib/jt400.jar", "driver_class": "com.ibm.as400.access.AS400JDBCDriver", "connection_url": "jdbc:as400:192.168.1.11", "credentials": "steve_postgres_named_creds", "query": [ "SELECT * ", "FROM EXAMPLE_TABLE" ], "reflect_sdl": true },
-
type_mapping
= {DBAPITypeObject(‘CHAR’, ‘NCHAR’, ‘NVARCHAR’, ‘VARCHAR’, ‘OTHER’): ‘String’, DBAPITypeObject(‘CLOB’, ‘LONGVARCHAR’, ‘LONGNVARCHAR’, ‘NCLOB’, ‘SQLXML’): ‘Text’, DBAPITypeObject(‘BINARY’, ‘BLOB’, ‘LONGVARBINARY’, ‘VARBINARY’): ‘LargeBinary’, DBAPITypeObject(‘BOOLEAN’, ‘BIGINT’, ‘BIT’, ‘INTEGER’, ‘SMALLINT’, ‘TINYINT’): ‘Integer’, DBAPITypeObject(‘FLOAT’, ‘REAL’, ‘DOUBLE’): ‘Float’, DBAPITypeObject(‘DECIMAL’, ‘NUMERIC’): ‘Numeric’, DBAPITypeObject(‘DATE’): ‘Date’, DBAPITypeObject(‘TIME’): ‘Time’, DBAPITypeObject(‘TIMESTAMP’): ‘DateTime’, DBAPITypeObject(‘ROWID’): ‘String’}¶ Map JDBC driver type to sqlalchemy type.
Key is JayDeBeApi type. Value is java.sql.Types constant.
JayDeBeApi driver type documentation: https://github.com/baztian/jaydebeapi/blob/cd2fd4c6c047803916fc92a534f02f1213e252d6/jaydebeapi/__init__.py#L301
Java type documentation: http://download.oracle.com/javase/8/docs/api/java/sql/Types.html
-
property
data_tzinfo
¶ Separate from _data_tzinfo to allow cached property
- Return type
Optional
[tzinfo
]
-
localize_value
(value)[source]¶ Apply a timezone to naive datetime if timezone set.
JDBC always returns a naive datetime. Add the specified TZ to the value, if one is configured.
- Return type
datetime
-
jdbc_columns_types
()[source]¶ Returns (name, type) for each column in query results
- Return type
Iterator
[Tuple
[str
,DBAPITypeObject
]]