OpenLogReplicator.json - format element

Author: Adam Leszczyński <aleszczynski@bersler.com>, version: 1.9.0, date: 2026-02-19

This section documents the format element of the OpenLogReplicator JSON configuration. It describes available options, constraints, defaults and operational notes that affect how transactions and columns are serialized.

Table 1. Format element
Parameter | Type / constraints | Description and notes

type

string, max length: 256, mandatory

Output serialization format.

  • json — OpenLogReplicator JSON schema (stable).

  • debezium — Predefined set of formatting options for the Debezium connector. When chosen, no other options need to be set manually, although individual format options can still override the Debezium defaults.

  • protobuf — Protocol Buffers (experimental).

CAUTION: protobuf support is still in development and may not behave as expected. Verify interoperability and stability before using it in production.

attributes

integer, min: 0, max: 15, default: 0

Controls where transaction-level attributes are emitted. Value is a bitmask:

  • 0x0001 — add attributes to the transaction begin message.

  • 0x0002 — add attributes to every DML message.

  • 0x0004 — add attributes to the commit message.
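Because the value is a bitmask, the individual flags combine with a bitwise OR. A minimal Python sketch of how the configuration value could be computed; the class and member names are illustrative labels for the documented bits, not OpenLogReplicator identifiers:

```python
from enum import IntFlag


class Attributes(IntFlag):
    """Illustrative names for the documented bits (hypothetical, not OLR identifiers)."""
    BEGIN = 0x0001   # attributes in the transaction begin message
    DML = 0x0002     # attributes in every DML message
    COMMIT = 0x0004  # attributes in the commit message


# Attributes on the begin and commit messages only.
value = int(Attributes.BEGIN | Attributes.COMMIT)
print(value)  # 5
```

The resulting integer (here 5) is what would go into the "attributes" field.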

char

integer, min: 0, max: 3, default: 0

Encoding/format for character types (CHAR, NCHAR, VARCHAR2, NVARCHAR2, CLOB).

  • 0x0000 — UTF-8 (default).

  • 0x0001 — no character set conversion; copy bytes as-is.

  • 0x0002 — emit character column value as HEX string.

TIP: Use 0x0001 only when the consumer expects the source bytes unchanged. Use 0x0002 to avoid encoding issues when the charset is ambiguous.
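On the consumer side, a HEX-encoded column can be turned back into text once the source encoding is known. The sketch below assumes the value is a plain string of hex digits with no prefix, which should be checked against actual output; the function name and the sample value are hypothetical:

```python
def decode_hex_column(hex_value: str, encoding: str = "utf-8") -> str:
    """Decode a hex-encoded column value (assumed: plain hex digits, no prefix)."""
    raw = bytes.fromhex(hex_value)  # accepts upper- and lower-case digits
    return raw.decode(encoding)     # the encoding is chosen by the consumer


# Hypothetical sample: the four bytes 'x', 'x', '2' and a space.
print(repr(decode_hex_column("78783220")))  # 'xx2 '
```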

char-set

string, max length: 256, optional

Override the character set used to decode character columns (CHAR, VARCHAR2, CLOB). When set, OLR decodes all user-table character data using this charset instead of the database-declared NLS_CHARACTERSET. System/dictionary tables are not affected.

Use this when the Oracle database reports a single-byte charset (e.g., US7ASCII) but the application stores multi-byte encoded data as raw pass-through bytes (a common practice with legacy CJK databases).

The value must be a valid Oracle character set name supported by OLR, for example:

  • ZHT16BIG5 — Traditional Chinese (Big5)

  • ZHS16GBK — Simplified Chinese (GBK)

  • JA16SJIS — Japanese (Shift-JIS)

  • JA16EUC — Japanese (EUC-JP)

  • KO16MSWIN949 — Korean

  • AL32UTF8 — UTF-8

NOTE: This option produces valid UTF-8 JSON output, unlike char = 0x0001 which copies raw bytes that may not be valid UTF-8.
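The pass-through situation described above can be reproduced in a few lines of Python. This is an illustration only: Python's "big5" codec stands in for ZHT16BIG5 as an assumption, and the two may differ for some code points.

```python
# Bytes as a legacy application might store them in a database declared as US7ASCII.
raw = "中文".encode("big5")

# Decoding with the declared single-byte charset fails or produces garbage.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the charset the application really used recovers the text,
# which can then be emitted as valid UTF-8.
decoded = raw.decode("big5")
utf8_bytes = decoded.encode("utf-8")

print(ascii_ok, decoded)
```

Setting char-set asks OLR to perform the equivalent of the second decode for user-table columns.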

column

integer, min: 0, max: 2, default: 0

Controls which columns appear in messages:

  • 0 — compact mode: INSERT/DELETE contain only non-null values; UPDATE contains changed columns and PKs (default).

  • 1 — INSERT/DELETE contain all values; UPDATE as above.

  • 2 — include all columns present in the redo stream (may include unchanged values).

CAUTION: Redo information cannot always distinguish whether a column was explicitly set in the DML. Receivers should not assume presence implies user-specified change.

db

integer, min: 0, max: 3, default: 0

Include database name in payloads (bitmask):

  • 0x0000 — omit database name (default).

  • 0x0001 — include db in DML messages.

  • 0x0002 — include db in DDL messages.

flush-buffer

integer, min: 0, default: 1048576

Number of bytes after which the output buffer is flushed. 0 forces immediate flush for every message.

NOTE: Small values reduce latency but increase I/O and downstream load.

interval-dts

integer, min: 0, max: 10, default: 0

Format for INTERVAL DAY TO SECOND values. Options 0-3 emit a number, options 4-7 emit the same value as a string, and options 8-10 emit a formatted days-and-time string. Units coarser than nanoseconds may lose precision.

  • 0 — Value in nanoseconds - "val": 123456000000000.

  • 1 — Value in microseconds (possible data precision loss) - "val": 123456000000.

  • 2 — Value in milliseconds (possible data precision loss) - "val": 123456000.

  • 3 — Value in seconds (possible data precision loss) - "val": 123456.

  • 4 — Value in nanoseconds stored as a string - "val": "123456000000000".

  • 5 — Value in microseconds stored as a string (possible data precision loss) - "val": "123456000000".

  • 6 — Value in milliseconds stored as a string (possible data precision loss) - "val": "123456000".

  • 7 — Value in seconds stored as a string (possible data precision loss) - "val": "123456".

  • 8 — Days and time in an ISO-8601-like layout, stored as a string with " " as a separator between the number of days and time - "val": "01 06:00:00.123456789".

  • 9 — Same as 8, using "," as a separator between the number of days and time - "val": "01,06:00:00.123456789".

  • 10 — Same as 8, using "-" as a separator between the number of days and time - "val": "01-06:00:00.123456789".
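A consumer can map these representations onto its own duration type. The following Python sketch handles option 0 (nanoseconds) and option 8 (days and time separated by a space); the function names are hypothetical, and negative intervals are not handled because their textual form is not described here:

```python
from datetime import timedelta
from typing import Tuple


def interval_from_nanos(nanos: int) -> Tuple[timedelta, int]:
    """Split nanoseconds into a timedelta (microsecond resolution) and leftover ns."""
    micros, rest_ns = divmod(nanos, 1000)
    return timedelta(microseconds=micros), rest_ns


def interval_from_day_time(text: str) -> int:
    """Parse 'DD HH:MM:SS.fffffffff' into total nanoseconds."""
    days, clock = text.split(" ")
    hours, minutes, seconds = clock.split(":")
    whole, _, fraction = seconds.partition(".")
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    total_s = ((int(days) * 24 + int(hours)) * 60 + int(minutes)) * 60 + int(whole)
    return total_s * 1_000_000_000 + nanos


print(interval_from_nanos(123456000000000)[0])  # 1 day, 10:17:36
```

Python's timedelta stops at microseconds, so the sketch returns the remaining nanoseconds separately instead of silently dropping them.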

interval-ytm

integer, min: 0, max: 4, default: 0

Format for INTERVAL YEAR TO MONTH values. Options include raw months, string months, or year/month pair formats.

  • 0 — Value in months - "val": 20 (1 year, 8 months).

  • 1 — Value in months as a string - "val": "20".

  • 2 — Value in string format, number of years and months separated by " " - "val": "1 8".

  • 3 — Value in string format, number of years and months separated by "," - "val": "1,8".

  • 4 — Value in string format, number of years and months separated by "-" - "val": "1-8".
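The month count and the year/month pair are two views of the same value. A short Python sketch of the conversion in both directions, with hypothetical function names and without handling negative intervals:

```python
from typing import Tuple


def split_months(total_months: int) -> Tuple[int, int]:
    """Turn a month count (options 0 and 1) into (years, months)."""
    return divmod(total_months, 12)


def months_from_pair(text: str, separator: str) -> int:
    """Turn a 'years<sep>months' string (options 2-4) into a month count."""
    years, months = text.split(separator)
    return int(years) * 12 + int(months)


print(split_months(20))              # (1, 8)
print(months_from_pair("1-8", "-"))  # 20
```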

json-number-type

integer, min: 0, max: 1, default: 0

Controls how non-finite IEEE 754 values (NaN, +Infinity, -Infinity) in BINARY_FLOAT and BINARY_DOUBLE columns are serialized in JSON output. The JSON specification does not support these values as numeric literals.

  • 0 — emit null (default). Safe for all JSON consumers. Compatible with Debezium LogMiner adapter behavior (Debezium < 3.4).

  • 1 — emit as JSON strings: "NaN", "Infinity", "-Infinity". Preserves the original value. Compatible with Java Float.parseFloat() / Double.parseDouble(). Use when the consumer can handle string representations of non-finite numbers (e.g., Debezium >= 3.4 with DBZ-806).

NOTE: This option only affects JSON output (type = json or debezium). Protobuf output natively supports IEEE 754 special values and is not affected.
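With json-number-type set to 1, a consumer has to convert the string forms back into floating-point values. A minimal Python sketch, assuming the consumer already knows which columns are BINARY_FLOAT or BINARY_DOUBLE; the column names X, Y, Z and W are made up for the example:

```python
import json
import math


def to_float(value):
    """Convert a BINARY_FLOAT/BINARY_DOUBLE column value to float; None stays None."""
    if value is None:
        return None
    # float() accepts numbers as well as "NaN", "Infinity" and "-Infinity".
    return float(value)


row = json.loads('{"X": "NaN", "Y": "-Infinity", "Z": 1.5, "W": null}')
parsed = {name: to_float(value) for name, value in row.items()}
print(math.isnan(parsed["X"]), parsed["Y"], parsed["Z"], parsed["W"])
```

Apply the conversion only to columns known to be floating-point; a character column that happens to contain the text "NaN" should stay a string.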

message

integer, min: 0, max: 30, default: 0

Controls message splitting and auxiliary fields (bitmask):

  • 0x0001 — emit a single message per transaction (combine begin/DML/commit).

TIP: Avoid this flag when transactions can be large, especially with targets such as Kafka, because the combined message can become very large.

  • 0x0002 — add num sequence number to each message.

JSON-only flags:

  • 0x0004 — skip begin message when using 0x0001.

  • 0x0008 — skip commit message when using 0x0001.

  • 0x0010 — include low-level data offset for debugging.

rid

integer, min: 0, max: 1, default: 0

If set to 1, add a rid field with the Row ID to each row output.

redo-thread

integer, min: 0, max: 1, default: 0

If set to 1, add a rth field with the Redo Thread ID to each row output.

schema

integer, min: 0, max: 7, default: 0

Controls schema emission.

Example output:

{"scns":"0x0","tm":0,"xid":"x","payload":[{"op":"c","schema":{"owner":"USR1","table":"ADAM2","obj":0},"after":{"A":100,"B":999,"C":10.22,"D":"xx2","E":"yyy","F":1564662896000}}]}

  • 0x0001 — emit full schema (including columns) once per table when first used.

Example output:

{"scns":"0x0","tm":0,"xid":"x","payload":[{"op":"c","schema":{"owner":"USR1","table":"ADAM2","columns":[{"name":"A","type":"number","precision":-1,"scale":0,"nullable":1},{"name":"B","type":"number","precision":10,"scale":0,"nullable":1},{"name":"C","type":"number","precision":10,"scale":2,"nullable":1},{"name":"D","type":"char","length":10,"nullable":1},{"name":"E","type":"varchar2","length":10,"nullable":1},{"name":"F","type":"timestamp","length":11,"nullable":1},{"name":"G","type":"date","nullable":1}]},"after":{"A":100,"B":999,"C":10.22,"D":"xx2 ","E":"yyy","F":1564662896000}}]}
{"scns":"0x0","tm":0,"xid":"x","payload":[{"op":"c","schema":{"owner":"USR1","table":"ADAM2"},"after":{"A":100,"B":999,"C":10.22,"D":"xx3 ","E":"yyy","F":1564662896000}}]}

  • 0x0002 — emit full schema with every message. Use with 0x0001.

  • 0x0004 — include objn (object ID) in schema metadata.

Example output:

{"scns":"0x0","tm":0,"xid":"x","payload":[{"op":"c","schema":{"owner":"USR1","table":"ADAM2"},"after":{"A":100,"B":999,"C":10.22,"D":"xx2 ","E":"yyy","F":1564662896000}}]}

TIP: Use 0x0001 when receivers don’t know the schema.
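When the full schema is emitted only once per table (0x0001), a receiver needs to remember the column list for later messages. The Python sketch below shows one way to do that; the function name and the shortened sample messages are illustrative and only follow the shape of the example output in this section:

```python
import json


def attach_schema(lines, cache=None):
    """Yield (columns, payload_item) pairs, remembering column lists per table."""
    cache = {} if cache is None else cache
    for line in lines:
        message = json.loads(line)
        for item in message["payload"]:
            schema = item["schema"]
            key = (schema["owner"], schema["table"])
            if "columns" in schema:        # first message for this table
                cache[key] = schema["columns"]
            yield cache.get(key), item     # None if the schema was never seen


first = ('{"scns":"0x0","tm":0,"xid":"x","payload":[{"op":"c","schema":'
         '{"owner":"USR1","table":"ADAM2","columns":[{"name":"A","type":"number"}]},'
         '"after":{"A":100}}]}')
second = ('{"scns":"0x0","tm":0,"xid":"x","payload":[{"op":"c","schema":'
          '{"owner":"USR1","table":"ADAM2"},"after":{"A":101}}]}')

results = list(attach_schema([first, second]))
print(results[1][0])  # column list remembered from the first message
```

A receiver that starts mid-stream may see a table before its schema; the sketch returns None for the columns in that case rather than failing.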

scn

integer, min: 0, max: 1, default: 0

SCN representation:

  • 0 — decimal scn field.

  • 1 — hexadecimal text in scns field (e.g., 0xFF).
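A receiver that has to cope with either representation can normalize both to an integer. A small Python sketch with a hypothetical helper name:

```python
def read_scn(message: dict) -> int:
    """Return the SCN as an integer from either the scn or the scns field."""
    if "scn" in message:
        return int(message["scn"])
    # int() with base 16 accepts the leading "0x".
    return int(message["scns"], 16)


print(read_scn({"scns": "0xFF"}))  # 255
```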

skip-lob-tables

integer, min: 0, max: 1, default: 0

Skip emitting DML events (INSERT, UPDATE, DELETE) for tables that contain LOB columns (CLOB, BLOB, NCLOB).

  • 0 — emit events for all tables including those with LOB columns (default).

  • 1 — silently skip all DML events for tables with LOB columns.

NOTE: Oracle’s internal LOB segment management generates redo records that are indistinguishable from user DML at the raw redo level. On RAC systems this can cause phantom events — INSERT/UPDATE events for rows that were never committed to the database. Enable this option when LOB table accuracy is critical and you plan to replicate LOB tables through a separate mechanism (e.g., LogMiner with COMMITTED_DATA_ONLY).

scn-type

integer, min: 0, max: 15, default: 0

Additional SCN controls (bitmask):

  • 0x0001 — use commit SCN as DML SCN (default is SCN from the redo record).

  • 0x0002 — include bscn/bscns (begin transaction SCN) in every message (only for JSON output).

  • 0x0004 — include scn/scns in every message (default is first message only).

  • 0x0008 — include cscn/cscns (commit transaction SCN) in every message (only for JSON output).

timestamp

integer, min: 0, max: 15, default: 0

Format for timestamps. Options include numeric Unix epoch (ns/us/ms/s) or ISO-8601 string forms with or without timezone.

  • 0 — Unix with nanoseconds - "tm": 1651384800123456789.

  • 1 — Unix with precision to the microsecond (possible data precision loss) - "tm": 1651384800123457.

  • 2 — Unix with precision to the millisecond (possible data precision loss) - "tm": 1651384800123.

  • 3 — Unix with precision to the second (possible data precision loss) - "tm": 1651384800.

  • 4 — Unix with nanosecond precision stored as a string - "tms": "1651384800123456789".

  • 5 — Unix with microsecond precision stored as a string (possible data precision loss) - "tms": "1651384800123457".

  • 6 — Unix with millisecond precision stored as a string (possible data precision loss) - "tms": "1651384800123".

  • 7 — Unix with second precision stored as a string (possible data precision loss) - "tms": "1651384800".

  • 8 — ISO-8601 format with nanosecond precision stored as a string - "tms": "2022-05-01T06:00:00.123456789Z".

  • 9 — ISO-8601 format with microsecond precision stored as a string - "tms": "2022-05-01T06:00:00.123456Z".

  • 10 — ISO-8601 format with millisecond precision stored as a string - "tms": "2022-05-01T06:00:00.123Z".

  • 11 — ISO-8601 format with second precision stored as a string - "tms": "2022-05-01T06:00:00Z".

  • 12 — ISO-8601 format with nanosecond precision stored as a string without "TZ" - "tms": "2022-05-01 06:00:00.123456789".

  • 13 — ISO-8601 format with microsecond precision stored as a string without "TZ" - "tms": "2022-05-01 06:00:00.123456".

  • 14 — ISO-8601 format with millisecond precision stored as a string without "TZ" - "tms": "2022-05-01 06:00:00.123".

  • 15 — ISO-8601 format with second precision stored as a string without "TZ" - "tms": "2022-05-01 06:00:00".

NOTE: timestamp format is also used for TIMESTAMP WITH LOCAL TIME ZONE values (which don’t carry TZ in the value).
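The numeric and ISO-8601 forms describe the same instant, as the following Python sketch shows for options 0 and 8. It assumes the values are in UTC, which the trailing "Z" in the examples suggests; the function names are hypothetical:

```python
from datetime import datetime, timezone
from typing import Tuple


def from_epoch_nanos(tm: int) -> Tuple[datetime, int]:
    """Option 0: split Unix nanoseconds into a UTC datetime and the nanosecond part."""
    seconds, nanos = divmod(tm, 1_000_000_000)
    return datetime.fromtimestamp(seconds, tz=timezone.utc), nanos


def from_iso_nanos(tms: str) -> Tuple[datetime, int]:
    """Option 8: parse '2022-05-01T06:00:00.123456789Z' (assumed to be UTC)."""
    head, _, fraction = tms.rstrip("Z").partition(".")
    moment = datetime.strptime(head, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    nanos = int(fraction.ljust(9, "0")) if fraction else 0
    return moment, nanos


print(from_epoch_nanos(1651384800123456789))
print(from_iso_nanos("2022-05-01T06:00:00.123456789Z"))
```

Nanosecond epoch values exceed 2^53, so JSON parsers that read numbers as double-precision floats cannot represent them exactly; the string options (4-15) avoid that.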

timestamp-metadata

integer, min: 0, max: 15, default: 0

Format for timestamps in metadata messages. For values and options, see timestamp above.

timestamp-tz

integer, min: 0, max: 11, default: 0

Format for TIMESTAMP WITH TIME ZONE values. Options control epoch vs ISO-8601 and how the timezone is attached (comma, space, or embedded Z).

  • 0 — Unix with nanoseconds stored as a string with time zone after comma sign — "tms": "1651384800123456789,Europe/Warsaw".

  • 1 — Unix with microsecond precision stored as a string with time zone after comma sign (possible data precision loss) — "tms": "1651384800123457,Europe/Warsaw".

  • 2 — Unix with millisecond precision stored as a string with time zone after comma sign (possible data precision loss) — "tms": "1651384800123,Europe/Warsaw".

  • 3 — Unix with second precision stored as a string with time zone after comma sign (possible data precision loss) — "tms": "1651384800,Europe/Warsaw".

  • 4 — ISO-8601 format with nanosecond precision stored as a string with time zone after space sign — "tms": "2022-05-01T06:00:00.123456789Z Europe/Warsaw".

  • 5 — ISO-8601 format with microsecond precision stored as a string with time zone after space sign — "tms": "2022-05-01T06:00:00.123456Z Europe/Warsaw".

  • 6 — ISO-8601 format with millisecond precision stored as a string with time zone after space sign — "tms": "2022-05-01T06:00:00.123Z Europe/Warsaw".

  • 7 — ISO-8601 format with second precision stored as a string with time zone after space sign — "tms": "2022-05-01T06:00:00Z Europe/Warsaw".

  • 8 — ISO-8601 format with nanosecond precision stored as a string without "TZ" with time zone after space sign — "tms": "2022-05-01 06:00:00.123456789 Europe/Warsaw".

  • 9 — ISO-8601 format with microsecond precision stored as a string without "TZ" with time zone after space sign — "tms": "2022-05-01 06:00:00.123456 Europe/Warsaw".

  • 10 — ISO-8601 format with millisecond precision stored as a string without "TZ" with time zone after space sign — "tms": "2022-05-01 06:00:00.123 Europe/Warsaw".

  • 11 — ISO-8601 format with second precision stored as a string without "TZ" with time zone after space sign — "tms": "2022-05-01 06:00:00 Europe/Warsaw".
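In every option the time zone name follows the timestamp, after a comma for options 0-3 and after a space otherwise. A Python sketch that separates the two parts, assuming the zone name itself contains neither a comma nor a space; the function name is hypothetical:

```python
from typing import Tuple


def split_timestamp_tz(tms: str, option: int) -> Tuple[str, str]:
    """Split a timestamp-tz value into (timestamp_text, zone_name)."""
    separator = "," if option <= 3 else " "
    # rpartition splits at the last separator, so the space inside
    # '2022-05-01 06:00:00' (options 8-11) stays in the timestamp part.
    stamp, _, zone = tms.rpartition(separator)
    return stamp, zone


print(split_timestamp_tz("1651384800123456789,Europe/Warsaw", 0))
print(split_timestamp_tz("2022-05-01 06:00:00 Europe/Warsaw", 11))
```

The timestamp part can then be parsed according to the chosen option, and the zone name passed to a time zone library.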

timestamp-type

integer, min: 0, max: 15, default: 0

Additional timestamp controls (bitmask):

  • 0x0001 — use commit timestamp as DML timestamp (default is timestamp from the redo record).

  • 0x0002 — include btm/btms (begin transaction timestamp) in every message (only for JSON output).

  • 0x0004 — include tm/tms in every message (default is first message only).

  • 0x0008 — include ctm/ctms (commit transaction timestamp) in every message (only for JSON output).

user-type

integer, min: 0, max: 15, default: 0

Additional user controls (bitmask).

  • 0x0001 — include usr in begin messages (only for JSON output).

  • 0x0002 — include usr in every DML message (only for JSON output).

  • 0x0004 — include usr in commit messages (only for JSON output).

  • 0x0008 — include usr in every DDL message (only for JSON output).

unknown

integer, min: 0, max: 1, default: 0

Behavior on decoding unknown values:

  • 0 — silently emit ? for undecodable values (default).

  • 1 — also log decoding mismatch to stderr (warning code 60002).

unknown-type

integer, min: 0, max: 1, default: 0

Behavior for unsupported column types:

  • 0 — skip columns of unsupported type (default).

  • 1 — emit ? for unsupported types; if unknown is 1, also log hex value to stderr.

xid

integer, min: 0, max: 3, default: 0

Transaction ID (XID) format:

  • 0 — classic hex text, e.g., 0x0002.012.00004162.

  • 1 — dotted decimal, e.g., 2.18.16738.

  • 2 — single 64-bit numeric xidn.

  • 3 — Logminer format value stored as a string.
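The examples for options 0 and 1 describe the same transaction: each dot-separated part of the hex form is a hexadecimal number. A Python sketch of the conversion, with a hypothetical function name:

```python
def xid_hex_to_decimal(xid: str) -> str:
    """Convert '0x0002.012.00004162' (option 0) to '2.18.16738' (option 1)."""
    body = xid[2:] if xid.startswith("0x") else xid
    parts = body.split(".")
    return ".".join(str(int(part, 16)) for part in parts)


print(xid_hex_to_decimal("0x0002.012.00004162"))  # 2.18.16738
```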

Note
  • Prefer the default compact column mode for typical streaming consumers to reduce bandwidth.

  • Use schema flags to help receivers discover table metadata; avoid schema=0x0002 in high-volume streams.

  • When changing timestamp or scn formats, coordinate consumers to avoid parsing mismatches.

Example format configuration (JSON)
{
  "format": {
    "type": "json",
    "attributes": 0,
    "char": 0,
    "char-set": "AL32UTF8",
    "column": 0,
    "flush-buffer": 1048576,
    "message": 1,
    "schema": 1,
    "timestamp": 8,
    "timestamp-type": 0,
    "user-type": 0,
    "unknown": 0,
    "xid": 0
  }
}
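Before deploying a configuration, the documented ranges can be checked with a small script. The Python sketch below covers only a subset of the integer options, with limits copied from the table above; it is an illustration, not the validation OpenLogReplicator itself performs, and the names LIMITS and check_format are hypothetical:

```python
import json

# Ranges copied from the parameter table for a few integer options.
LIMITS = {
    "attributes": (0, 15), "char": (0, 3), "column": (0, 2), "message": (0, 30),
    "schema": (0, 7), "timestamp": (0, 15), "unknown": (0, 1), "xid": (0, 3),
}


def check_format(config_text: str) -> list:
    """Return a list of problems found in the format element (empty if none)."""
    fmt = json.loads(config_text)["format"]
    problems = []
    if fmt.get("type") not in ("json", "debezium", "protobuf"):
        problems.append("type must be json, debezium or protobuf")
    for name, (low, high) in LIMITS.items():
        value = fmt.get(name, 0)  # every option listed here defaults to 0
        if not isinstance(value, int) or not low <= value <= high:
            problems.append(f"{name}={value!r} outside {low}..{high}")
    return problems


print(check_format('{"format": {"type": "json", "schema": 1, "timestamp": 8}}'))  # []
print(check_format('{"format": {"type": "json", "xid": 9}}'))
```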