Summary
Two bugs found during checkpoint/restart fault tolerance testing:
- ERROR 50022 duplicate SYS.CCOL$: Upstream OLR bug triggered by flags: 4 (ADAPTIVE_SCHEMA) combined with any table filter. Not RAC-specific.
- Heap-use-after-free on shutdown: Ctx::freeMemoryChunk() accesses freed WriterStream via dangling ctx->parserThread. Fixed in commit 6724219d.
Bug 1: duplicate SYS dictionary entries with ADAPTIVE_SCHEMA
Root Cause
When flags: 4 (ADAPTIVE_SCHEMA) is set, Checkpoint.cpp:188 adds a wildcard ".*"/".*" schema element. Combined with any explicit table filter (e.g. "OLR_TEST"/".*"), createSchema() calls readSystemDictionaries() for each schema element. The wildcard loads ALL users' SYS dictionary entries first, then the explicit filter tries to load the same user's entries again.
The SysUser duplicate check in ReplicatorOnline.cpp:1341-1356 is supposed to skip already-loaded users, but doesn't guard all code paths — readSystemDictionariesDetails() gets called twice for overlapping users. addWithKeys() → add() in TablePack.h:146 unconditionally throws on duplicate ROWID.
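To make the overlap concrete, here is a minimal sketch of how a wildcard element and an explicit filter can select the same user. It is a toy model, not OLR code: SchemaElement, usersLoadedTwice() and the use of std::regex for owner matching are assumptions made for illustration only.

```cpp
#include <regex>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one schema element (owner pattern / table pattern).
struct SchemaElement {
    std::string ownerPattern;
    std::string tablePattern;
};

// Returns the users that would be loaded more than once if every schema
// element loaded all users it matches, with no "already loaded" guard.
std::vector<std::string> usersLoadedTwice(const std::vector<SchemaElement>& elements,
                                          const std::vector<std::string>& users) {
    std::set<std::string> loaded;
    std::set<std::string> repeated;
    for (const auto& element : elements) {
        const std::regex owner(element.ownerPattern);
        for (const auto& user : users) {
            if (!std::regex_match(user, owner))
                continue;
            // insert() reports false when the user was already loaded
            // by an earlier element, which is the overlap case.
            if (!loaded.insert(user).second)
                repeated.insert(user);
        }
    }
    return std::vector<std::string>(repeated.begin(), repeated.end());
}
```

With the elements ".*"/".*" and "OLR_TEST"/".*", the user OLR_TEST is matched by both, which corresponds to the second load that ends in the duplicate ROWID error.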
Key distinction
- add() (used by createSchema() DB dictionary load and checkpoint deserialization): always throws on duplicate
- forInsert() (used during redo replay in SystemTransaction.cpp): tolerates duplicates when ADAPTIVE_SCHEMA is set
So ADAPTIVE_SCHEMA only protects redo replay, not the initial dictionary load.
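The contrast between the two insert paths can be sketched as follows. This is a simplified model under stated assumptions, not the code in TablePack.h: ToyPack, its int payload and the adaptiveSchema parameter are hypothetical, and "tolerates" is modeled here as skipping the duplicate row, which may differ from what OLR actually does.

```cpp
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a dictionary table pack keyed by ROWID.
class ToyPack {
public:
    // Strict path: a second insert of the same ROWID always throws.
    void add(const std::string& rowId, int value) {
        if (!rows.emplace(rowId, value).second)
            throw std::runtime_error("duplicate row (ROWID: " + rowId + ")");
    }

    // Tolerant path: with adaptiveSchema set, a duplicate is skipped.
    // Returns true if the row was inserted, false if it was skipped.
    bool forInsert(const std::string& rowId, int value, bool adaptiveSchema) {
        if (rows.count(rowId) != 0) {
            if (adaptiveSchema)
                return false;
            throw std::runtime_error("duplicate row (ROWID: " + rowId + ") for insert");
        }
        rows.emplace(rowId, value);
        return true;
    }

    std::size_t size() const { return rows.size(); }

private:
    std::map<std::string, int> rows;
};
```

In this model the flag never reaches add(), so loading the same row twice through the strict path fails regardless of how the flag is set.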
Reproduction
Fails on first fresh start — not related to checkpoint/restart at all:
# RAC (Oracle 23ai)
ERROR 50022 duplicate SYS.CCOL$ (ROWID: AAAAAdAAAAAAAEpAAg, CON#: 147, INTCOL#: 1, OBJ#: 32, SPARE1: [0,0]) for insert
# Single-instance (Oracle XE 21c) — same bug
ERROR 50022 duplicate SYS.CCOL$ (ROWID: AAAAAdAABAAAAEpAAO, CON#: 144, INTCOL#: 1, OBJ#: 31, SPARE1: [0,0]) for insert
Minimal config to reproduce on any Oracle instance:
{
  "flags": 4,
  "filter": {
    "table": [{"owner": "ANY_USER", "table": ".*"}]
  }
}
Without flags: 4, no error occurs.
Initial misidentification
Originally reported as a RAC-specific checkpoint-resume issue because the RAC Debezium config was the only one with flags: 4. The flag was added as a debugging attempt during checkpoint/restart testing, but it turned out to be the cause of the error, not the fix.
Bug 2: heap-use-after-free on shutdown — FIXED
Fixed in commit 6724219d.
- WriterStream constructor sets ctx->parserThread = this
- ~OpenLogReplicator() deletes writers before builders/parsers
- Builder::~Builder() and Parser::~Parser() call freeMemoryChunk(ctx->parserThread, ...) on already-freed memory
- Fix: null out ctx->parserThread in WriterStream::~WriterStream(), guard freeMemoryChunk() against null thread
This bug exists in upstream OLR but only triggers when a RuntimeException causes shutdown while using the network writer.
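The shape of the fix can be sketched in a few lines. This is a reduced model, not the contents of commit 6724219d: the Thread base, the chunksFreed counter and the one-argument freeMemoryChunk() are assumptions standing in for the real types and signature.

```cpp
// Hypothetical base type for threads that own memory chunks.
struct Thread {
    virtual ~Thread() = default;
    int chunksFreed = 0;
};

// Hypothetical reduced context holding the shared parserThread pointer.
struct Ctx {
    Thread* parserThread = nullptr;

    // Guarded variant: tolerates a null thread instead of dereferencing it.
    // Returns true if the free was accounted to a live thread.
    bool freeMemoryChunk(Thread* thread) {
        if (thread == nullptr)
            return false;
        ++thread->chunksFreed;
        return true;
    }
};

struct WriterStream : Thread {
    explicit WriterStream(Ctx* newCtx) : ctx(newCtx) {
        ctx->parserThread = this;  // the registration described above
    }

    ~WriterStream() override {
        // Clear the shared pointer so later destructors cannot reach
        // this object after it has been destroyed.
        if (ctx->parserThread == this)
            ctx->parserThread = nullptr;
    }

    Ctx* ctx;
};
```

Clearing the pointer in the destructor and checking for null in freeMemoryChunk() are both needed in this model: the first removes the dangling pointer, the second keeps the later Builder and Parser destructors from dereferencing the cleared pointer.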
Resolution
- Bug 1: Remove flags: 4 from config. The ADAPTIVE_SCHEMA flag has this upstream bug and is not needed for our use case.
- Bug 2: Fixed in commit 6724219d.