in_tail: Use temp table on overflowing the limit of SQL statement#11383
in_tail: Use temp table on overflowing the limit of SQL statement#11383
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds a temporary-table based stale-inode cleanup path in the tail input plugin to avoid SQLite variable limits, adjusts NOT IN parameter counting, fixes zero-file deletion behavior, and introduces related logging and error handling in the plugin's DB cleanup logic. Changes
Sequence Diagram(s)(omitted) Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
8476ca3 to
704ea89
Compare
|
@cosmo0920 we have a conflict on this one |
|
Oh! I'll rebase off master. |
704ea89 to
24b0bca
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
plugins/in_tail/tail_db.c (1)
377-386:sqlite3_changesmay not reflect the DELETE after the COMMIT exec — consider reading it earlier.
sqlite3_changes()at line 384 is called after theCOMMITstatement is executed viasqlite3_exec. WhileCOMMITis not a DML statement and should leave the change counter untouched per SQLite documentation, this ordering is fragile and differs from the legacy path (line 703) which readssqlite3_changesimmediately aftersqlite3_stepof the DELETE. Moving the read before the COMMIT is safer and removes any ambiguity.Proposed fix
+ changes = sqlite3_changes(ctx->db->handler); + ret = flb_sqldb_query(ctx->db, "COMMIT;", NULL, NULL); if (ret != FLB_OK) { flb_plg_error(ctx->ins, "db: cannot commit transaction for temp inode inserts"); goto error; } txn_started = FLB_FALSE; - changes = sqlite3_changes(ctx->db->handler); flb_plg_info(ctx->ins, "db: delete unmonitored stale inodes from the database: count=%d", changes);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/in_tail/tail_db.c` around lines 377 - 386, The sqlite3_changes() call should be moved to read the change count immediately after the DELETE executes and before committing the transaction to avoid relying on behavior after COMMIT; update the block around flb_sqldb_query(ctx->db, "COMMIT;", ...) so that you call sqlite3_changes(ctx->db->handler) and store it in changes right after the DELETE path (before calling flb_sqldb_query for the COMMIT) and then log flb_plg_info using that stored changes value, leaving txn_started handling unchanged; refer to ctx->db, sqlite3_changes, flb_sqldb_query, changes and txn_started to locate where to adjust the order.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/in_tail/tail_db.c`:
- Around line 285-400: The temp-table cleanup in
flb_tail_db_stale_file_delete_temp_table performs
CREATE/DELETE/BEGIN/INSERT/COMMIT/ROLLBACK calls without holding the global
tail_db_lock, causing races with other DB writers; fix by acquiring tail_db_lock
before any SQLite calls in flb_tail_db_stale_file_delete_temp_table (or ensure
the caller holds it) and release it on every return path (including
error/rollback), keeping the lock held across stmt_prepare/step/finalize and
flb_sqldb_query calls so concurrent writers that use tail_db_lock are properly
serialized.
---
Nitpick comments:
In `@plugins/in_tail/tail_db.c`:
- Around line 377-386: The sqlite3_changes() call should be moved to read the
change count immediately after the DELETE executes and before committing the
transaction to avoid relying on behavior after COMMIT; update the block around
flb_sqldb_query(ctx->db, "COMMIT;", ...) so that you call
sqlite3_changes(ctx->db->handler) and store it in changes right after the DELETE
path (before calling flb_sqldb_query for the COMMIT) and then log flb_plg_info
using that stored changes value, leaving txn_started handling unchanged; refer
to ctx->db, sqlite3_changes, flb_sqldb_query, changes and txn_started to locate
where to adjust the order.
24b0bca to
0d3a188
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (1)
plugins/in_tail/tail_db.c (1)
594-601: Fragile lock ownership: caller acquires, callee releases.The lock is acquired at line 579, but the
returnat line 600 transfers unlock responsibility toflb_tail_db_stale_file_delete_temp_table(lines 389/403). This split ownership is error-prone — the double-unlock bug at line 318 is a direct consequence.Consider unlocking in the caller after the call returns, matching the pattern used by the legacy path (line 717). The callee would then not touch the lock at all.
Proposed refactor — keep lock ownership in the caller
if (max_vars > 0 && file_count > (uint64_t) max_vars) { flb_plg_warn(ctx->ins, "db: large file set detected (%" PRIu64 " files) exceeds SQLite variable limit (%d); " "using temp-table cleanup for stale inode deletion", file_count, max_vars); - return flb_tail_db_stale_file_delete_temp_table(ctx, file_count); + ret = flb_tail_db_stale_file_delete_temp_table(ctx, file_count); + tail_db_unlock(ctx); + return ret; }Then remove all
tail_db_unlockcalls from insideflb_tail_db_stale_file_delete_temp_table(lines 318, 389, 403).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/in_tail/tail_db.c` around lines 594 - 601, The caller acquires the DB lock but returns early into flb_tail_db_stale_file_delete_temp_table which currently releases the lock, causing split ownership and the double-unlock bug; change the flow so the caller retains lock ownership: remove all tail_db_unlock calls from flb_tail_db_stale_file_delete_temp_table (so the callee never touches the lock) and have the caller call flb_tail_db_stale_file_delete_temp_table while still holding the lock, then perform tail_db_unlock in the caller after the callee returns (matching the legacy-path pattern used elsewhere).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/in_tail/tail_db.c`:
- Around line 296-308: The zero-file branch currently performs the DELETE
(flb_sqldb_query) and logs the deleted rows (sqlite3_changes + flb_plg_info) but
then jumps to the error cleanup via "goto error" which returns -1; change this
path so after a successful DELETE you perform the proper DB unlock/cleanup (use
the same DB unlock/cleanup routine used elsewhere for ctx->db) and return a
success code (FLB_OK) instead of jumping to the error label; specifically modify
the block that checks file_count == 0 to remove the final "goto error" and
replace it with the correct unlock/cleanup call for ctx->db followed by "return
FLB_OK".
- Around line 316-320: In flb_tail_db_stale_file_delete remove the redundant
tail_db_unlock call before the goto error: the lock is acquired by the caller
and every other error path uses the single tail_db_unlock at the error label, so
delete the stray tail_db_unlock in the block that logs "db: cannot create temp
table for inode cleanup" (the branch that checks ret != FLB_OK) and let the goto
error reach the existing unlock at the error label to avoid double-unlocking the
mutex.
---
Nitpick comments:
In `@plugins/in_tail/tail_db.c`:
- Around line 594-601: The caller acquires the DB lock but returns early into
flb_tail_db_stale_file_delete_temp_table which currently releases the lock,
causing split ownership and the double-unlock bug; change the flow so the caller
retains lock ownership: remove all tail_db_unlock calls from
flb_tail_db_stale_file_delete_temp_table (so the callee never touches the lock)
and have the caller call flb_tail_db_stale_file_delete_temp_table while still
holding the lock, then perform tail_db_unlock in the caller after the callee
returns (matching the legacy-path pattern used elsewhere).
0d3a188 to
ee9543a
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (1)
plugins/in_tail/tail_db.c (1)
285-407: Optional: Document or encapsulate the lock-ownership-transfer contract.
flb_tail_db_stale_file_delete_temp_tableunconditionally releases the caller's lock on every exit path (lines 307, 390, 404) without acquiring it itself. This ownership-transfer contract is implicit and could lead to a double-unlock if the function is ever called directly in the future without the lock held. Consider at minimum a short comment at the function entry to document the contract, or invert the pattern so this function manages its own lock:♻️ Option: self-contained locking
static int flb_tail_db_stale_file_delete_temp_table(struct flb_tail_config *ctx, uint64_t file_count) { int ret; int changes; - int txn_started = FLB_FALSE; + int txn_started = FLB_FALSE; + int lock_held = FLB_FALSE; sqlite3_stmt *stmt_insert_inode = NULL; ... + ret = tail_db_lock(ctx); + if (ret != 0) { + flb_plg_error(ctx->ins, "db: could not acquire lock"); + return -1; + } + lock_held = FLB_TRUE;And remove the three
tail_db_unlock(ctx)calls inside the function body, replacing theerrorlabel's unlock with a conditional onlock_held. The caller inflb_tail_db_stale_file_deletewould then calltail_db_unlock(ctx)normally after the dispatch returns, matching every other early-return path in that function.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/in_tail/tail_db.c` around lines 285 - 407, flb_tail_db_stale_file_delete_temp_table unconditionally calls tail_db_unlock(ctx) on all exit paths but does not acquire the lock itself; either document this ownership-transfer contract or make the function self-contained: add a short comment at the top of flb_tail_db_stale_file_delete_temp_table stating "assumes caller holds DB lock and this function will release it", or modify the function to call tail_db_lock(ctx) at entry and track a lock_held boolean, remove the three unconditional tail_db_unlock(ctx) calls and instead call tail_db_unlock(ctx) once on EXIT only if lock_held is true (and update the caller flb_tail_db_stale_file_delete to stop unlocking after dispatch). Use the symbols flb_tail_db_stale_file_delete_temp_table, tail_db_lock, tail_db_unlock, and flb_tail_db_stale_file_delete to locate and apply the change.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@plugins/in_tail/tail_db.c`:
- Around line 285-407: The earlier lock-was-bypassed concern is now resolved:
flb_tail_db_stale_file_delete_temp_table holds tail_db_lock for its entire body
(tail_db_lock/tail_db_unlock surround all exits) so remove the
duplicate/outdated review comment referencing the prior lock issue; update the
PR comment thread to state the lock concern is addressed and delete or mark the
duplicate comment as resolved, referencing
flb_tail_db_stale_file_delete_temp_table and tail_db_lock to point reviewers to
the validated locking behavior.
---
Nitpick comments:
In `@plugins/in_tail/tail_db.c`:
- Around line 285-407: flb_tail_db_stale_file_delete_temp_table unconditionally
calls tail_db_unlock(ctx) on all exit paths but does not acquire the lock
itself; either document this ownership-transfer contract or make the function
self-contained: add a short comment at the top of
flb_tail_db_stale_file_delete_temp_table stating "assumes caller holds DB lock
and this function will release it", or modify the function to call
tail_db_lock(ctx) at entry and track a lock_held boolean, remove the three
unconditional tail_db_unlock(ctx) calls and instead call tail_db_unlock(ctx)
once on EXIT only if lock_held is true (and update the caller
flb_tail_db_stale_file_delete to stop unlocking after dispatch). Use the symbols
flb_tail_db_stale_file_delete_temp_table, tail_db_lock, tail_db_unlock, and
flb_tail_db_stale_file_delete to locate and apply the change.
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
ee9543a to
2af3c26
Compare
There was a problem hiding this comment.
🧹 Nitpick comments (2)
plugins/in_tail/tail_db.c (2)
633-756:db_lockedguards in the legacy path are always true — optional cleanup.After the early return at line 610 exits the function, the legacy path (line 613 onward) is only reachable with
db_locked == FLB_TRUE. Everyif (db_locked == FLB_TRUE)check from line 633 to line 752 is unconditionally true. The guards are safe but add noise; callingtail_db_unlock(ctx)directly on each error/success path would make the intent clearer. This is a cosmetic issue and can be deferred.♻️ Example simplification (one error path shown; apply the same pattern to all guarded blocks)
- if (db_locked == FLB_TRUE) { - tail_db_unlock(ctx); - } - - return -1; + tail_db_unlock(ctx); + return -1;And at the success path (lines 752–758):
- if (db_locked == FLB_TRUE) { - tail_db_unlock(ctx); - } - db_locked = FLB_FALSE; - - return 0; + tail_db_unlock(ctx); + return 0;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/in_tail/tail_db.c` around lines 633 - 756, The db_locked boolean is always true for the legacy path, so remove the redundant guards and call tail_db_unlock(ctx) directly where cleanup is needed; update all error and success branches that currently do "if (db_locked == FLB_TRUE) { tail_db_unlock(ctx); }" to simply "tail_db_unlock(ctx);" (affecting the blocks around sqlite3_prepare_v2/stmt_delete_inodes binding/step/finalize and the final success path), and leave db_locked logic unchanged elsewhere (keep the db_locked = FLB_FALSE assignment at the end).
297-313: Zero-file branch is unreachable from the current call site.The only caller (
flb_tail_db_stale_file_delete, line 610) routes here only whenfile_count > max_vars > 0, sofile_count == 0on entry is impossible. The branch is currently dead code, though it doesn't cause harm. Consider adding anassert(file_count > 0)or a code comment to document the invariant and prevent confusion for future callers.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/in_tail/tail_db.c` around lines 297 - 313, The zero-file branch in the block inside flb_tail_db_stale_file_delete is unreachable because callers only invoke this path when file_count > max_vars > 0; add a clear invariant to the function by inserting either an assert(file_count > 0) (or flb_assert-like macro used in this codebase) at the top of the function or a concise comment documenting that file_count is guaranteed > 0 by callers, and remove or mark the DELETE branch as unreachable to avoid confusion; reference the symbols file_count, flb_tail_db_stale_file_delete, tail_db_unlock, ctx and db_locked so the change is easy to locate.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@plugins/in_tail/tail_db.c`:
- Around line 633-756: The db_locked boolean is always true for the legacy path,
so remove the redundant guards and call tail_db_unlock(ctx) directly where
cleanup is needed; update all error and success branches that currently do "if
(db_locked == FLB_TRUE) { tail_db_unlock(ctx); }" to simply
"tail_db_unlock(ctx);" (affecting the blocks around
sqlite3_prepare_v2/stmt_delete_inodes binding/step/finalize and the final
success path), and leave db_locked logic unchanged elsewhere (keep the db_locked
= FLB_FALSE assignment at the end).
- Around line 297-313: The zero-file branch in the block inside
flb_tail_db_stale_file_delete is unreachable because callers only invoke this
path when file_count > max_vars > 0; add a clear invariant to the function by
inserting either an assert(file_count > 0) (or flb_assert-like macro used in
this codebase) at the top of the function or a concise comment documenting that
file_count is guaranteed > 0 by callers, and remove or mark the DELETE branch as
unreachable to avoid confusion; reference the symbols file_count,
flb_tail_db_stale_file_delete, tail_db_unlock, ctx and db_locked so the change
is easy to locate.
in_tail plugins will fail to start when exceeded the number of placeholders for deleting inodes on DB.
So, this PR introduces an offloading operation which creates temp table and runs deletions with it.
Closes #11272
Enter
[N/A]in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
Prepare massive amount of files.
This change can detect this type of overflow and use temp database to drop the definitions beyond the number of limit of SQLite's placeholders.
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
ok-package-testlabel to test for all targets (requires maintainer to do).Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.
Summary by CodeRabbit
Bug Fixes
Stability