
tui: recover local state db startup failures #22734

Merged: etraut-openai merged 3 commits into main from etraut/tui-local-state-db-recovery on May 15, 2026


etraut-openai (Collaborator) commented May 15, 2026

Why

#22580 made app-server startup fail when the local SQLite state database cannot be initialized. Embedded/local TUI startup, however, still took the permissive path, which left the CLI inconsistent and could hide a real startup problem behind unrelated UI. This change brings local TUI startup onto the same fail-closed behavior while keeping recovery user-friendly for the two failure modes we see in practice: damaged database files, and startup stalls caused by another process holding the database write lock.

What changed

  • Embedded TUI startup now uses state_db::try_init(...) and returns a typed LocalStateDbStartupError that preserves the affected database path plus the underlying failure detail.
  • CLI startup handles that failure before entering the interactive TUI:
    • lock-contention failures tell users to quit other Codex processes and try again
    • failures consistent with a broken local database offer a safe repair that backs up Codex-owned SQLite files, rebuilds local database files, and retries startup once
    • declined or unsuccessful repairs print concise guidance plus technical details
  • Shared startup error plumbing lives in tui/src/startup_error.rs, while CLI recovery policy and focused recovery tests live in cli/src/state_db_recovery.rs.
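The failure classification described above can be sketched in Rust as follows. This is a minimal sketch, not the PR's code: only the `LocalStateDbStartupError` name and its `detail()` accessor appear in the PR, while the fields, the `Recovery` enum, and the `classify` helper are hypothetical. It assumes failures are recognized from SQLite's standard error strings ("database is locked" for `SQLITE_BUSY`, "database table is locked" for `SQLITE_LOCKED`, "database disk image is malformed" for `SQLITE_CORRUPT`, "file is not a database" for `SQLITE_NOTADB`); the real heuristics may differ.

```rust
use std::fmt;
use std::path::{Path, PathBuf};

/// Hypothetical shape of the typed startup error: it keeps the affected
/// database path plus the underlying failure detail.
#[derive(Debug)]
pub struct LocalStateDbStartupError {
    path: PathBuf,
    detail: String,
}

impl LocalStateDbStartupError {
    pub fn new(path: impl Into<PathBuf>, detail: impl Into<String>) -> Self {
        Self { path: path.into(), detail: detail.into() }
    }

    pub fn path(&self) -> &Path {
        &self.path
    }

    pub fn detail(&self) -> &str {
        &self.detail
    }
}

impl fmt::Display for LocalStateDbStartupError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "failed to initialize local state database at {}: {}",
            self.path.display(),
            self.detail
        )
    }
}

impl std::error::Error for LocalStateDbStartupError {}

/// Hypothetical recovery policy the CLI could branch on.
#[derive(Debug, PartialEq, Eq)]
pub enum Recovery {
    /// Another process holds the lock: tell the user to quit it and retry.
    QuitOtherProcesses,
    /// The database looks damaged: offer the backup-and-rebuild repair.
    OfferRepair,
    /// Anything else: print guidance plus technical details only.
    ReportOnly,
}

/// Assumption: classify by matching SQLite's standard error messages.
pub fn classify(detail: &str) -> Recovery {
    let detail = detail.to_ascii_lowercase();
    if detail.contains("database is locked") || detail.contains("database table is locked") {
        Recovery::QuitOtherProcesses
    } else if detail.contains("database disk image is malformed")
        || detail.contains("file is not a database")
    {
        Recovery::OfferRepair
    } else {
        Recovery::ReportOnly
    }
}
```

Keeping lock contention in its own branch means a busy database is never routed to the repair path, which matches the verification note below that lock failures show guidance without offering repair.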

Verification

  • cargo test -p codex-tui embedded_state_db_failure_is_typed_for_cli_recovery
  • cargo test -p codex-cli state_db_recovery
  • Manually held an exclusive SQLite lock on state_5.sqlite and confirmed the CLI shows lock-specific guidance without offering repair.
  • Manually exercised the repair path with a deliberately invalid sqlite_home and confirmed it backs up the blocking path and resumes startup.
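The backup half of the repair path could look roughly like the sketch below. It assumes that "Codex-owned SQLite files" means the database file plus the `-wal`, `-shm`, and `-journal` companions SQLite may create beside it, and that backing up means renaming them aside so the retried startup rebuilds fresh files. `back_up_sqlite_files` and the `.<tag>.bak` naming scheme are hypothetical.

```rust
use std::ffi::OsString;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// The database file itself ("") plus the companion files SQLite may create.
const SQLITE_FILE_SUFFIXES: [&str; 4] = ["", "-wal", "-shm", "-journal"];

/// Hypothetical helper: rename the database and any existing companion files
/// to `<name>.<backup_tag>.bak`, returning the backup paths that were created.
pub fn back_up_sqlite_files(db_path: &Path, backup_tag: &str) -> io::Result<Vec<PathBuf>> {
    let mut moved = Vec::new();
    for suffix in SQLITE_FILE_SUFFIXES {
        let mut original: OsString = db_path.as_os_str().to_owned();
        original.push(suffix);
        let original = PathBuf::from(original);

        // Skip files SQLite never created (e.g. no WAL in rollback-journal
        // mode), but surface any other metadata error such as permissions.
        match fs::symlink_metadata(&original) {
            Ok(_) => {}
            Err(err) if err.kind() == io::ErrorKind::NotFound => continue,
            Err(err) => return Err(err),
        }

        let mut backup = original.clone().into_os_string();
        backup.push(format!(".{backup_tag}.bak"));
        let backup = PathBuf::from(backup);

        // Rename instead of deleting so the damaged files stay available.
        fs::rename(&original, &backup)?;
        moved.push(backup);
    }
    Ok(moved)
}
```

Renaming rather than deleting keeps the original files around for diagnosis if the rebuilt database also fails.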

@etraut-openai etraut-openai marked this pull request as ready for review May 15, 2026 01:02
Review comment on codex-rs/cli/src/main.rs (outdated):
```rust
let Some(retry_startup_error) = local_state_db_startup_error(&retry_err) else {
    return Err(retry_err);
};
if local_state_db_is_locked(retry_startup_error.detail()) {
```
Contributor


This is like a mini version of the code above. Could we just loop back and retry once?

Collaborator Author


Addressed. I collapsed this into a single bounded retry loop with an attempted_repair flag, so startup failure classification now lives in one place while still guaranteeing we only offer repair once.
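The single bounded retry loop described in this reply might be structured like the sketch below. Everything here is hypothetical except the `attempted_repair` flag named above: `StartupError`, its variants, and `start_with_recovery` stand in for the real startup and prompt code, with the startup attempt, the repair, and the user confirmation passed in as closures so the control flow can be exercised on its own.

```rust
/// Hypothetical, already-classified startup failure.
#[derive(Debug)]
pub enum StartupError {
    /// Another process holds the database write lock.
    Locked(String),
    /// The local database looks damaged and is a repair candidate.
    Broken(String),
    /// Any other failure.
    Other(String),
}

/// Runs `start`, offering `repair` at most once when the database looks broken.
pub fn start_with_recovery<T, S, R, C>(
    mut start: S,
    mut repair: R,
    mut confirm: C,
) -> Result<T, StartupError>
where
    S: FnMut() -> Result<T, StartupError>,
    R: FnMut() -> Result<(), StartupError>,
    C: FnMut() -> bool,
{
    let mut attempted_repair = false;
    loop {
        match start() {
            Ok(value) => return Ok(value),
            // Lock contention never offers repair; the caller prints guidance.
            Err(err @ StartupError::Locked(_)) => return Err(err),
            // First "broken database" failure: ask, repair, then loop back so
            // the retry goes through this same classification.
            Err(StartupError::Broken(detail)) if !attempted_repair => {
                if !confirm() {
                    return Err(StartupError::Broken(detail));
                }
                attempted_repair = true;
                repair()?;
            }
            // Declined, already-repaired, or unrelated failures end the loop.
            Err(err) => return Err(err),
        }
    }
}
```

Because every attempt, including the post-repair retry, goes through the same `match`, failure classification lives in one place, and the `attempted_repair` guard ensures repair is offered at most once before the error is returned.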

@etraut-openai etraut-openai merged commit 3a23e87 into main May 15, 2026
31 checks passed
@etraut-openai etraut-openai deleted the etraut/tui-local-state-db-recovery branch May 15, 2026 01:51
@github-actions github-actions Bot locked and limited conversation to collaborators May 15, 2026


2 participants