
Online reencryption#3953

Open
jbaublitz wants to merge 24 commits into stratis-storage:master from
jbaublitz:issue-stratisd-3597

Conversation

@jbaublitz
Member

@jbaublitz jbaublitz commented Jan 13, 2026

Follow up on #3898
Closes #3961
Closes #3597

Summary by CodeRabbit

  • New Features

    • Added pool encryption, decryption, and reencryption operations accessible via D-Bus interface.
    • Added last_reencrypted_timestamp property to track encryption status changes on pools.
  • Dependencies

    • Updated chrono dependency to version 0.4.31.
    • Added cryptsetup package to CI workflows for improved dependency consistency.

@jbaublitz jbaublitz added this to the v3.9.0 milestone Jan 13, 2026
@jbaublitz jbaublitz self-assigned this Jan 13, 2026
@jbaublitz jbaublitz force-pushed the issue-stratisd-3597 branch from 7407077 to bdb97bf on January 13, 2026 21:17
@jbaublitz jbaublitz changed the title Issue stratisd 3597 Online reencryption Jan 13, 2026
@jbaublitz jbaublitz force-pushed the issue-stratisd-3597 branch 4 times, most recently from 8c42a7f to b7f74d4 on January 14, 2026 14:40
@packit-as-a-service

Congratulations! One of the builds has completed. 🍾

You can install the built RPMs by following these steps:

  • sudo dnf install -y 'dnf*-command(copr)'
  • dnf copr enable packit/stratis-storage-stratisd-3953-copr_pull
  • And now you can install the packages.

Please note that the RPMs should be used only in a testing environment.

@jbaublitz jbaublitz force-pushed the issue-stratisd-3597 branch 2 times, most recently from 97410b3 to b40697c on January 28, 2026 21:01
@mulkieran mulkieran moved this to In Review in 2026January Jan 30, 2026
@jbaublitz jbaublitz force-pushed the issue-stratisd-3597 branch 2 times, most recently from 480911e to d7a7750 on February 9, 2026 16:59
@mulkieran mulkieran removed this from 2026January Feb 17, 2026
@mulkieran mulkieran moved this to In Review in 2026February Feb 17, 2026
@mulkieran mulkieran removed this from 2026February Mar 2, 2026
@mulkieran mulkieran moved this to In Review in 2026March Mar 2, 2026
@jbaublitz jbaublitz force-pushed the issue-stratisd-3597 branch 3 times, most recently from dbd1d21 to f2fb731 on March 5, 2026 21:03
@mulkieran
Member

@jbaublitz Please rebase when you get the chance.

@mulkieran
Member

mulkieran commented Mar 9, 2026

@jbaublitz Should we require the cryptsetup binary directly in cmd.rs instead of just indirectly through CLEVIS_EXEC_NAMES? We are doing that! Then we need to require cryptsetup in stratis-min-cli-checks in .github/workflows/main.yml and in checks_with_udev in ubuntu.yml. checks_with_udev should not exist in fedora.yml after you have rebased.

Member

@mulkieran mulkieran left a comment


Please address the two comments at the bottom, i.e., rebase and add test dependencies.

@mulkieran mulkieran moved this from In Review to In Progress in 2026March Mar 11, 2026
@jbaublitz jbaublitz force-pushed the issue-stratisd-3597 branch from 42b20f1 to 573a13c on March 20, 2026 12:37
@mulkieran
Copy link
Member

@coderabbitai review

@coderabbitai
Copy link

coderabbitai bot commented Mar 20, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link

coderabbitai bot commented Mar 20, 2026

Walkthrough

This PR implements comprehensive support for online pool encryption, decryption, and reencryption operations in stratisd, primarily targeting metadata version 2. Changes include new Engine/Pool trait methods, DBus interface extensions for the pool_3_9 service, extensive cryptsetup bindings integration, lock system upgrades for read-to-write transitions, and updates to CI workflows and dependencies.

Changes

Cohort / File(s) Summary
CI Workflow Dependencies
.github/workflows/main.yml, ubuntu.yml, valgrind.yml
Added cryptsetup package to Fedora and Ubuntu dependency installation steps across multiple CI jobs to support encryption-related testing and operations.
Dependency Updates
Cargo.toml
Updated chrono dependency from 0.4.20 to 0.4.31 for enhanced datetime handling in encryption timestamp tracking.
DBus Interface Methods
src/dbus/pool/pool_3_9/methods.rs
Added three new async DBus methods: encrypt_pool_method, reencrypt_pool_method, and decrypt_pool_method implementing pool encryption lifecycle orchestration with signal emission and error handling.
DBus Interface Definitions
src/dbus/pool/pool_3_9/mod.rs, props.rs
Extended the pool_3_9 interface with new methods and properties: encrypt_pool(), reencrypt_pool(), decrypt_pool(), and the last_reencrypted_timestamp property; changed the encrypted property's signal-emission annotation from const to true.
DBus Signal Helpers
src/dbus/util.rs
Added send_encrypted_signal and send_last_reencrypted_signal async functions to emit D-Bus property-changed signals for encryption status.
Engine Trait Extensions
src/engine/engine.rs, mod.rs
Extended Pool trait with 11 new encryption lifecycle methods (start_encrypt_pool, do_encrypt_pool, finish_encrypt_pool, start_reencrypt_pool, do_reencrypt_pool, finish_reencrypt_pool, decrypt_pool_idem_check, do_decrypt_pool, finish_decrypt_pool, last_reencrypt); extended Engine trait with upgrade_pool method; re-exported EncryptedDevice type.
SimEngine Implementation
src/engine/sim_engine/engine.rs, pool.rs, mod.rs, shared.rs
Implemented encryption lifecycle methods for SimPool with state tracking (last_reencrypt field), delegated encryption info conversion to new convert_encryption_info helper, added upgrade_pool method to SimEngine for lock upgrades.
Pool Dispatch Layer
src/engine/strat_engine/pool/dispatch.rs
Added dispatcher implementations for all 10 encryption lifecycle methods on AnyPool, routing to v1/v2-specific implementations.
StratPool v1 Implementation
src/engine/strat_engine/pool/v1.rs
Added last_reencrypt timestamp field to StratPool, implemented reencryption methods (v1-only), stubbed encrypt/decrypt methods with v2-only errors, added key validation during initialization and keyring binding, updated test coverage with online reencryption scenarios.
StratPool v2 Implementation
src/engine/strat_engine/pool/v2.rs
Added last_reencrypt field to StratPool, implemented all 10 encryption lifecycle methods for online encrypt/decrypt/reencrypt, enhanced key validation in bind/rebind operations, updated metadata persistence, added comprehensive test coverage for online operations on loopback and real devices.
Engine Metadata Serialization
src/engine/strat_engine/serde_structs.rs
Extended PoolSave with last_reencrypt optional field using custom serde serialization/deserialization for Unix timestamp conversion.
Backstore Encryption APIs
src/engine/strat_engine/backstore/backstore/v1.rs, v2.rs
Added encryption preparation, reencryption, and decryption methods to v1/v2 Backstore; v2 refactored cap-device state into new CapDevice struct with offset tracking and allocation shifting support; generalized operation_loop to return per-device results.
Blockdev Encryption APIs
src/engine/strat_engine/backstore/blockdev/v1.rs
Added setup_reencrypt and do_reencrypt methods to StratBlockDev delegating to underlying CryptHandle.
Device Utilities
src/engine/strat_engine/backstore/devices.rs, mod.rs
Added get_logical_sector_size function to query device logical sector size via libblkid, re-exported in backstore module.
Cryptsetup Command Integration
src/engine/strat_engine/cmd.rs
Moved cryptsetup from Clevis-specific executables to general executables list; added run_encrypt, run_reencrypt, run_decrypt functions invoking cryptsetup reencrypt with appropriate flags.
Key Validation
src/engine/strat_engine/keys.rs, engine.rs
Added validate_key_descs function to validate key descriptors exist in keyring; integrated validation into pool creation paths for v1 and v2 engines.
Cryptsetup Low-Level Integration
src/engine/strat_engine/crypt/handle/v1.rs, v2.rs, shared.rs
Extended CryptHandle with setup_reencrypt/do_reencrypt methods; added extensive v2 bindings for CryptReencrypt and reencryption parameters; implemented get_passphrase, get_all_passphrases, handle_setup_reencrypt, handle_do_reencrypt helpers orchestrating LUKS2 header management and token updates; updated device initialization flow to return refreshed EncryptionInfo.
ThinPool Device Management
src/engine/strat_engine/thinpool/dm_structs.rs, thinpool.rs, filesystem.rs
Updated set_target_device to accept offset and direction parameters for allocation shifting; extended ThinPool with min_logical_sector_size method; changed v2-only set_device signature to include offset/direction; added StratFilesystem::logical_sector_size method.
Lock System Enhancements
src/engine/structures/lock.rs
Implemented AllOrSomeLock::upgrade async method converting read locks to write locks; added SomeLockWriteGuard::downgrade method; extended wait-state model with WaitType::Upgrade variant and conflict-resolution logic.
Type System Extensions
src/engine/types/actions.rs, keys.rs, mod.rs
Added EncryptedDevice and ReencryptedDevice wrapper types with Display implementations; added InputEncryptionInfo::key_descs iterator; added OffsetDirection enum (Backwards, Forwards); added TryFrom<u8>/From<u8> implementations for StratSigblockVersion.
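The walkthrough above splits each encryption operation into a start/do/finish lifecycle driven from the D-Bus layer. A minimal sketch of that orchestration pattern, with a hypothetical trait and mock pool standing in for the real Pool trait methods and StratisResult type:

```rust
// Illustrative sketch only: stratisd's real Pool trait methods take
// engine-specific arguments and return StratisResult.
trait EncryptLifecycle {
    // Prepare headers/keyslots; must be safe to retry.
    fn start_encrypt(&mut self) -> Result<(), String>;
    // Long-running data conversion (the real code shells out to cryptsetup).
    fn do_encrypt(&mut self) -> Result<(), String>;
    // Persist final metadata and emit property-changed signals.
    fn finish_encrypt(&mut self) -> Result<(), String>;
}

// Orchestrator mirroring the D-Bus method's call order.
fn run_encrypt<P: EncryptLifecycle>(pool: &mut P) -> Result<(), String> {
    pool.start_encrypt()?;
    pool.do_encrypt()?;
    pool.finish_encrypt()
}

// Mock pool recording the order in which lifecycle phases run.
struct MockPool {
    calls: Vec<&'static str>,
}

impl EncryptLifecycle for MockPool {
    fn start_encrypt(&mut self) -> Result<(), String> {
        self.calls.push("start");
        Ok(())
    }
    fn do_encrypt(&mut self) -> Result<(), String> {
        self.calls.push("do");
        Ok(())
    }
    fn finish_encrypt(&mut self) -> Result<(), String> {
        self.calls.push("finish");
        Ok(())
    }
}
```

The split lets the long-running "do" phase run without blocking setup or teardown logic, and gives each phase a natural failure boundary.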

Sequence Diagram(s)

sequenceDiagram
    participant Client as DBus Client
    participant DBus as DBus Interface
    participant Pool as Pool (Trait)
    participant Backstore as Backstore
    participant CryptHandle as CryptHandle
    participant Cryptsetup as cryptsetup CLI

    Client->>DBus: encrypt_pool_method()
    DBus->>Pool: start_encrypt_pool()
    Pool->>Backstore: prepare_encrypt()
    Backstore->>CryptHandle: setup_encrypt()
    CryptHandle->>Cryptsetup: cryptsetup luksFormat
    Cryptsetup-->>CryptHandle: LUKS header created
    CryptHandle-->>Backstore: (slot, key)
    Backstore-->>Pool: CreateAction::Created()
    Pool-->>DBus: (true, slot, key_info)
    
    DBus->>Pool: do_encrypt_pool()
    Pool->>Backstore: do_encrypt()
    Backstore->>Cryptsetup: cryptsetup reencrypt --encrypt
    Cryptsetup-->>Backstore: encryption in progress
    Backstore-->>Pool: StratisResult<()>
    Pool-->>DBus: (true, 0, success)
    
    DBus->>Pool: finish_encrypt_pool()
    Pool->>Backstore: finish_encrypt()
    Backstore->>CryptHandle: finalize encryption
    CryptHandle-->>Backstore: success
    Backstore-->>Pool: StratisResult<()>
    Pool-->>DBus: (true, 0, success)
    DBus->>Client: Signals: encrypted, keyring info
sequenceDiagram
    participant Client as DBus Client
    participant DBus as DBus Interface
    participant Pool as Pool (Trait)
    participant LockSystem as Lock System
    participant Backstore as Backstore
    participant Cryptsetup as cryptsetup CLI

    Client->>DBus: reencrypt_pool_method()
    DBus->>LockSystem: get_pool_read_guard()
    LockSystem-->>DBus: SomeLockReadGuard
    DBus->>LockSystem: upgrade(read_guard)
    LockSystem->>LockSystem: WaitType::Upgrade queued
    LockSystem-->>DBus: SomeLockWriteGuard
    
    DBus->>Pool: start_reencrypt_pool()
    Pool->>Backstore: prepare_reencrypt()
    Backstore->>Cryptsetup: cryptsetup luksAddKey
    Cryptsetup-->>Backstore: new keyslot allocated
    Backstore-->>Pool: Vec<(slot, key, new_slot)>
    Pool-->>DBus: CreateAction::Created()
    
    DBus->>Pool: do_reencrypt_pool()
    Pool->>Cryptsetup: cryptsetup reencrypt --resume-only
    Cryptsetup-->>Pool: reencryption progresses
    Pool-->>DBus: StratisResult<()>
    
    DBus->>Pool: finish_reencrypt_pool()
    Pool->>Pool: last_reencrypt = Utc::now()
    Pool->>Pool: write metadata
    Pool-->>DBus: ReencryptedDevice
    DBus->>Client: Signal: last_reencrypted_timestamp

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

  • Increase libcryptsetup-rs dependency lower bound to 0.14.0 #3903 — Dependency version bump for libcryptsetup-rs and repository updates that directly support the extensive cryptsetup bindings and reencryption code additions in this PR.
  • zbus migration #3945 — Prior zbus migration work that established the DBus/zbus infrastructure (module structure, signal helpers, interface definitions) that this PR extensively builds upon with new encryption-related methods and properties.
🚥 Pre-merge checks: 5 passed

  • Description Check — Passed: Check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check — Passed: The title 'Online reencryption' directly references the main feature being implemented across the codebase, aligning with the PR's primary objective of adding online reencryption support.
  • Linked Issues Check — Passed: The PR implements online encrypt/decrypt/reencrypt operations in metadata v2 (#3597) and shells out to the cryptsetup binary for long-running operations (#3961), with comprehensive engine, pool, DBus, and CLI changes.
  • Out of Scope Changes Check — Passed: All changes directly support online encryption operations: cryptsetup dependency additions in CI, the chrono version update for timestamps, new encryption/reencryption/decryption methods across the engine, pool, and DBus layers, and supporting infrastructure such as lock upgrades.
  • Docstring Coverage — Passed: Docstring coverage is 100.00%, which exceeds the required threshold of 80.00%.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

🧹 Nitpick comments (2)
src/engine/strat_engine/thinpool/thinpool.rs (1)

2964-2965: Add a non-zero offset regression case here.

This test still exercises only set_device(new_device, Sectors(0), OffsetDirection::Forwards). The new behavior is the offset adjustment itself, so a same-device/non-zero-offset case would cover the critical path that this PR is introducing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engine/strat_engine/thinpool/thinpool.rs` around lines 2964 - 2965, Add a
regression test that exercises the offset-adjustment path by calling
pool.set_device with a non-zero offset and with the same device (i.e., reuse
new_device) instead of Sectors(0); specifically, replace the current Sectors(0)
call with a non-zero Sectors value (e.g., Sectors(>0)) in the set_device
invocation on pool and assert the resulting device/offset state (or behavior)
expected by the new implementation so the same-device/non-zero-offset branch is
covered; reference set_device, Sectors, OffsetDirection::Forwards, pool and
new_device when locating where to add this case.
src/engine/strat_engine/pool/v1.rs (1)

2152-2167: Assert last_reencrypt() after the restart.

This test proves reencryption survives stop_pool()/start_pool(), but it never checks the new last_reencrypt state that record() and setup() now persist. A regression that drops the timestamp on write/read would still pass here.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engine/strat_engine/pool/v1.rs` around lines 2152 - 2167, The test
currently restarts the pool but never verifies the persisted last_reencrypt
timestamp; after the restart (after
test_async!(engine.start_pool(...)).unwrap()), read pool.last_reencrypt() and
assert it matches the previously observed value (e.g. capture key_info or call
pool.last_reencrypt() before stop_pool and compare that to pool.last_reencrypt()
after start_pool), ensuring record()/setup() persistence is validated; use the
pool API (pool.last_reencrypt()) and appropriate Option handling (is_some() /
equals) to make the assertion.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/dbus/pool/pool_3_9/mod.rs`:
- Around line 336-358: The older PoolR0–PoolR8 interfaces are exposing the
`encrypted` property as immutable so clients cache stale state; update each
PoolR0..PoolR8 interface declaration for the `encrypted` property to include
emits_changed_signal = "true" (so property change notifications are emitted like
PoolR9) — locate the `encrypted` property declarations in the PoolR0..PoolR8
interface blocks and add emits_changed_signal = "true"; alternatively, if you
prefer not to change older interfaces, stop registering PoolR0–PoolR8 for this
object path and only register the PoolR9 interface (affecting where
encrypt_pool_method / reencrypt_pool_method / decrypt_pool_method are exposed)
so clients see live state via PoolR9 only.

In `@src/engine/strat_engine/backstore/backstore/v1.rs`:
- Around line 832-845: The reencrypt function currently zips blockdevs() with
key_info which silently truncates if lengths differ; before iterating, count the
blockdevs() (e.g. via self.blockdevs().map(|(_,_,bd)| bd).count() or collect
into a vec) and compare to key_info.len(), and if they differ return a
StratisResult::Err (use an appropriate error variant/message) so the function
fails fast; then iterate using the collected blockdev list and key_info to call
bd.do_reencrypt(...) as before. Include reference to prepare_reencrypt() in the
error message or comment to indicate stale key_info mismatch.

In `@src/engine/strat_engine/backstore/backstore/v2.rs`:
- Around line 1307-1315: The bug is that self.shift_alloc_offset(...) is called
before checking the result of the thinpool remap, so allocs are moved even if
get_devno_from_path() or thinpool.set_device() fails; change the sequence in the
function(s) using thinpool.suspend()/thinpool.resume() and set_device_res (e.g.,
the block around thinpool.suspend(), get_devno_from_path(...).and_then(|devno|
thinpool.set_device(...)), thinpool.resume(), self.shift_alloc_offset(...),
set_device_res) so that you only call self.shift_alloc_offset(offset,
offset_direction) after set_device_res? succeeds (i.e., after
unwrapping/returning on error), leaving allocs untouched on failure; apply the
same fix to the analogous block around lines where shift_alloc_offset is invoked
at 1409-1416.

In `@src/engine/strat_engine/backstore/blockdev/v1.rs`:
- Around line 410-415: The code in setup_reencrypt uses
underlying_device.crypt_handle().expect(...) which panics; change this to return
a StratisError wrapped in StratisResult when the crypt handle is missing (do not
panic). Specifically, in the method setup_reencrypt (and the similar block
around the other occurrence), check underlying_device.crypt_handle() and if None
return an appropriate StratisError (e.g., an InvalidState or NotEncrypted error)
instead of calling expect, then call crypt_handle.setup_reencrypt() and
propagate its StratisResult; update signatures/returns to preserve
StratisResult<(u32, SizedKeyMemory, u32)> and ensure error construction uses the
project’s StratisError constructors.

In `@src/engine/strat_engine/cmd.rs`:
- Around line 583-615: The three functions run_encrypt, run_reencrypt, and
run_decrypt currently call Command::new(CRYPTSETUP) which triggers PATH lookup;
change each to use the verified absolute path by calling
Command::new(get_executable(CRYPTSETUP)) (or
Command::new(get_executable(CRYPTSETUP).as_os_str())) so the cached, verified
executable is used consistently with other commands; keep the existing args and
then call execute_cmd(&mut cmd) as before.
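The fix the comment describes is to build the command from a verified absolute path rather than relying on PATH lookup. A std-only sketch, where get_executable is a hypothetical stand-in for stratisd's cached executable lookup:

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

// Hypothetical stand-in for stratisd's executable cache; the real code
// verifies required binaries at startup and records their absolute paths.
fn get_executable(name: &str) -> PathBuf {
    Path::new("/usr/sbin").join(name)
}

// Build (but do not spawn) a `cryptsetup reencrypt --encrypt` command using
// the verified absolute path instead of Command::new("cryptsetup"),
// which would trigger a PATH lookup at spawn time.
fn build_encrypt_cmd(device: &str) -> Command {
    let mut cmd = Command::new(get_executable("cryptsetup"));
    cmd.arg("reencrypt").arg("--encrypt").arg(device);
    cmd
}
```

In the real code the built command would then be passed to execute_cmd as before; only the program path changes.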

In `@src/engine/strat_engine/crypt/shared.rs`:
- Around line 1100-1183: handle_setup_reencrypt currently always creates new
keyslots and appends to token JSON, causing duplicate mutations on retry; change
it to detect an existing reencryption context before mutating: for each token
slot (use device.token_handle().json_get and the token_contents Value used in
set_up_reencryption_token), inspect the token JSON for an existing
TOKEN_KEYSLOTS_KEY entry or a custom reencryption marker and, if present, parse
and reuse the already-created keyslot(s) instead of calling
device.keyslot_handle().add_by_key; similarly, before creating
single_new_keyslot, check token_contents or call get_keyslot_number to see if a
reencrypt keyslot already exists and skip creation and json_set when so,
ensuring set_up_reencryption_token only writes when it actually added a new
keyslot.

In `@src/engine/strat_engine/engine.rs`:
- Around line 111-113: The call to validate_key_descs(ei.key_descs()) is being
executed before the idempotency/existing-pool branch and can cause
otherwise-idempotent create_pool*() calls to fail; move the key-description
validation so it runs only when the code takes the new-pool initialization
branch (i.e., after you determine that a new pool will be created and not when
create_pool_idempotent_or_err() returns Identity). Locate the validate_key_descs
invocation(s) (the ones using encryption_info and the calls around lines
referenced previously) and relocate them into the new-pool path/branch that
performs actual pool initialization; ensure create_pool_idempotent_or_err()
still short-circuits without running validation for the existing-pool/idempotent
case.

In `@src/engine/strat_engine/keys.rs`:
- Around line 35-41: Update the incomplete doc comment for validate_key_descs to
a full sentence (e.g., "Validate that all key descriptions exist in the
persistent keyring."). In validate_key_descs, stop using the try operator alone:
call search_key_persistent(key_desc) and match its
StratisResult<Option<KeySerial>>; if it returns Ok(Some(_)) continue, if
Ok(None) convert that into an Err(StratisResult) that indicates the specific
missing key (use the appropriate StratisError variant or create a clear error
message including the KeyDescription), and if Err(_) propagate the error as
before. This ensures missing keys are treated as validation failures rather than
being ignored.
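The core of this fix is matching on the Option inside the Result instead of discarding it with `?` alone. A sketch with a plain HashMap standing in for the kernel keyring and search_key_persistent (both names and signatures here are illustrative):

```rust
use std::collections::HashMap;

// Hypothetical keyring lookup standing in for search_key_persistent:
// Ok(Some(serial)) when the key exists, Ok(None) when absent,
// Err on a keyring failure.
fn search_key(keyring: &HashMap<String, Vec<u8>>, desc: &str) -> Result<Option<u64>, String> {
    Ok(keyring.get(desc).map(|_| 0u64))
}

// Validate that every key description resolves to a key, treating Ok(None)
// as a hard validation failure rather than silently continuing.
fn validate_key_descs<'a, I>(keyring: &HashMap<String, Vec<u8>>, descs: I) -> Result<(), String>
where
    I: IntoIterator<Item = &'a str>,
{
    for desc in descs {
        match search_key(keyring, desc)? {
            Some(_) => (),
            None => return Err(format!("key description {desc} not found in keyring")),
        }
    }
    Ok(())
}
```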

In `@src/engine/strat_engine/pool/v2.rs`:
- Line 701: The call to search_key_persistent(key_description) currently
discards its Result<Option<_>> so missing key descriptions (Ok(None)) are not
rejected; change the callers in this file (the call at
search_key_persistent(key_description)? and the similar calls in the 719-753
range) to inspect the returned Option and return an appropriate error when it is
None (i.e., treat Ok(None) as a preflight failure), e.g., propagate Err(...) or
convert None into a descriptive error before proceeding into the mutating path
so that missing keys are rejected at precheck time.
- Around line 1256-1296: The pool transition writes must persist a recovery
state before any on-disk LUKS/header or DM topology changes: modify
start_encrypt_pool and the corresponding decrypt path (the code that calls
do_decrypt_pool) to call write_metadata (or set a "transitioning" encryption
state in metadata) and flush it before invoking backstore.prepare_encrypt or
Backstore::do_decrypt_pool (and before any call to prepare_encrypt/
do_decrypt_pool in backstore); ensure the metadata records enough info to detect
an incomplete transition and that failure paths roll back or leave the metadata
in a recoverable state so finish_encrypt_pool (and its decrypt counterpart) can
clear the transition and write final metadata after the backstore operations
succeed.

In `@src/engine/strat_engine/serde_structs.rs`:
- Around line 96-104: The function serialize_date_time currently unwraps the
Option with expect and will panic if called with None (fragile because it
depends on skip_serializing_if elsewhere); update serialize_date_time to handle
None defensively by matching or if-let on timestamp and calling
serializer.serialize_i64(...) when Some(ts.timestamp()) and
serializer.serialize_none() (or serializer.serialize_unit()) when None, keeping
the same return type Result<S::Ok, S::Error>; modify the function body
(referencing serialize_date_time and its signature) so it no longer uses expect
and safely serializes None values.
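The defensive pattern being asked for is a None-safe round trip between an optional timestamp and a Unix epoch integer. A std-only sketch using SystemTime in place of chrono (the function names are illustrative, not the actual serde_structs API):

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

// Serialize side: pass None through instead of unwrapping with expect(),
// so the function stays correct even if skip_serializing_if is removed.
fn to_unix_ts(t: &Option<SystemTime>) -> Option<i64> {
    t.as_ref().and_then(|t| {
        t.duration_since(UNIX_EPOCH)
            .ok()
            .map(|d| d.as_secs() as i64)
    })
}

// Deserialize side: reject negative values and rebuild the timestamp.
fn from_unix_ts(ts: Option<i64>) -> Option<SystemTime> {
    ts.and_then(|s| u64::try_from(s).ok())
        .map(|s| UNIX_EPOCH + Duration::from_secs(s))
}
```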

In `@src/engine/strat_engine/thinpool/dm_structs.rs`:
- Around line 122-135: The Backwards branch currently computes
params.start_offset - offset without checking bounds; update the code paths that
construct LinearDevTargetParams::Flakey (and the analogous Linear branch using
FlakeyTargetParams::new and the Forwards/Backwards match) to validate that
params.start_offset >= offset when OffsetDirection::Backwards and return an
error (or otherwise fail fast) instead of building an invalid table; locate the
match on offset_direction around LinearDevTargetParams::Flakey /
FlakeyTargetParams::new and add the guard using params.start_offset and offset,
returning an appropriate Err result if the check fails.
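The guard the comment calls for is a checked subtraction in the Backwards case. A sketch modeling Sectors as plain u64 (the real stratisd types wrap devicemapper sector counts, and the enum mirrors the OffsetDirection named in the walkthrough):

```rust
// Modeled on the OffsetDirection enum described in the type-system changes.
#[derive(Clone, Copy, PartialEq, Debug)]
enum OffsetDirection {
    Backwards,
    Forwards,
}

// Apply an offset to a segment start, failing fast on underflow/overflow
// instead of building an invalid device-mapper table.
fn apply_offset(start: u64, offset: u64, dir: OffsetDirection) -> Result<u64, String> {
    match dir {
        OffsetDirection::Forwards => start
            .checked_add(offset)
            .ok_or_else(|| "segment start overflow".to_string()),
        OffsetDirection::Backwards => start
            .checked_sub(offset)
            .ok_or_else(|| format!("offset {offset} exceeds segment start {start}")),
    }
}
```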

In `@src/engine/strat_engine/thinpool/thinpool.rs`:
- Around line 1964-2010: Precompute and validate all segment shifts by creating
local copies of the segment start values and running apply_offset on those
copies (e.g., iterate self.segments.mdv_segments, .data_segments,
.meta_segments, .meta_spare_segments into temp collections and call apply_offset
on the temps) before calling thin_pool.set_meta_table, thin_pool.set_data_table,
or self.mdv.set_table; only after set_meta_table/set_data_table/mdv.set_table
all succeed, replace the actual segment collections with the precomputed shifted
copies and then set self.backstore_device, so that failures in mdv.set_table or
set_data_table do not leave live DM tables partially updated while segment
vectors or backstore_device remain stale.
- Around line 1938-1940: The early return in set_device() incorrectly treats
same backstore_device alone as idempotent; change the condition to also compare
offset and offset_direction so the function only returns Ok(false) when
backstore_device == self.backstore_device AND offset == self.backstore_offset
AND offset_direction == self.backstore_offset_direction (or the actual field
names used for stored offset/direction). Update the if that currently checks
only backstore_device to include those additional equality checks so offset-only
remaps are not turned into no-ops.
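The corrected early-return condition compares the full remap request against the stored state, not the device alone. A sketch with illustrative field names (the actual ThinPool fields may differ):

```rust
#[derive(Clone, Copy, PartialEq)]
enum Direction {
    Backwards,
    Forwards,
}

// Minimal stand-in for the ThinPool state that set_device() consults.
struct ThinPoolState {
    backstore_device: u64, // device-number stand-in
    offset: u64,
    direction: Direction,
}

impl ThinPoolState {
    // A remap is a no-op only when device, offset, AND direction all match;
    // an offset-only change on the same device must still be applied.
    fn needs_remap(&self, device: u64, offset: u64, direction: Direction) -> bool {
        !(device == self.backstore_device
            && offset == self.offset
            && direction == self.direction)
    }
}
```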

In `@src/engine/structures/lock.rs`:
- Around line 355-360: The Upgrade branch deadlocks when multiple readers on the
same UUID call upgrade() because each upgrader's held read guard counts toward
read_locked; fix by tracking per-UUID pending upgraders and excluding them from
the blocking reader count: add a field like upgrade_waiters: HashMap<Uuid,
usize> (increment in upgrade(), decrement when the upgrade wait ends), then in
the WaitType::Upgrade check compute effective_readers =
read_locked.get(uuid).unwrap_or(&0).saturating_sub(upgrade_waiters.get(uuid).unwrap_or(&0))
and use effective_readers > 1 (or == 0/1 depending on current logic) instead of
raw read_locked; also ensure wake/cleanup paths update upgrade_waiters
consistently so only non-upgrader readers block upgrades.
- Around line 667-688: The upgrade wake path currently calls add_write_lock
which revalidates the waiter as WaitType::SomeWrite and causes a panic in
woken_or_new; instead, add a dedicated add_upgrade_lock method (e.g.,
add_upgrade_lock(&mut self, uuid: U, idx: u64)) that inserts the uuid into
write_locked and calls pre_acquire_assertion(&WaitType::Upgrade(uuid), idx), and
replace the call to add_write_lock(...) in the Upgrade wake branch (the branch
using WaitType::Upgrade and self.1 .1) with add_upgrade_lock(self.1 .1, self.2)
so the woken entry stays validated as Upgrade rather than SomeWrite.
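For background on why the upgrade path needs this care: std::sync::RwLock offers no in-place upgrade, so the common fallback is drop-reacquire-revalidate, which opens a window in which other writers may run. A std-only sketch of that fallback (an in-place upgrade like AllOrSomeLock::upgrade avoids the revalidation window, but, as the comments above show, must then exclude pending upgraders from the reader count and keep wake bookkeeping consistent):

```rust
use std::sync::RwLock;

// Drop-and-revalidate "upgrade": release the read guard, take the write
// guard, then re-check the condition observed under the read lock, because
// another writer may have run while no lock was held.
fn increment_if_even(lock: &RwLock<u64>) -> bool {
    let seen = { *lock.read().unwrap() }; // read guard dropped at block end
    if seen % 2 != 0 {
        return false;
    }
    let mut guard = lock.write().unwrap();
    // Revalidate: the value may have changed between the two acquisitions.
    if *guard % 2 == 0 {
        *guard += 1;
        true
    } else {
        false
    }
}
```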

---

Nitpick comments:
In `@src/engine/strat_engine/pool/v1.rs`:
- Around line 2152-2167: The test currently restarts the pool but never verifies
the persisted last_reencrypt timestamp; after the restart (after
test_async!(engine.start_pool(...)).unwrap()), read pool.last_reencrypt() and
assert it matches the previously observed value (e.g. capture key_info or call
pool.last_reencrypt() before stop_pool and compare that to pool.last_reencrypt()
after start_pool), ensuring record()/setup() persistence is validated; use the
pool API (pool.last_reencrypt()) and appropriate Option handling (is_some() /
equals) to make the assertion.

In `@src/engine/strat_engine/thinpool/thinpool.rs`:
- Around line 2964-2965: Add a regression test that exercises the
offset-adjustment path by calling pool.set_device with a non-zero offset and
with the same device (i.e., reuse new_device) instead of Sectors(0);
specifically, replace the current Sectors(0) call with a non-zero Sectors value
(e.g., Sectors(>0)) in the set_device invocation on pool and assert the
resulting device/offset state (or behavior) expected by the new implementation
so the same-device/non-zero-offset branch is covered; reference set_device,
Sectors, OffsetDirection::Forwards, pool and new_device when locating where to
add this case.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 19524bcb-a651-4b50-9cba-6531040d5189

📥 Commits

Reviewing files that changed from the base of the PR and between e5a7024 and 573a13c.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (36)
  • .github/workflows/main.yml
  • .github/workflows/ubuntu.yml
  • .github/workflows/valgrind.yml
  • Cargo.toml
  • src/dbus/pool/pool_3_9/methods.rs
  • src/dbus/pool/pool_3_9/mod.rs
  • src/dbus/pool/pool_3_9/props.rs
  • src/dbus/util.rs
  • src/engine/engine.rs
  • src/engine/mod.rs
  • src/engine/sim_engine/engine.rs
  • src/engine/sim_engine/mod.rs
  • src/engine/sim_engine/pool.rs
  • src/engine/sim_engine/shared.rs
  • src/engine/strat_engine/backstore/backstore/v1.rs
  • src/engine/strat_engine/backstore/backstore/v2.rs
  • src/engine/strat_engine/backstore/blockdev/v1.rs
  • src/engine/strat_engine/backstore/devices.rs
  • src/engine/strat_engine/backstore/mod.rs
  • src/engine/strat_engine/cmd.rs
  • src/engine/strat_engine/crypt/handle/v1.rs
  • src/engine/strat_engine/crypt/handle/v2.rs
  • src/engine/strat_engine/crypt/shared.rs
  • src/engine/strat_engine/engine.rs
  • src/engine/strat_engine/keys.rs
  • src/engine/strat_engine/pool/dispatch.rs
  • src/engine/strat_engine/pool/v1.rs
  • src/engine/strat_engine/pool/v2.rs
  • src/engine/strat_engine/serde_structs.rs
  • src/engine/strat_engine/thinpool/dm_structs.rs
  • src/engine/strat_engine/thinpool/filesystem.rs
  • src/engine/strat_engine/thinpool/thinpool.rs
  • src/engine/structures/lock.rs
  • src/engine/types/actions.rs
  • src/engine/types/keys.rs
  • src/engine/types/mod.rs

Comment on lines +832 to +845
pub fn reencrypt(
    &self,
    pool_uuid: PoolUuid,
    key_info: Vec<(u32, SizedKeyMemory, u32)>,
) -> StratisResult<()> {
    for (bd, (slot, key, new_slot)) in self
        .blockdevs()
        .into_iter()
        .map(|(_, _, bd)| bd)
        .zip(key_info)
    {
        bd.do_reencrypt(pool_uuid, slot, key, new_slot)?;
    }
    Ok(())

@coderabbitai coderabbitai bot Mar 20, 2026


⚠️ Potential issue | 🟠 Major

Fail fast when the reencryption tuple count doesn't match the blockdev count.

zip truncates silently. If key_info is stale or the device set changes between prepare_reencrypt() and reencrypt(), this returns Ok(()) after reencrypting only a prefix of the pool.
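
The silent truncation `zip` performs can be shown in isolation (a standalone sketch; the names are illustrative, not project code):

```rust
fn main() {
    let blockdevs = ["bd0", "bd1", "bd2"];
    // One tuple short, as when key_info is stale.
    let key_info = [(0u32, 1u32), (0, 2)];

    // `zip` stops at the shorter iterator: bd2 is silently skipped,
    // and iteration still completes "successfully".
    let paired: Vec<_> = blockdevs.iter().zip(key_info.iter()).collect();
    assert_eq!(paired.len(), 2);
}
```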

Suggested fix
     pub fn reencrypt(
         &self,
         pool_uuid: PoolUuid,
         key_info: Vec<(u32, SizedKeyMemory, u32)>,
     ) -> StratisResult<()> {
-        for (bd, (slot, key, new_slot)) in self
-            .blockdevs()
+        let blockdevs = self.blockdevs();
+        if blockdevs.len() != key_info.len() {
+            return Err(StratisError::Msg(format!(
+                "Expected reencryption info for {} block devices, got {}",
+                blockdevs.len(),
+                key_info.len()
+            )));
+        }
+
+        for (bd, (slot, key, new_slot)) in blockdevs
             .into_iter()
             .map(|(_, _, bd)| bd)
             .zip(key_info)
         {
             bd.do_reencrypt(pool_uuid, slot, key, new_slot)?;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
    pub fn reencrypt(
        &self,
        pool_uuid: PoolUuid,
        key_info: Vec<(u32, SizedKeyMemory, u32)>,
    ) -> StratisResult<()> {
        for (bd, (slot, key, new_slot)) in self
            .blockdevs()
            .into_iter()
            .map(|(_, _, bd)| bd)
            .zip(key_info)
        {
            bd.do_reencrypt(pool_uuid, slot, key, new_slot)?;
        }
        Ok(())
    pub fn reencrypt(
        &self,
        pool_uuid: PoolUuid,
        key_info: Vec<(u32, SizedKeyMemory, u32)>,
    ) -> StratisResult<()> {
        let blockdevs = self.blockdevs();
        if blockdevs.len() != key_info.len() {
            return Err(StratisError::Msg(format!(
                "Expected reencryption info for {} block devices, got {}",
                blockdevs.len(),
                key_info.len()
            )));
        }
        for (bd, (slot, key, new_slot)) in blockdevs
            .into_iter()
            .map(|(_, _, bd)| bd)
            .zip(key_info)
        {
            bd.do_reencrypt(pool_uuid, slot, key, new_slot)?;
        }
        Ok(())
    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engine/strat_engine/backstore/backstore/v1.rs` around lines 832 - 845,
The reencrypt function currently zips blockdevs() with key_info which silently
truncates if lengths differ; before iterating, count the blockdevs() (e.g. via
self.blockdevs().map(|(_,_,bd)| bd).count() or collect into a vec) and compare
to key_info.len(), and if they differ return a StratisResult::Err (use an
appropriate error variant/message) so the function fails fast; then iterate
using the collected blockdev list and key_info to call bd.do_reencrypt(...) as
before. Include reference to prepare_reencrypt() in the error message or comment
to indicate stale key_info mismatch.

Ok. Let's do this one. The error hypothesized seems very unlikely, but the extra error return is very little additional trouble. If we're convinced the lengths must be the same, an assertion is the appropriate thing.


@mulkieran, agreed — since prepare_reencrypt and reencrypt are always called sequentially while holding the write lock (no blockdevs can be added or removed in between), the lengths are guaranteed to match. A mismatch would be a programming bug, not a recoverable runtime condition, so assert_eq! is more appropriate here:

pub fn reencrypt(
    &self,
    pool_uuid: PoolUuid,
    key_info: Vec<(u32, SizedKeyMemory, u32)>,
) -> StratisResult<()> {
    let blockdevs = self.blockdevs();
    assert_eq!(
        blockdevs.len(),
        key_info.len(),
        "reencryption key info length ({}) does not match blockdev count ({}); \
         key_info from prepare_reencrypt() must correspond 1-to-1 with blockdevs()",
        key_info.len(),
        blockdevs.len(),
    );

    for (bd, (slot, key, new_slot)) in blockdevs
        .into_iter()
        .map(|(_, _, bd)| bd)
        .zip(key_info)
    {
        bd.do_reencrypt(pool_uuid, slot, key, new_slot)?;
    }
    Ok(())
}

The same pattern applies to the identical reencrypt method in src/engine/strat_engine/backstore/backstore/v2.rs — worth updating that one consistently.


🧠 Learnings used
Learnt from: jbaublitz
Repo: stratis-storage/stratisd PR: 3651
File: src/dbus_api/pool/pool_3_9/methods.rs:173-176
Timestamp: 2025-08-20T02:37:22.477Z
Learning: The encrypt_pool, reencrypt_pool, and decrypt_pool methods in src/dbus_api/pool/pool_3_9/methods.rs have intentionally different lock handling patterns due to different workflow requirements, and this is not an inconsistency that needs to be fixed.

Learnt from: jbaublitz
Repo: stratis-storage/stratisd PR: 3898
File: src/engine/sim_engine/pool.rs:934-948
Timestamp: 2025-09-05T14:32:42.528Z
Learning: In src/engine/sim_engine/pool.rs, the start_encrypt_pool method intentionally creates a SizedKeyMemory with 1 byte allocated but 0 length reported - this is to avoid NULL pointer issues while simulating an empty key slice in the sim engine.


@jbaublitz Code Rabbit makes a good point here, either an assertion or an error return depending on what we believe seems like the correct course.


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +1307 to +1315
        thinpool.suspend()?;
        let set_device_res = get_devno_from_path(
            &PathBuf::from(DEVICEMAPPER_PATH)
                .join(format_crypt_backstore_name(&pool_uuid).to_string()),
        )
        .and_then(|devno| thinpool.set_device(devno, offset, offset_direction));
        thinpool.resume()?;
        self.shift_alloc_offset(offset, offset_direction);
        set_device_res?;

⚠️ Potential issue | 🟠 Major

Only shift the allocation map after the thinpool remap succeeds.

Both methods call shift_alloc_offset() before checking set_device_res. If get_devno_from_path() or thinpool.set_device() fails, the function returns an error with self.cap_device.allocs already moved into the new coordinate space while the thinpool still points at the old device.

Proposed fix
         thinpool.resume()?;
-        self.shift_alloc_offset(offset, offset_direction);
         set_device_res?;
+        self.shift_alloc_offset(offset, offset_direction);

         Ok((sector_size, key_info))
@@
         let set_device_res =
             get_devno_from_path(&PathBuf::from(DEVICEMAPPER_PATH).join(dm_name.to_string()))
                 .and_then(|devno| thinpool.set_device(devno, offset, direction));
         thinpool.resume()?;
-        self.shift_alloc_offset(offset, direction);
         set_device_res?;
+        self.shift_alloc_offset(offset, direction);
         self.cap_device.enc = None;
         Ok(())

Also applies to: 1409-1416

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engine/strat_engine/backstore/backstore/v2.rs` around lines 1307 - 1315,
The bug is that self.shift_alloc_offset(...) is called before checking the
result of the thinpool remap, so allocs are moved even if get_devno_from_path()
or thinpool.set_device() fails; change the sequence in the function(s) using
thinpool.suspend()/thinpool.resume() and set_device_res (e.g., the block around
thinpool.suspend(), get_devno_from_path(...).and_then(|devno|
thinpool.set_device(...)), thinpool.resume(), self.shift_alloc_offset(...),
set_device_res) so that you only call self.shift_alloc_offset(offset,
offset_direction) after set_device_res? succeeds (i.e., after
unwrapping/returning on error), leaving allocs untouched on failure; apply the
same fix to the analogous block around lines where shift_alloc_offset is invoked
at 1409-1416.

Comment on lines +122 to +135
                match offset_direction {
                    OffsetDirection::Forwards => params.start_offset + offset,
                    OffsetDirection::Backwards => params.start_offset - offset,
                },
            ))
        }
        LinearDevTargetParams::Flakey(ref params) => {
            let feature_args = params.feature_args.iter().cloned().collect::<Vec<_>>();
            LinearDevTargetParams::Flakey(FlakeyTargetParams::new(
                device,
                match offset_direction {
                    OffsetDirection::Forwards => params.start_offset + offset,
                    OffsetDirection::Backwards => params.start_offset - offset,
                },
@coderabbitai coderabbitai bot Mar 20, 2026

⚠️ Potential issue | 🟠 Major

Guard the backwards offset adjustment.

OffsetDirection::Backwards subtracts offset from every segment start without checking that the segment actually begins that far into the device. A larger shift here will build a table with the wrong physical offsets instead of failing fast.
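
A minimal sketch of the fail-fast guard described here, modeling `Sectors` with plain `u64` (illustrative only, not the project's types):

```rust
// Hedged sketch: `Sectors` is modeled as a bare u64 here.
// Reject a backwards shift larger than the segment start instead of
// silently building a table with a wrong physical offset.
fn shift_back(start: u64, offset: u64) -> Result<u64, String> {
    start.checked_sub(offset).ok_or_else(|| {
        format!("segment start {start} is smaller than backwards shift {offset}")
    })
}

fn main() {
    assert_eq!(shift_back(700, 512), Ok(188));
    assert!(shift_back(100, 512).is_err()); // would underflow: fail fast
}
```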

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engine/strat_engine/thinpool/dm_structs.rs` around lines 122 - 135, The
Backwards branch currently computes params.start_offset - offset without
checking bounds; update the code paths that construct
LinearDevTargetParams::Flakey (and the analogous Linear branch using
FlakeyTargetParams::new and the Forwards/Backwards match) to validate that
params.start_offset >= offset when OffsetDirection::Backwards and return an
error (or otherwise fail fast) instead of building an invalid table; locate the
match on offset_direction around LinearDevTargetParams::Flakey /
FlakeyTargetParams::new and add the guard using params.start_offset and offset,
returning an appropriate Err result if the check fails.


@jbaublitz Maybe the set_target_device function could just be migrated into the set_device method, and it and fn apply_offset could handle these subtractions uniformly.


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +1938 to +1940
        if backstore_device == self.backstore_device {
            return Ok(false);
        }
@coderabbitai coderabbitai bot Mar 20, 2026

⚠️ Potential issue | 🔴 Critical

Don't treat same-device/non-zero-offset calls as idempotent.

Line 1938 ignores offset and offset_direction, so set_device() becomes a no-op whenever the DM device stays the same. That defeats the new contract for offset-only remaps and can break online reencryption flows where the upper device is unchanged but the data start moves.
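
The intended contract can be pinned down with a tiny predicate (a sketch only; `Sectors` is modeled as `u64` and the name is invented):

```rust
// The call is a true no-op only when the device is unchanged AND there
// is no offset shift to apply.
fn set_device_is_noop(same_device: bool, offset: u64) -> bool {
    same_device && offset == 0
}

fn main() {
    assert!(set_device_is_noop(true, 0));    // idempotent call: skip
    assert!(!set_device_is_noop(true, 512)); // offset-only remap: must run
    assert!(!set_device_is_noop(false, 0));  // device change: must run
}
```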

🛠️ Minimal fix
-        if backstore_device == self.backstore_device {
+        if backstore_device == self.backstore_device && offset == Sectors(0) {
             return Ok(false);
         }
📝 Committable suggestion


Suggested change
        if backstore_device == self.backstore_device {
            return Ok(false);
        }
        if backstore_device == self.backstore_device && offset == Sectors(0) {
            return Ok(false);
        }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engine/strat_engine/thinpool/thinpool.rs` around lines 1938 - 1940, The
early return in set_device() incorrectly treats same backstore_device alone as
idempotent; change the condition to also compare offset and offset_direction so
the function only returns Ok(false) when backstore_device ==
self.backstore_device AND offset == self.backstore_offset AND offset_direction
== self.backstore_offset_direction (or the actual field names used for stored
offset/direction). Update the if that currently checks only backstore_device to
include those additional equality checks so offset-only remaps are not turned
into no-ops.


@jbaublitz I think Code Rabbit's code suggestion is a good one.


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +1964 to +2010
        self.thin_pool.set_meta_table(get_dm(), meta_table)?;
        if let Err(e) = self.thin_pool.set_data_table(get_dm(), data_table) {
            match self.thin_pool.set_meta_table(get_dm(), original_meta.table) {
                Ok(_) => return Err(StratisError::from(e)),
                Err(rollback_e) => {
                    return Err(StratisError::RollbackError {
                        causal_error: Box::new(StratisError::from(e)),
                        rollback_error: Box::new(StratisError::from(rollback_e)),
                        level: ActionAvailability::NoPoolChanges,
                    })
                }
            }
        }
        if let Err(e) = self.mdv.set_table(mdv_table) {
            if let Err(rollback_e) = self.thin_pool.set_meta_table(get_dm(), original_meta.table) {
                return Err(StratisError::RollbackError {
                    causal_error: Box::new(e),
                    rollback_error: Box::new(StratisError::from(rollback_e)),
                    level: ActionAvailability::NoPoolChanges,
                });
            }
            match self.thin_pool.set_data_table(get_dm(), original_data.table) {
                Ok(_) => return Err(e),
                Err(rollback_e) => {
                    return Err(StratisError::RollbackError {
                        causal_error: Box::new(e),
                        rollback_error: Box::new(StratisError::from(rollback_e)),
                        level: ActionAvailability::NoPoolChanges,
                    })
                }
            }
        }

        for (start, _) in self.segments.mdv_segments.iter_mut() {
            apply_offset(start, offset, offset_direction)?;
        }
        for (start, _) in self.segments.data_segments.iter_mut() {
            apply_offset(start, offset, offset_direction)?;
        }
        for (start, _) in self.segments.meta_segments.iter_mut() {
            apply_offset(start, offset, offset_direction)?;
        }
        for (start, _) in self.segments.meta_spare_segments.iter_mut() {
            apply_offset(start, offset, offset_direction)?;
        }

        self.backstore_device = backstore_device;
@coderabbitai coderabbitai bot Mar 20, 2026

⚠️ Potential issue | 🔴 Critical

Validate segment shifts before touching live DM tables.

apply_offset() can still fail after Lines 1964-1977 have already rewritten the meta/data/MDV tables. On that path the caller gets a plain error even though live mappings have changed, self.backstore_device is still stale, and some segment vectors may already be partially updated. The next check() will then hit the invariant at Lines 1230-1235. Please precompute shifted segment copies first, then commit them only after all table updates succeed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engine/strat_engine/thinpool/thinpool.rs` around lines 1964 - 2010,
Precompute and validate all segment shifts by creating local copies of the
segment start values and running apply_offset on those copies (e.g., iterate
self.segments.mdv_segments, .data_segments, .meta_segments, .meta_spare_segments
into temp collections and call apply_offset on the temps) before calling
thin_pool.set_meta_table, thin_pool.set_data_table, or self.mdv.set_table; only
after set_meta_table/set_data_table/mdv.set_table all succeed, replace the
actual segment collections with the precomputed shifted copies and then set
self.backstore_device, so that failures in mdv.set_table or set_data_table do
not leave live DM tables partially updated while segment vectors or
backstore_device remain stale.


@coderabbitai Your pre-computation suggestion is probably a good one.


@jbaublitz Let's take Code Rabbit's advice on this one.


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!


@mulkieran, agreed — here's the concrete fix. The idea is to clone all four segment vectors and validate all the offset arithmetic before any DM table is touched. Only after every set_meta_table / set_data_table / set_table call succeeds do we swap in the pre-computed segment vectors and update self.backstore_device.

    /// Set the device on all DM devices
    pub fn set_device(
        &mut self,
        backstore_device: Device,
        offset: Sectors,
        offset_direction: OffsetDirection,
    ) -> StratisResult<bool> {
        fn apply_offset(
            start: &mut Sectors,
            offset: Sectors,
            offset_direction: OffsetDirection,
        ) -> StratisResult<()> {
            match offset_direction {
                OffsetDirection::Forwards => {
                    *start = start.checked_add(offset).ok_or_else(|| {
                        StratisError::Msg(format!(
                            "Allocation shift would overflow integer, start: {start}, offset: {offset}"
                        ))
                    })?;
                }
                OffsetDirection::Backwards => {
                    *start = Sectors(start.checked_sub(*offset).ok_or_else(|| {
                        StratisError::Msg(format!(
                            "Allocation shift would underflow integer, start: {start}, offset: {offset}"
                        ))
                    })?);
                }
            }
            Ok(())
        }

        if backstore_device == self.backstore_device {
            return Ok(false);
        }

        // Pre-compute all shifted segment copies and validate arithmetic BEFORE
        // touching any live DM table.  A failure here is a clean error with no
        // side-effects.
        let mut new_mdv_segments = self.segments.mdv_segments.clone();
        let mut new_data_segments = self.segments.data_segments.clone();
        let mut new_meta_segments = self.segments.meta_segments.clone();
        let mut new_meta_spare_segments = self.segments.meta_spare_segments.clone();

        for (start, _) in new_mdv_segments.iter_mut() {
            apply_offset(start, offset, offset_direction)?;
        }
        for (start, _) in new_data_segments.iter_mut() {
            apply_offset(start, offset, offset_direction)?;
        }
        for (start, _) in new_meta_segments.iter_mut() {
            apply_offset(start, offset, offset_direction)?;
        }
        for (start, _) in new_meta_spare_segments.iter_mut() {
            apply_offset(start, offset, offset_direction)?;
        }

        let original_meta = self.thin_pool.meta_dev().table().clone();
        let original_data = self.thin_pool.data_dev().table().clone();

        let meta_table = linear_table::set_target_device(
            self.thin_pool.meta_dev().table(),
            backstore_device,
            offset,
            offset_direction,
        );
        let data_table = linear_table::set_target_device(
            self.thin_pool.data_dev().table(),
            backstore_device,
            offset,
            offset_direction,
        );
        let mdv_table = linear_table::set_target_device(
            self.mdv.device().table(),
            backstore_device,
            offset,
            offset_direction,
        );

        self.thin_pool.set_meta_table(get_dm(), meta_table)?;
        if let Err(e) = self.thin_pool.set_data_table(get_dm(), data_table) {
            match self.thin_pool.set_meta_table(get_dm(), original_meta.table) {
                Ok(_) => return Err(StratisError::from(e)),
                Err(rollback_e) => {
                    return Err(StratisError::RollbackError {
                        causal_error: Box::new(StratisError::from(e)),
                        rollback_error: Box::new(StratisError::from(rollback_e)),
                        level: ActionAvailability::NoPoolChanges,
                    })
                }
            }
        }
        if let Err(e) = self.mdv.set_table(mdv_table) {
            if let Err(rollback_e) = self.thin_pool.set_meta_table(get_dm(), original_meta.table) {
                return Err(StratisError::RollbackError {
                    causal_error: Box::new(e),
                    rollback_error: Box::new(StratisError::from(rollback_e)),
                    level: ActionAvailability::NoPoolChanges,
                });
            }
            match self.thin_pool.set_data_table(get_dm(), original_data.table) {
                Ok(_) => return Err(e),
                Err(rollback_e) => {
                    return Err(StratisError::RollbackError {
                        causal_error: Box::new(e),
                        rollback_error: Box::new(StratisError::from(rollback_e)),
                        level: ActionAvailability::NoPoolChanges,
                    })
                }
            }
        }

        // All DM table updates succeeded; commit the pre-computed segment
        // vectors and the new backstore device atomically from the caller's
        // perspective.
        self.segments.mdv_segments = new_mdv_segments;
        self.segments.data_segments = new_data_segments;
        self.segments.meta_segments = new_meta_segments;
        self.segments.meta_spare_segments = new_meta_spare_segments;
        self.backstore_device = backstore_device;

        Ok(true)
    }

Key changes:

  1. All four segment vectors are cloned and every apply_offset call is executed before any DM table is touched — arithmetic errors are surfaced cleanly with no side-effects.
  2. The original four for … apply_offset loops at the bottom of the function are removed.
  3. After all three DM table updates succeed, the pre-computed vectors replace the originals and self.backstore_device is updated in one logical step.

🧠 Learnings used
Learnt from: mulkieran
Repo: stratis-storage/stratisd PR: 3953
File: src/engine/strat_engine/backstore/blockdev/v1.rs:410-415
Timestamp: 2026-03-20T20:00:36.372Z
Learning: In `src/engine/strat_engine/backstore/blockdev/v1.rs`, the `setup_reencrypt` and `do_reencrypt` methods intentionally use `.expect("Checked that pool is encrypted")` on `crypt_handle()` rather than returning a `StratisError`. The crypt handle is obtained programmatically and all block devices in the pool are guaranteed to be encrypted before these methods are called, making the `expect` a valid programmer assertion rather than a recoverable runtime error.

Learnt from: mulkieran
Repo: stratis-storage/stratisd PR: 3953
File: src/engine/strat_engine/keys.rs:35-41
Timestamp: 2026-03-20T23:01:09.032Z
Learning: In Rust code that uses `Result<Option<T>>`, do not rely on the `?` operator to handle `Ok(None)` implicitly. `?` will treat `Ok(None)` as a successful value and may incorrectly skip required error handling. Instead, explicitly match on `Option` (e.g., `match res? { Some(v) => ..., None => return Err(/* missing key */) }`) so missing-key cases fail fast when they should.

Learnt from: jbaublitz
Repo: stratis-storage/stratisd PR: 3651
File: src/dbus_api/pool/pool_3_9/methods.rs:173-176
Timestamp: 2025-08-20T02:37:22.477Z
Learning: The encrypt_pool, reencrypt_pool, and decrypt_pool methods in src/dbus_api/pool/pool_3_9/methods.rs have intentionally different lock handling patterns due to different workflow requirements, and this is not an inconsistency that needs to be fixed.

Comment on lines +355 to +360
            WaitType::Upgrade(uuid) => {
                self.read_locked.get(uuid).unwrap_or(&0) > &1
                    || self.write_locked.contains(uuid)
                    || self.all_read_locked > 0
                    || self.all_write_locked
            }
@coderabbitai coderabbitai bot Mar 20, 2026

⚠️ Potential issue | 🔴 Critical

Concurrent upgrades on the same UUID will deadlock.

Each upgrade keeps its read guard alive while waiting, but this branch only proceeds once the per-UUID read count drops to 1. If two readers on the same pool both call upgrade(), both futures park with the count stuck at 2 and no task can make progress.
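
A toy model of this wait condition makes the stuck state concrete (field and function names are invented for illustration; they are not the real lock internals):

```rust
// Mirrors the Upgrade branch: wait while another reader holds the pool
// (count > 1, since the upgrader's own guard counts) or a writer exists.
fn upgrade_can_proceed(per_uuid_readers: usize, write_locked: bool) -> bool {
    !(per_uuid_readers > 1 || write_locked)
}

fn main() {
    // Two readers on the same pool both call upgrade(): each keeps its
    // read guard, the count is stuck at 2, and neither ever proceeds.
    assert!(!upgrade_can_proceed(2, false));
    // A single upgrader (only its own downgraded guard) proceeds at once.
    assert!(upgrade_can_proceed(1, false));
}
```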

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engine/structures/lock.rs` around lines 355 - 360, The Upgrade branch
deadlocks when multiple readers on the same UUID call upgrade() because each
upgrader's held read guard counts toward read_locked; fix by tracking per-UUID
pending upgraders and excluding them from the blocking reader count: add a field
like upgrade_waiters: HashMap<Uuid, usize> (increment in upgrade(), decrement
when the upgrade wait ends), then in the WaitType::Upgrade check compute
effective_readers =
read_locked.get(uuid).unwrap_or(&0).saturating_sub(upgrade_waiters.get(uuid).unwrap_or(&0))
and use effective_readers > 1 (or == 0/1 depending on current logic) instead of
raw read_locked; also ensure wake/cleanup paths update upgrade_waiters
consistently so only non-upgrader readers block upgrades.

@mulkieran mulkieran Mar 23, 2026

@coderabbitai I believe this should induce a hang when two re-encryption operations on the same pool are scheduled, but I cannot induce one.


🧩 Analysis chain

Let me check whether the higher-level start_reencrypt_pool logic prevents two concurrent upgrade attempts from both reaching the lock layer.


🏁 Script executed:

#!/bin/bash
# Check start_reencrypt_pool implementations in v1 and v2 pool files
rg -n "start_reencrypt_pool" --type rust -A 20

Length of output: 12296


@mulkieran, you're right that a hang cannot be induced, and here's the reason why: the method's lock sequence prevents two upgraders from ever co-existing.

Looking at reencrypt_pool_method (lines 156–165 in src/dbus/pool/pool_3_9/methods.rs), the full sequence is:

  1. Acquire a write guard (engine.get_mut_pool(...))
  2. Call start_reencrypt_pool() (write guard held)
  3. Downgrade to a read guard (guard.downgrade())
  4. Call do_reencrypt_pool(...) (read guard held — the long-running step)
  5. Upgrade back to a write guard (upgrade_pool(guard))
  6. Call finish_reencrypt_pool(...)

The key is step 1. A second concurrent request for the same pool UUID will block at step 1 waiting for the write lock — it never holds a read guard of its own during this time. By the time it eventually acquires the write lock and downgrades to a read guard, Task A has already fully completed and released. So when upgrade_pool is called at step 5, read_locked[uuid] is always exactly 1 (only the caller's own downgraded read guard), satisfying > &1 == false, and the upgrade proceeds without waiting.

The deadlock scenario I flagged — two tasks simultaneously holding read guards and both calling upgrade() — would require two tasks to each independently get through step 3 at the same time, which the serializing write lock at step 1 prevents entirely. My original comment overstated the risk for this specific usage pattern.
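
The serialization argument in steps 1-6 can be replayed with a toy guard-count model (the struct and method names are invented for illustration, not the real lock API):

```rust
// Minimal bookkeeping model of the write -> read -> write sequence.
#[derive(Default)]
struct Counts {
    readers: usize,
    writer: bool,
}

impl Counts {
    fn write_lock(&mut self) {
        // Step 1: a write lock excludes all readers and writers.
        assert!(!self.writer && self.readers == 0);
        self.writer = true;
    }
    fn downgrade(&mut self) {
        // Step 3: the writer becomes the sole reader.
        assert!(self.writer);
        self.writer = false;
        self.readers += 1;
    }
    fn upgrade_ok(&self) -> bool {
        // Step 5 condition: only the caller's own read guard remains.
        self.readers == 1 && !self.writer
    }
    fn upgrade(&mut self) {
        assert!(self.upgrade_ok());
        self.readers -= 1;
        self.writer = true;
    }
    fn unlock_write(&mut self) {
        self.writer = false;
    }
}

fn main() {
    let mut c = Counts::default();
    // Task A runs steps 1-6 to completion; task B is blocked at step 1
    // the whole time and never holds a read guard of its own.
    c.write_lock();          // step 1
    c.downgrade();           // step 3
    assert!(c.upgrade_ok()); // step 5: reader count is exactly 1
    c.upgrade();
    c.unlock_write();
    // Only now can task B begin its own step 1.
    c.write_lock();
}
```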


🧠 Learnings used
Learnt from: jbaublitz
Repo: stratis-storage/stratisd PR: 3651
File: src/dbus_api/pool/pool_3_9/methods.rs:173-176
Timestamp: 2025-08-20T02:37:22.477Z
Learning: The encrypt_pool, reencrypt_pool, and decrypt_pool methods in src/dbus_api/pool/pool_3_9/methods.rs have intentionally different lock handling patterns due to different workflow requirements, and this is not an inconsistency that needs to be fixed.

Learnt from: mulkieran
Repo: stratis-storage/stratisd PR: 3953
File: src/engine/strat_engine/backstore/blockdev/v1.rs:410-415
Timestamp: 2026-03-20T20:00:36.372Z
Learning: In `src/engine/strat_engine/backstore/blockdev/v1.rs`, the `setup_reencrypt` and `do_reencrypt` methods intentionally use `.expect("Checked that pool is encrypted")` on `crypt_handle()` rather than returning a `StratisError`. The crypt handle is obtained programmatically and all block devices in the pool are guaranteed to be encrypted before these methods are called, making the `expect` a valid programmer assertion rather than a recoverable runtime error.

Learnt from: mulkieran
Repo: stratis-storage/stratisd PR: 3953
File: src/engine/strat_engine/keys.rs:35-41
Timestamp: 2026-03-20T23:01:09.032Z
Learning: In Rust code that uses `Result<Option<T>>`, do not rely on the `?` operator to handle `Ok(None)` implicitly. `?` will treat `Ok(None)` as a successful value and may incorrectly skip required error handling. Instead, explicitly match on `Option` (e.g., `match res? { Some(v) => ..., None => return Err(/* missing key */) }`) so missing-key cases fail fast when they should.


@jbaublitz I cannot induce a deadlock, but I think this is worth checking into.


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

Comment on lines +667 to +688
        let wait_type = WaitType::Upgrade(self.1 .1);
        let poll = if lock_record.should_wait(&wait_type, self.2) {
            lock_record.add_waiter(
                &AtomicBool::new(true),
                wait_type,
                cxt.waker().clone(),
                self.2,
            );
            Poll::Pending
        } else {
            lock_record.add_write_lock(self.1 .1, Some(self.2));
            let (_, rf) = unsafe { inner.get().as_mut() }
                .expect("cannot be null")
                .get_mut_by_uuid(self.1 .1)
                .expect("Checked above");
            Poll::Ready(SomeLockWriteGuard(
                Arc::clone(&self.1 .0),
                self.1 .1,
                self.1 .2.clone(),
                rf,
                true,
            ))
@coderabbitai coderabbitai bot Mar 20, 2026

⚠️ Potential issue | 🔴 Critical

Don't revalidate a woken upgrade as SomeWrite.

A queued upgrade is recorded in woken as WaitType::Upgrade, but this path goes through add_write_lock(..., Some(idx)), which validates it as WaitType::SomeWrite. The first contended upgrade that gets woken will hit the assert_eq! in woken_or_new() and panic.

Suggested fix
-            lock_record.add_write_lock(self.1 .1, Some(self.2));
+            lock_record.add_upgrade_lock(self.1 .1, self.2);
fn add_upgrade_lock(&mut self, uuid: U, idx: u64) {
    self.write_locked.insert(uuid);
    self.pre_acquire_assertion(&WaitType::Upgrade(uuid), idx);
    trace!("Lock record after acquisition: {self}");
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/engine/structures/lock.rs` around lines 667 - 688, The upgrade wake path
currently calls add_write_lock which revalidates the waiter as
WaitType::SomeWrite and causes a panic in woken_or_new; instead, add a dedicated
add_upgrade_lock method (e.g., add_upgrade_lock(&mut self, uuid: U, idx: u64))
that inserts the uuid into write_locked and calls
pre_acquire_assertion(&WaitType::Upgrade(uuid), idx), and replace the call to
add_write_lock(...) in the Upgrade wake branch (the branch using
WaitType::Upgrade and self.1 .1) with add_upgrade_lock(self.1 .1, self.2) so the
woken entry stays validated as Upgrade rather than SomeWrite.


@coderabbitai Which invocation instance of woken_or_new() do you think could panic?


@mulkieran, the panic would occur in the woken_or_new() call made transitively through add_write_lock, on a contended upgrade (one that was first put to sleep, then woken). The call chain is:

Upgrade::poll (else branch)
  → add_write_lock(self.1.1, Some(self.2))
    → pre_acquire_assertion(&WaitType::SomeWrite(uuid), idx)
      → woken_or_new(Some(&WaitType::SomeWrite(uuid)), idx)
          // ↑ panics here via assert_eq!

Here's the reasoning step by step:

  1. The Upgrade future has to wait (there's a conflicting read lock from another task), so it gets pushed to waiting as WaitType::Upgrade(uuid).
  2. The conflict is released; wake() moves it from waiting to woken — stored as WaitType::Upgrade(uuid).
  3. On the next poll(), should_wait() returns false because woken.contains_key(&idx) is true, so execution falls into the else (acquire) branch.
  4. That branch calls add_write_lock(self.1.1, Some(self.2)), which internally calls pre_acquire_assertion(&WaitType::SomeWrite(uuid), idx).
  5. pre_acquire_assertion calls woken_or_new(Some(&WaitType::SomeWrite(uuid)), idx).
  6. Inside woken_or_new, woken.contains_key(&idx) is true, so it removes the entry (which is WaitType::Upgrade(uuid)) and asserts:
    assert_eq!(Some(&WaitType::Upgrade(uuid)), Some(&WaitType::SomeWrite(uuid)));
    // ^^^ PANIC: mismatch

On a non-contended first-pass acquisition (idx never in woken), woken_or_new returns early without reaching the assert_eq!, so no panic occurs there. The bug only surfaces when an upgrade has to wait and is then woken.
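The mismatch described above can be reproduced in miniature. The sketch below is a hypothetical, stripped-down model of the `woken` map and its revalidation assertion; the `LockRecord` and `WaitType` shapes here are simplified stand-ins for the actual lock.rs types, kept only to show why the type recorded at sleep time must match the type claimed at acquisition time:

```rust
use std::collections::HashMap;

// Simplified stand-in for the real WaitType (uuid narrowed to u32).
#[derive(Debug, PartialEq, Clone)]
enum WaitType {
    SomeWrite(u32),
    Upgrade(u32),
}

// Minimal model: only the woken map matters for this failure mode.
struct LockRecord {
    woken: HashMap<u64, WaitType>,
}

impl LockRecord {
    // Model of woken_or_new(): if this waiter index was previously woken,
    // remove its entry and assert the recorded type matches the type
    // claimed now. A mismatch (Upgrade recorded, SomeWrite claimed) panics.
    fn woken_or_new(&mut self, wait_type: &WaitType, idx: u64) -> bool {
        match self.woken.remove(&idx) {
            Some(recorded) => {
                assert_eq!(
                    &recorded, wait_type,
                    "wait type changed between sleep and wake"
                );
                true
            }
            // First-pass (non-contended) acquisition: nothing to check.
            None => false,
        }
    }
}

fn main() {
    let mut rec = LockRecord {
        woken: HashMap::new(),
    };
    // A contended upgrade was queued and later woken as Upgrade(7).
    rec.woken.insert(42, WaitType::Upgrade(7));
    // Revalidating with the same type (the add_upgrade_lock path) is fine.
    assert!(rec.woken_or_new(&WaitType::Upgrade(7), 42));
    // Revalidating as SomeWrite(7) instead (the add_write_lock path)
    // would trip the assert_eq! above and panic.
    println!("upgrade revalidated consistently");
}
```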


🧠 Learnings used
Learnt from: mulkieran
Repo: stratis-storage/stratisd PR: 3953
File: src/engine/strat_engine/keys.rs:35-41
Timestamp: 2026-03-20T23:01:09.032Z
Learning: In Rust code that uses `Result<Option<T>>`, do not rely on the `?` operator to handle `Ok(None)` implicitly. `?` will treat `Ok(None)` as a successful value and may incorrectly skip required error handling. Instead, explicitly match on `Option` (e.g., `match res? { Some(v) => ..., None => return Err(/* missing key */) }`) so missing-key cases fail fast when they should.

Learnt from: mulkieran
Repo: stratis-storage/stratisd PR: 3953
File: src/engine/strat_engine/backstore/blockdev/v1.rs:410-415
Timestamp: 2026-03-20T20:00:36.372Z
Learning: In `src/engine/strat_engine/backstore/blockdev/v1.rs`, the `setup_reencrypt` and `do_reencrypt` methods intentionally use `.expect("Checked that pool is encrypted")` on `crypt_handle()` rather than returning a `StratisError`. The crypt handle is obtained programmatically and all block devices in the pool are guaranteed to be encrypted before these methods are called, making the `expect` a valid programmer assertion rather than a recoverable runtime error.

Learnt from: jbaublitz
Repo: stratis-storage/stratisd PR: 3651
File: src/dbus_api/pool/pool_3_9/methods.rs:173-176
Timestamp: 2025-08-20T02:37:22.477Z
Learning: The encrypt_pool, reencrypt_pool, and decrypt_pool methods in src/dbus_api/pool/pool_3_9/methods.rs have intentionally different lock handling patterns due to different workflow requirements, and this is not an inconsistency that needs to be fixed.


@jbaublitz I think this is worth checking as well.


Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!

return (
    default_return,
    DbusErrorEnum::ERROR as u16,
    "No unlock methods provided".to_string(),

This reads a little oddly as an error string. Could you try something like "No encryption methods provided"?

search_key(keyring_id, key_desc.to_system_string())
}

/// Validate that all key descriptions

This is just a fragmentary sentence...

Comment on lines +96 to +104
fn serialize_date_time<S>(
    timestamp: &Option<DateTime<Utc>>,
    serializer: S,
) -> Result<S::Ok, S::Error>
where
    S: Serializer,
{
    serializer.serialize_i64(timestamp.expect("is some").timestamp())
}

The fact that it is logically entailed by our code justifies the expect. It is just the name that makes it seem more general. The function is private to this module.
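One way to act on the naming point is to encode the invariant in the helper's name and in the expect message itself. This is only a stdlib sketch of that pattern, not the actual code: the `serialize_some_timestamp` name and the bare `i64` standing in for `DateTime<Utc>` are illustrative assumptions.

```rust
// Hypothetical module-private helper. The name and the expect message
// both document the invariant: every caller pairs this with something
// like #[serde(skip_serializing_if = "Option::is_none")], so the value
// is logically guaranteed to be Some when this function runs.
fn serialize_some_timestamp(ts: &Option<i64>) -> i64 {
    ts.expect("caller skips serialization when None")
}

fn main() {
    // Invariant upheld by the (hypothetical) caller: the field is Some.
    assert_eq!(serialize_some_timestamp(&Some(1_700_000_000)), 1_700_000_000);
    println!("invariant-documented expect is fine when Some");
}
```

The expect stays a programmer assertion, as discussed above; the rename just makes the scope of the assumption obvious at the call site.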

Comment on lines +832 to +845
pub fn reencrypt(
    &self,
    pool_uuid: PoolUuid,
    key_info: Vec<(u32, SizedKeyMemory, u32)>,
) -> StratisResult<()> {
    for (bd, (slot, key, new_slot)) in self
        .blockdevs()
        .into_iter()
        .map(|(_, _, bd)| bd)
        .zip(key_info)
    {
        bd.do_reencrypt(pool_uuid, slot, key, new_slot)?;
    }
    Ok(())

Ok. Let's do this one. The error hypothesized seems very unlikely, but the extra error return is very little additional trouble. If we're convinced the lengths must be the same, an assertion is the appropriate thing.
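The failure mode being guarded against here is that `Iterator::zip` silently stops at the shorter of the two iterators, so a length mismatch would drop trailing block devices without any error. A small stdlib-only sketch (the device and key names are made up):

```rust
fn main() {
    let blockdevs = ["bd0", "bd1", "bd2"];
    // One key-info entry short of the device count: zip yields only two
    // pairs, so bd2 would never be reencrypted and nothing would report it.
    let key_info = [(0u32, "key-a", 1u32), (0, "key-b", 1)];

    let paired: Vec<_> = blockdevs.iter().zip(key_info.iter()).collect();
    assert_eq!(paired.len(), 2); // bd2 silently dropped

    // An explicit length check (or assert_eq!, if the invariant is held
    // to be a programmer guarantee) surfaces the mismatch instead:
    if blockdevs.len() != key_info.len() {
        eprintln!(
            "length mismatch: {} block devices vs {} key entries",
            blockdevs.len(),
            key_info.len()
        );
    }
}
```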

@mulkieran mulkieran moved this from In Review to In Progress in 2026March Mar 20, 2026
@jbaublitz jbaublitz force-pushed the issue-stratisd-3597 branch from 573a13c to 2d1689f Compare March 24, 2026 13:16

Projects

Status: In Progress

3 participants