
Fix InvocationError metadata lost during protobuf deserialization of ResponseResult#4509

Open
tillrohrmann wants to merge 1 commit into restatedev:main from tillrohrmann:issues/4508

@tillrohrmann
Contributor

The TryFrom<ResponseResult> implementation for deserializing a ResponseResult from protobuf storage was constructing an InvocationError with only code and message, silently dropping the failure_metadata field. The serialization direction (From<ResponseResult>) correctly included failure_metadata, so metadata was written to storage but lost on read.
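The asymmetry between the two conversion directions can be sketched as follows. Every type here is a hypothetical, simplified stand-in, not Restate's actual definition: ProtoResponseResult plays the role of the protobuf message, FailureMetadata is modelled as a key/value pair, and the error code is assumed to fit in a u16. The From impl writes failure_metadata; the TryFrom impl maps it back instead of building the error from code and message alone, so a stored failure round-trips unchanged.

```rust
use std::convert::TryFrom;

// Hypothetical, simplified stand-ins for the real domain types.
#[derive(Debug, Clone, PartialEq)]
pub struct FailureMetadata {
    pub key: String,
    pub value: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct InvocationError {
    pub code: u16,
    pub message: String,
    pub metadata: Vec<FailureMetadata>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ResponseResult {
    Success(Vec<u8>),
    Failure(InvocationError),
}

// Hypothetical storage (protobuf-like) representation.
#[derive(Debug, Clone, PartialEq)]
pub enum ProtoResponseResult {
    Success { value: Vec<u8> },
    Failure { code: u32, message: String, failure_metadata: Vec<(String, String)> },
}

// Serialization direction: metadata is written to storage.
impl From<ResponseResult> for ProtoResponseResult {
    fn from(result: ResponseResult) -> Self {
        match result {
            ResponseResult::Success(bytes) => ProtoResponseResult::Success { value: bytes },
            ResponseResult::Failure(err) => ProtoResponseResult::Failure {
                code: u32::from(err.code),
                message: err.message,
                failure_metadata: err.metadata.into_iter().map(|m| (m.key, m.value)).collect(),
            },
        }
    }
}

// Deserialization direction: the buggy version built the error from `code` and
// `message` only (empty metadata); this version maps `failure_metadata` back.
impl TryFrom<ProtoResponseResult> for ResponseResult {
    type Error = String;

    fn try_from(proto: ProtoResponseResult) -> Result<Self, Self::Error> {
        match proto {
            ProtoResponseResult::Success { value } => Ok(ResponseResult::Success(value)),
            ProtoResponseResult::Failure { code, message, failure_metadata } => {
                let code = u16::try_from(code).map_err(|_| format!("invalid error code {code}"))?;
                let metadata: Vec<FailureMetadata> = failure_metadata
                    .into_iter()
                    .map(|(key, value)| FailureMetadata { key, value })
                    .collect();
                Ok(ResponseResult::Failure(InvocationError { code, message, metadata }))
            }
        }
    }
}

fn main() {
    let original = ResponseResult::Failure(InvocationError {
        code: 500,
        message: "terminal failure".to_string(),
        metadata: vec![FailureMetadata { key: "k".to_string(), value: "v".to_string() }],
    });
    // Write to "storage", then read back.
    let stored = ProtoResponseResult::from(original.clone());
    let restored = ResponseResult::try_from(stored).expect("valid stored result");
    assert_eq!(restored, original);
    println!("metadata preserved: {}", restored == original);
}
```

A round-trip equality check like the one in main is the kind of assertion that catches this class of bug regardless of which field a future change forgets.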

This surfaces in multi-node clusters when an idempotent invocation completes on one node and the result is later read back from storage on another. For example, in the threeNodes/UserErrors test: the ingress on N1 submits callTerminallyFailingCall, but partition 0 is reassigned to N3 mid-flight. N3 replays the log, completes the invocation, and persists the completed status (with metadata) to RocksDB. When N1's ingress retries (using the idempotency key), N3's partition processor finds InvocationStatus::Completed in storage and returns the stored response_result — which goes through the buggy TryFrom deserialization, stripping the metadata before it reaches the ingress.
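The failure mode in this scenario can be modelled in a few lines. The sketch is hypothetical throughout: PartitionProcessor, StoredFailure, encode, decode_buggy and decode_fixed are illustrative names, and a HashMap keyed by idempotency key stands in for the completed invocation status persisted to RocksDB. The first attempt returns the in-memory result with its metadata intact; a retry with the same idempotency key is served from storage, so it goes through the read path, which is where the metadata was being dropped.

```rust
use std::collections::HashMap;

// Hypothetical, simplified error type (metadata as key/value pairs).
#[derive(Debug, Clone, PartialEq)]
struct InvocationError {
    code: u16,
    message: String,
    metadata: Vec<(String, String)>,
}

// Hypothetical stored form of a completed invocation's failure.
#[derive(Debug, Clone)]
struct StoredFailure {
    code: u16,
    message: String,
    failure_metadata: Vec<(String, String)>,
}

// Write path: metadata is persisted.
fn encode(err: &InvocationError) -> StoredFailure {
    StoredFailure { code: err.code, message: err.message.clone(), failure_metadata: err.metadata.clone() }
}

// Read path before the fix: only code and message survive.
fn decode_buggy(stored: &StoredFailure) -> InvocationError {
    InvocationError { code: stored.code, message: stored.message.clone(), metadata: Vec::new() }
}

// Read path after the fix: metadata is restored as well.
fn decode_fixed(stored: &StoredFailure) -> InvocationError {
    InvocationError { code: stored.code, message: stored.message.clone(), metadata: stored.failure_metadata.clone() }
}

// Hypothetical partition processor deduplicating by idempotency key.
struct PartitionProcessor {
    completed: HashMap<String, StoredFailure>,
    decode: fn(&StoredFailure) -> InvocationError,
}

impl PartitionProcessor {
    fn invoke<F: FnOnce() -> InvocationError>(&mut self, idempotency_key: &str, run: F) -> InvocationError {
        if let Some(stored) = self.completed.get(idempotency_key) {
            // Retry: the invocation is already completed, so answer from storage.
            return (self.decode)(stored);
        }
        // First attempt: run the invocation, persist it, return the in-memory result.
        let result = run();
        self.completed.insert(idempotency_key.to_string(), encode(&result));
        result
    }
}

fn main() {
    // A terminal failure that carries metadata.
    let failing = || InvocationError {
        code: 500,
        message: "terminal failure".to_string(),
        metadata: vec![("key".to_string(), "value".to_string())],
    };
    let variants: [(&str, fn(&StoredFailure) -> InvocationError); 2] =
        [("buggy", decode_buggy), ("fixed", decode_fixed)];
    for &(label, decode) in variants.iter() {
        let mut processor = PartitionProcessor { completed: HashMap::new(), decode };
        let first = processor.invoke("idempotency-key-1", failing);
        let retried = processor.invoke("idempotency-key-1", failing);
        println!(
            "{label}: first has metadata = {}, retry has metadata = {}",
            !first.metadata.is_empty(),
            !retried.metadata.is_empty()
        );
    }
}
```

With decode_buggy, the first attempt reports metadata but the retry does not; with decode_fixed, both do. This is why the bug only shows up when a result is read back from storage, as on the retried ingress request in the three-node test.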

This fixes #4508.


Development

Successfully merging this pull request may close these issues.

threeNodes => UserErrors => Test propagate failure from another service with metadata fails on CI
