[Breaking Change] Normalize nil payload value encoding to empty slice#8307
[Breaking Change] Normalize nil payload value encoding to empty slice#8307zhangchiqing merged 7 commits intomasterfrom
Conversation
📝 WalkthroughWalkthroughIntroduces an internal serializer helper and updates MarshalJSON/MarshalCBOR to use it; nil payload Values are normalized to an empty Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Dependency Review✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.Scanned FilesNone |
fc875a7 to
ff72b16
Compare
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
ledger/trie.go
Outdated
| // it first reads checkpoint files and then WAL files, during which nil values are normalized | ||
| // to []byte{}. By normalizing at payload creation time, we ensure all code paths produce | ||
| // consistent encoded data regardless of whether the original value was nil or []byte{}. | ||
| if value == nil { |
There was a problem hiding this comment.
-
Why not adding the normalization in UpdateToPayloads, where the trie updates are converted into payloads.?
This ensures payload created in other code path is also normalized. -
Would this change affect the mtrie algorithm?
No. The mtrie useslen(value)to determine if a register is unallocated. Becauselen(nil)equalslen([]byte{})(both are 0), normalizingnilto[]byte{}has no effect on the mtrie's logic.
also the Payload.ValueEquals method consider both nil and []byte{} are equal
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In @ledger/trie_encoder_test.go:
- Line 643: Update the test assertion message in the require.NotNil call that
checks tu.Payloads[0].Value() so the typo is fixed: replace "no-nil" with
"non-nil" in the assertion message string used in the require.NotNil(t,
tu.Payloads[0].Value(), "...") call.
🧹 Nitpick comments (1)
ledger/trie_test.go (1)
627-627: Consider clarifying the comment for better readability.The byte value
0x40is correct (CBOR encoding for an empty byte string), but the comment "null will be normalized to []byte{}" might be clearer as "empty byte string (normalized from nil)" or "bytes(0) - normalized empty value". The current phrasing could be misread as suggesting that 0x40 represents null, when it actually represents an empty byte string.✏️ Suggested comment improvement
- 0x40, // null will be normalized to []byte{} + 0x40, // bytes(0) - empty byte string (normalized from nil)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
ledger/trie.goledger/trie_encoder_test.goledger/trie_test.go
🧰 Additional context used
📓 Path-based instructions (5)
**/*.go
📄 CodeRabbit inference engine (.cursor/rules/coding_conventions.mdc)
Follow Go coding conventions as documented in @docs/agents/CodingConventions.md
Follow Go coding standards and conventions as documented in @docs/agents/GoDocs.md
**/*.go: Follow the existing module structure in/module/,/engine/,/model/and use dependency injection patterns for component composition
Implement proper interfaces before concrete types
Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md
Use mock generators: runmake generate-mocksafter interface changes
All inputs must be considered potentially byzantine; error classification is context-dependent and no code path is safe unless explicitly proven and documented
Use comprehensive error wrapping for debugging; avoidfmt.Errorf, useirrecoverablepackage for exceptions
NEVER log and continue on best effort basis; ALWAYS explicitly handle errors
Uses golangci-lint with custom configurations (.golangci.yml) and custom linters for Flow-specific conventions (struct write checking)
Files:
ledger/trie.goledger/trie_encoder_test.goledger/trie_test.go
{crypto,fvm,ledger,storage}/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Security checks for cryptographic misuse must be enforced using gosec
Files:
ledger/trie.goledger/trie_encoder_test.goledger/trie_test.go
{crypto,fvm,ledger,access,engine}/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Cryptographic operations require careful handling; refer to crypto library documentation for proper implementation
Files:
ledger/trie.goledger/trie_encoder_test.goledger/trie_test.go
{storage,ledger,execution,fvm}/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
State consistency is paramount; use proper synchronization primitives
Files:
ledger/trie.goledger/trie_encoder_test.goledger/trie_test.go
**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*_test.go: Unit tests should be co-located with the code they test
Follow the existing pattern of*_test.gofiles for test naming
Use fixtures for realistic test data as defined in/utils/unittest/
Files:
ledger/trie_encoder_test.goledger/trie_test.go
🧬 Code graph analysis (1)
ledger/trie_encoder_test.go (2)
ledger/trie.go (1)
NewPayload(435-448)ledger/trie_encoder.go (2)
EncodeTrieUpdate(555-569)DecodeTrieUpdate(604-620)
🔇 Additional comments (2)
ledger/trie.go (1)
437-446: LGTM: Normalization ensures consistency across serialization paths.The nil-to-empty-slice normalization at payload creation time is the correct approach. This eliminates ambiguity in encoded data and aligns with the existing
ValueEqualsbehavior (lines 409-422) that already treats nil and empty payloads as equivalent. The detailed comment effectively explains the rationale for this change.ledger/trie_encoder_test.go (1)
620-663: LGTM: Comprehensive test coverage for normalization behavior.The test effectively verifies that nil and empty slice values are normalized to identical representations through the encode/decode cycle. The test structure is clear, with pre-encode assertions confirming normalization in
NewPayloadand post-decode assertions confirming persistence through serialization.
| require.NoError(t, err) | ||
|
|
||
| // Both will be []byte{} after decode due to normalization in NewPayload. | ||
| t.Logf("Decoded Payload 0 value: %v (isNil=%v)", decoded.Payloads[0].Value(), decoded.Payloads[0].Value() == nil) |
There was a problem hiding this comment.
If the normalization was not added, this would log as isNil=false, even if the Payloads[0] was created with NewPayload(k1, nil). This means after encoding and decoding, the nil value is converted to []byte{}, which proves EncodeTrieUpdate/DecodeTrieUpdate does not distinguish between nil and []byte{}.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…o leo/normalize-payload-value
ledger/trie.go
Outdated
| sp := serializablePayload{Key: k, Value: p.value} | ||
| // Normalize nil value to empty slice for consistent encoding | ||
| value := p.value | ||
| if value == nil { |
There was a problem hiding this comment.
This normalization acts as a defensive measure: if a Payload is instantiated directly as &Payload{} (skipping NewPayload()), the nil value normalization in the constructor is bypassed. Normalizing during encoding ensures all payloads serialize consistently, regardless of their creation path.
janezpodhostnik
left a comment
There was a problem hiding this comment.
Does this come with a performance penalty?
ledger/trie.go
Outdated
| // EmptyPayload returns an empty payload | ||
| func EmptyPayload() *Payload { | ||
| return &Payload{} | ||
| return &Payload{value: Value{}} |
There was a problem hiding this comment.
Is it ok that the key is nil in the empty payload?
There was a problem hiding this comment.
I moved the normalization to the encoding time.
fxamacker
left a comment
There was a problem hiding this comment.
LGTM. Please add "breaking change" label to this PR, and also mention that this PR causes breaking change in encoded data when encoding ledger.Payload with both JSON and CBOR.
| } | ||
|
|
||
| v := p.value | ||
| if v == nil { |
There was a problem hiding this comment.
I moved the normalization from payload creation time to encoding time, keeping the surface small.
There was a problem hiding this comment.
Does this come with a performance penalty?
After moving the normalization to encoding time, there should be no performance penalty. @janezpodhostnik
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
ledger/trie_test.go (2)
534-576: Fix CBOR comment (“0x40” isbytes(0), not null) + tighten nil/empty assertions.
0x40encodes an empty byte string, so “null will be normalized…” is misleading. Also, consider checkingp2.valueimmediately after unmarshal (before any accessor calls) if the goal is to validate decode-time normalization.Proposed patch
0x65, // text(5) 0x56, 0x61, 0x6c, 0x75, 0x65, // "Value" - 0x40, // null will be normalized to []byte{} + 0x40, // bytes(0) - nil Value is encoded as empty byte string } var p Payload b, err := cbor.Marshal(p) require.NoError(t, err) - require.True(t, p.value == nil) + require.Nil(t, p.value) // ensure marshal didn't mutate the original payload require.Equal(t, encoded, b) var p2 Payload err = cbor.Unmarshal(b, &p2) require.NoError(t, err) + // If the intent is to verify decode-time normalization (not via accessors), + // assert it immediately after unmarshal. + require.NotNil(t, p2.value) + require.Len(t, p2.value, 0) + // Reset b to make sure that p2 doesn't share underlying data with CBOR-encoded data. for i := 0; i < len(b); i++ { b[i] = 0 } @@ v2 := p2.Value() require.True(t, v2.Equals(Value{})) - - // after decoding, the value will be normalized to []byte{} - // which will be different from the original format - require.True(t, p.value == nil) - require.True(t, p2.value != nil) + require.Nil(t, p.value) + require.NotNil(t, p2.value)
632-639: Same CBOR comment issue in “empty value” case (0x40isbytes(0)).Proposed patch
0x65, // text(5) 0x56, 0x61, 0x6c, 0x75, 0x65, // "Value" - 0x40, // null will be normalized to []byte{} + 0x40, // bytes(0) }
🧹 Nitpick comments (1)
ledger/trie_test.go (1)
422-429: Make the “post-decode normalization” assertion unambiguous + use idiomaticrequirehelpers.Right now the
p2.value != nilcheck happens after callingIsEmpty()/Key()/Value(), so it can’t prove normalization happened during unmarshal (vs. inside an accessor). Also, preferrequire.Nil/NotNiloverrequire.True(x == nil).Proposed patch
v2 := p2.Value() require.True(t, v2.Equals(Value{})) - // after decoding, the value will be normalized to []byte{} - // which will be different from the original format - require.True(t, p.value == nil) - require.True(t, p2.value != nil) + // After unmarshal, Value is normalized to a non-nil empty slice ([]byte{}), + // which differs from the original zero-value representation (nil). + require.Nil(t, p.value) + require.NotNil(t, p2.value) + require.Len(t, p2.value, 0)
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
ledger/trie.goledger/trie_encoder_test.goledger/trie_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- ledger/trie.go
- ledger/trie_encoder_test.go
🧰 Additional context used
📓 Path-based instructions (5)
**/*.go
📄 CodeRabbit inference engine (.cursor/rules/coding_conventions.mdc)
Follow Go coding conventions as documented in @docs/agents/CodingConventions.md
Follow Go coding standards and conventions as documented in @docs/agents/GoDocs.md
**/*.go: Follow the existing module structure in/module/,/engine/,/model/and use dependency injection patterns for component composition
Implement proper interfaces before concrete types
Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md
Use mock generators: runmake generate-mocksafter interface changes
All inputs must be considered potentially byzantine; error classification is context-dependent and no code path is safe unless explicitly proven and documented
Use comprehensive error wrapping for debugging; avoidfmt.Errorf, useirrecoverablepackage for exceptions
NEVER log and continue on best effort basis; ALWAYS explicitly handle errors
Uses golangci-lint with custom configurations (.golangci.yml) and custom linters for Flow-specific conventions (struct write checking)
Files:
ledger/trie_test.go
**/*_test.go
📄 CodeRabbit inference engine (AGENTS.md)
**/*_test.go: Unit tests should be co-located with the code they test
Follow the existing pattern of*_test.gofiles for test naming
Use fixtures for realistic test data as defined in/utils/unittest/
Files:
ledger/trie_test.go
{crypto,fvm,ledger,storage}/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Security checks for cryptographic misuse must be enforced using gosec
Files:
ledger/trie_test.go
{crypto,fvm,ledger,access,engine}/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
Cryptographic operations require careful handling; refer to crypto library documentation for proper implementation
Files:
ledger/trie_test.go
{storage,ledger,execution,fvm}/**/*.go
📄 CodeRabbit inference engine (AGENTS.md)
State consistency is paramount; use proper synchronization primitives
Files:
ledger/trie_test.go
🧬 Code graph analysis (1)
ledger/trie_test.go (1)
ledger/trie.go (1)
Payload(250-258)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
- GitHub Check: Lint (./insecure/)
- GitHub Check: Lint (./integration/)
- GitHub Check: Lint (./)
- GitHub Check: Unit Tests (engine/execution/computation)
- GitHub Check: Unit Tests (network)
- GitHub Check: Unit Tests (module)
- GitHub Check: Unit Tests (state)
- GitHub Check: Unit Tests (engine/collection)
- GitHub Check: Unit Tests (network/p2p/connection)
- GitHub Check: Unit Tests (network/p2p/scoring)
- GitHub Check: Unit Tests (network/test/cohort2)
- GitHub Check: Unit Tests (others)
- GitHub Check: Unit Tests (network/alsp)
- GitHub Check: Unit Tests (fvm)
- GitHub Check: Unit Tests (network/p2p/node)
- GitHub Check: Unit Tests (engine/verification)
- GitHub Check: Unit Tests (engine/execution)
- GitHub Check: Unit Tests (network/test/cohort1)
- GitHub Check: Unit Tests (engine/execution/ingestion)
- GitHub Check: Unit Tests (module/dkg)
- GitHub Check: Unit Tests (consensus)
- GitHub Check: Unit Tests (engine)
- GitHub Check: Unit Tests (engine/common)
- GitHub Check: Unit Tests (ledger)
- GitHub Check: Unit Tests (engine/consensus)
- GitHub Check: Unit Tests (engine/access)
- GitHub Check: Unit Tests (utils)
- GitHub Check: Unit Tests (network/p2p)
- GitHub Check: Unit Tests (admin)
- GitHub Check: Unit Tests (storage)
- GitHub Check: Unit Tests (cmd)
- GitHub Check: Integration Tests Others (integration)
- GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/scoring)
- GitHub Check: Unit Tests Insecure (insecure)
- GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/rpc_inspector)
- GitHub Check: Docker Build
- GitHub Check: analyze-code (go)
|
|
||
| // Value returns payload value. | ||
| // CAUTION: do not modify returned value because it shares underlying data with payload value. | ||
| // CAUTION: to check wheather the payload value is empty, use len(payload.Value()) == 0, |
There was a problem hiding this comment.
I have checked there is no incompatible usage of checking with payload.value == nil or payload.Value() == nil
|
@zhangchiqing - This is breaking change as-in it requires an HCU to deploy but not a breaking change in the sense that it impacts the Access API, or how clients interact with the chain, correct? |
|
Right, It's a breaking change because it requires HCU to deploy, but it's not a breaking change for Access API, the old access client can still be able to retrieve the same data. |
Working towards #8308
Background
When working on a proof of concept for #8308, I found a non-determinism issue causing a result_id mismatch. The root cause was an ambiguity in EncodeTrieUpdate and DecodeTrieUpdate, which do not distinguish between
niland[]byte{}(empty slice) payload values; both are decoded as[]byte{}. I have the test caseTestTrieUpdateNilVsEmptySlice(included in the PR) to reproduce it.This became problematic because block execution sometimes produces a mix of both types. When these values were passed through the remote service’s gRPC/binary boundary, the distinction was lost. However, the CBOR serialization for ChunkExecutionData is sensitive to this difference, resulting in a hash mismatch in execution data id and an incorrect result_id.
Solution
This PR fixes the issue by adding a normalization step in NewPayload to convert nil values to empty slices ([]byte{}). This works because:
ValueEqualsalready treatsniland empty payloads as equivalent by usingIsEmpty(), the codebase already considers them semantically equivalent.Note
This is a breaking change that requires HCU to deploy.
Summary by CodeRabbit
Bug Fixes
Tests
✏️ Tip: You can customize this high-level summary in your review settings.