Skip to content

[Breaking Change] Normalize nil payload value encoding to empty slice#8307

Merged
zhangchiqing merged 7 commits intomasterfrom
leo/normalize-payload-value
Jan 12, 2026
Merged

[Breaking Change] Normalize nil payload value encoding to empty slice#8307
zhangchiqing merged 7 commits intomasterfrom
leo/normalize-payload-value

Conversation

@zhangchiqing
Copy link
Member

@zhangchiqing zhangchiqing commented Jan 7, 2026

Working towards #8308

Background

When working on a proof of concept for #8308, I found a non-determinism issue causing a result_id mismatch. The root cause was an ambiguity in EncodeTrieUpdate and DecodeTrieUpdate, which do not distinguish between nil and []byte{} (empty slice) payload values; both are decoded as []byte{}. I have the test case TestTrieUpdateNilVsEmptySlice (included in the PR) to reproduce it.

This became problematic because block execution sometimes produces a mix of both types. When these values were passed through the remote service’s gRPC/binary boundary, the distinction was lost. However, the CBOR serialization for ChunkExecutionData is sensitive to this difference, resulting in a hash mismatch in execution data id and an incorrect result_id.

Solution

This PR fixes the issue by adding a normalization step in NewPayload to convert nil values to empty slices ([]byte{}). This works because:

  • Existing code treats them as equal: The existing implementation of ValueEquals already treats nil and empty payloads as equivalent by using IsEmpty(), the codebase already considers them semantically equivalent.
  • Consistency across serialization formats: By normalizing at payload creation time, we ensure all code paths produce consistent encoded data regardless of whether the original value was nil or []byte{}. This eliminates the distinction that was causing hash mismatches in execution data serialization.

Note

This is a breaking change that requires HCU to deploy.

Summary by CodeRabbit

  • Bug Fixes

    • Normalize nil and empty payload values so they consistently serialize as empty byte strings across JSON and CBOR, avoiding mismatches between nil vs empty representations.
  • Tests

    • Added and updated tests to verify nil and empty payload values normalize and remain equivalent after encode/decode.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jan 7, 2026

📝 Walkthrough

Walkthrough

Introduces an internal serializer helper and updates MarshalJSON/MarshalCBOR to use it; nil payload Values are normalized to an empty Value{} during serialization. Tests updated and a new test added to assert that nil and empty byte-slice Values encode/decode as identical empty byte strings.

Changes

Cohort / File(s) Summary
Serializer & Normalization
ledger/trie.go
Added internal seralizable() helper on Payload; MarshalJSON and MarshalCBOR now delegate to it. Normalizes nil ValueValue{} so encoding uses an empty byte string instead of null.
New encoder test
ledger/trie_encoder_test.go
Added TestTrieUpdateNilVsEmptySlice to construct trie updates with nil vs empty Value, encode/decode them, and assert both normalize to identical empty byte slices after round-trip.
Existing serialization tests updated
ledger/trie_test.go
Updated JSON and CBOR tests to expect normalized empty byte string (0x40) for empty Value and added assertions that decoded payloads have non-nil empty byte slices while original may remain nil.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I found two bytes, one nil, one neat,
I nudged them close until they meet,
Encoded small, no null between,
They hop as one—both calm and clean,
A little trie dance, joyful and fleet.

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 66.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: normalizing nil payload values to empty slices during encoding, which is the central modification across all three affected test files.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Contributor

github-actions bot commented Jan 7, 2026

Dependency Review

✅ No vulnerabilities or license issues or OpenSSF Scorecard issues found.

Scanned Files

None

@zhangchiqing zhangchiqing force-pushed the leo/normalize-payload-value branch from fc875a7 to ff72b16 Compare January 7, 2026 17:13
@codecov-commenter
Copy link

codecov-commenter commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
ledger/trie.go 80.00% 1 Missing and 1 partial ⚠️

📢 Thoughts on this report? Let us know!

ledger/trie.go Outdated
// it first reads checkpoint files and then WAL files, during which nil values are normalized
// to []byte{}. By normalizing at payload creation time, we ensure all code paths produce
// consistent encoded data regardless of whether the original value was nil or []byte{}.
if value == nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why not adding the normalization in UpdateToPayloads, where the trie updates are converted into payloads.?
    This ensures payload created in other code path is also normalized.

  2. Would this change affect the mtrie algorithm?
    No. The mtrie uses len(value) to determine if a register is unallocated. Because len(nil) equals len([]byte{}) (both are 0), normalizing nil to []byte{} has no effect on the mtrie's logic.

also the Payload.ValueEquals method consider both nil and []byte{} are equal

@zhangchiqing zhangchiqing marked this pull request as ready for review January 7, 2026 17:51
@zhangchiqing zhangchiqing requested a review from a team as a code owner January 7, 2026 17:51
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @ledger/trie_encoder_test.go:
- Line 643: Update the test assertion message in the require.NotNil call that
checks tu.Payloads[0].Value() so the typo is fixed: replace "no-nil" with
"non-nil" in the assertion message string used in the require.NotNil(t,
tu.Payloads[0].Value(), "...") call.
🧹 Nitpick comments (1)
ledger/trie_test.go (1)

627-627: Consider clarifying the comment for better readability.

The byte value 0x40 is correct (CBOR encoding for an empty byte string), but the comment "null will be normalized to []byte{}" might be clearer as "empty byte string (normalized from nil)" or "bytes(0) - normalized empty value". The current phrasing could be misread as suggesting that 0x40 represents null, when it actually represents an empty byte string.

✏️ Suggested comment improvement
-			0x40, // null will be normalized to []byte{}
+			0x40, // bytes(0) - empty byte string (normalized from nil)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 757836b and ff72b16.

📒 Files selected for processing (3)
  • ledger/trie.go
  • ledger/trie_encoder_test.go
  • ledger/trie_test.go
🧰 Additional context used
📓 Path-based instructions (5)
**/*.go

📄 CodeRabbit inference engine (.cursor/rules/coding_conventions.mdc)

Follow Go coding conventions as documented in @docs/agents/CodingConventions.md

Follow Go coding standards and conventions as documented in @docs/agents/GoDocs.md

**/*.go: Follow the existing module structure in /module/, /engine/, /model/ and use dependency injection patterns for component composition
Implement proper interfaces before concrete types
Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md
Use mock generators: run make generate-mocks after interface changes
All inputs must be considered potentially byzantine; error classification is context-dependent and no code path is safe unless explicitly proven and documented
Use comprehensive error wrapping for debugging; avoid fmt.Errorf, use irrecoverable package for exceptions
NEVER log and continue on best effort basis; ALWAYS explicitly handle errors
Uses golangci-lint with custom configurations (.golangci.yml) and custom linters for Flow-specific conventions (struct write checking)

Files:

  • ledger/trie.go
  • ledger/trie_encoder_test.go
  • ledger/trie_test.go
{crypto,fvm,ledger,storage}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Security checks for cryptographic misuse must be enforced using gosec

Files:

  • ledger/trie.go
  • ledger/trie_encoder_test.go
  • ledger/trie_test.go
{crypto,fvm,ledger,access,engine}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Cryptographic operations require careful handling; refer to crypto library documentation for proper implementation

Files:

  • ledger/trie.go
  • ledger/trie_encoder_test.go
  • ledger/trie_test.go
{storage,ledger,execution,fvm}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

State consistency is paramount; use proper synchronization primitives

Files:

  • ledger/trie.go
  • ledger/trie_encoder_test.go
  • ledger/trie_test.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*_test.go: Unit tests should be co-located with the code they test
Follow the existing pattern of *_test.go files for test naming
Use fixtures for realistic test data as defined in /utils/unittest/

Files:

  • ledger/trie_encoder_test.go
  • ledger/trie_test.go
🧬 Code graph analysis (1)
ledger/trie_encoder_test.go (2)
ledger/trie.go (1)
  • NewPayload (435-448)
ledger/trie_encoder.go (2)
  • EncodeTrieUpdate (555-569)
  • DecodeTrieUpdate (604-620)
🔇 Additional comments (2)
ledger/trie.go (1)

437-446: LGTM: Normalization ensures consistency across serialization paths.

The nil-to-empty-slice normalization at payload creation time is the correct approach. This eliminates ambiguity in encoded data and aligns with the existing ValueEquals behavior (lines 409-422) that already treats nil and empty payloads as equivalent. The detailed comment effectively explains the rationale for this change.

ledger/trie_encoder_test.go (1)

620-663: LGTM: Comprehensive test coverage for normalization behavior.

The test effectively verifies that nil and empty slice values are normalized to identical representations through the encode/decode cycle. The test structure is clear, with pre-encode assertions confirming normalization in NewPayload and post-decode assertions confirming persistence through serialization.

require.NoError(t, err)

// Both will be []byte{} after decode due to normalization in NewPayload.
t.Logf("Decoded Payload 0 value: %v (isNil=%v)", decoded.Payloads[0].Value(), decoded.Payloads[0].Value() == nil)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the normalization was not added, this would log as isNil=false, even if the Payloads[0] was created with NewPayload(k1, nil). This means after encoding and decoding, the nil value is converted to []byte{}, which proves EncodeTrieUpdate/DecodeTrieUpdate does not distinguish between nil and []byte{}.

zhangchiqing and others added 3 commits January 7, 2026 10:14
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
ledger/trie.go Outdated
sp := serializablePayload{Key: k, Value: p.value}
// Normalize nil value to empty slice for consistent encoding
value := p.value
if value == nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This normalization acts as a defensive measure: if a Payload is instantiated directly as &Payload{} (skipping NewPayload()), the nil value normalization in the constructor is bypassed. Normalizing during encoding ensures all payloads serialize consistently, regardless of their creation path.

Copy link
Contributor

@janezpodhostnik janezpodhostnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this come with a performance penalty?

ledger/trie.go Outdated
// EmptyPayload returns an empty payload
func EmptyPayload() *Payload {
return &Payload{}
return &Payload{value: Value{}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ok that the key is nil in the empty payload?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the normalization to the encoding time.

Copy link
Member

@fxamacker fxamacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Please add "breaking change" label to this PR, and also mention that this PR causes breaking change in encoded data when encoding ledger.Payload with both JSON and CBOR.

@zhangchiqing zhangchiqing changed the title Normalize nil payload value to empty slice [Breaking Change] Normalize nil payload value to empty slice Jan 9, 2026
@zhangchiqing zhangchiqing changed the title [Breaking Change] Normalize nil payload value to empty slice [Breaking Change] Normalize nil payload value encoding to empty slice Jan 9, 2026
}

v := p.value
if v == nil {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the normalization from payload creation time to encoding time, keeping the surface small.

Copy link
Member Author

@zhangchiqing zhangchiqing Jan 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this come with a performance penalty?

After moving the normalization to encoding time, there should be no performance penalty. @janezpodhostnik

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
ledger/trie_test.go (2)

534-576: Fix CBOR comment (“0x40” is bytes(0), not null) + tighten nil/empty assertions.

0x40 encodes an empty byte string, so “null will be normalized…” is misleading. Also, consider checking p2.value immediately after unmarshal (before any accessor calls) if the goal is to validate decode-time normalization.

Proposed patch
 			0x65,                         // text(5)
 			0x56, 0x61, 0x6c, 0x75, 0x65, // "Value"
-			0x40, // null will be normalized to []byte{}
+			0x40, // bytes(0) - nil Value is encoded as empty byte string
 		}
 
 		var p Payload
 		b, err := cbor.Marshal(p)
 		require.NoError(t, err)
-		require.True(t, p.value == nil)
+		require.Nil(t, p.value) // ensure marshal didn't mutate the original payload
 		require.Equal(t, encoded, b)
 
 		var p2 Payload
 		err = cbor.Unmarshal(b, &p2)
 		require.NoError(t, err)
 
+		// If the intent is to verify decode-time normalization (not via accessors),
+		// assert it immediately after unmarshal.
+		require.NotNil(t, p2.value)
+		require.Len(t, p2.value, 0)
+
 		// Reset b to make sure that p2 doesn't share underlying data with CBOR-encoded data.
 		for i := 0; i < len(b); i++ {
 			b[i] = 0
 		}
@@
 		v2 := p2.Value()
 		require.True(t, v2.Equals(Value{}))
-
-		// after decoding, the value will be normalized to []byte{}
-		// which will be different from the original format
-		require.True(t, p.value == nil)
-		require.True(t, p2.value != nil)
+		require.Nil(t, p.value)
+		require.NotNil(t, p2.value)

632-639: Same CBOR comment issue in “empty value” case (0x40 is bytes(0)).

Proposed patch
 			0x65,                         // text(5)
 			0x56, 0x61, 0x6c, 0x75, 0x65, // "Value"
-			0x40, // null will be normalized to []byte{}
+			0x40, // bytes(0)
 		}
🧹 Nitpick comments (1)
ledger/trie_test.go (1)

422-429: Make the “post-decode normalization” assertion unambiguous + use idiomatic require helpers.

Right now the p2.value != nil check happens after calling IsEmpty()/Key()/Value(), so it can’t prove normalization happened during unmarshal (vs. inside an accessor). Also, prefer require.Nil/NotNil over require.True(x == nil).

Proposed patch
 		v2 := p2.Value()
 		require.True(t, v2.Equals(Value{}))
 
-		// after decoding, the value will be normalized to []byte{}
-		// which will be different from the original format
-		require.True(t, p.value == nil)
-		require.True(t, p2.value != nil)
+		// After unmarshal, Value is normalized to a non-nil empty slice ([]byte{}),
+		// which differs from the original zero-value representation (nil).
+		require.Nil(t, p.value)
+		require.NotNil(t, p2.value)
+		require.Len(t, p2.value, 0)
📜 Review details

Configuration used: defaults

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d57c6a6 and d49ab59.

📒 Files selected for processing (3)
  • ledger/trie.go
  • ledger/trie_encoder_test.go
  • ledger/trie_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • ledger/trie.go
  • ledger/trie_encoder_test.go
🧰 Additional context used
📓 Path-based instructions (5)
**/*.go

📄 CodeRabbit inference engine (.cursor/rules/coding_conventions.mdc)

Follow Go coding conventions as documented in @docs/agents/CodingConventions.md

Follow Go coding standards and conventions as documented in @docs/agents/GoDocs.md

**/*.go: Follow the existing module structure in /module/, /engine/, /model/ and use dependency injection patterns for component composition
Implement proper interfaces before concrete types
Follow Go naming conventions and the project's coding style defined in /docs/CodingConventions.md
Use mock generators: run make generate-mocks after interface changes
All inputs must be considered potentially byzantine; error classification is context-dependent and no code path is safe unless explicitly proven and documented
Use comprehensive error wrapping for debugging; avoid fmt.Errorf, use irrecoverable package for exceptions
NEVER log and continue on best effort basis; ALWAYS explicitly handle errors
Uses golangci-lint with custom configurations (.golangci.yml) and custom linters for Flow-specific conventions (struct write checking)

Files:

  • ledger/trie_test.go
**/*_test.go

📄 CodeRabbit inference engine (AGENTS.md)

**/*_test.go: Unit tests should be co-located with the code they test
Follow the existing pattern of *_test.go files for test naming
Use fixtures for realistic test data as defined in /utils/unittest/

Files:

  • ledger/trie_test.go
{crypto,fvm,ledger,storage}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Security checks for cryptographic misuse must be enforced using gosec

Files:

  • ledger/trie_test.go
{crypto,fvm,ledger,access,engine}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

Cryptographic operations require careful handling; refer to crypto library documentation for proper implementation

Files:

  • ledger/trie_test.go
{storage,ledger,execution,fvm}/**/*.go

📄 CodeRabbit inference engine (AGENTS.md)

State consistency is paramount; use proper synchronization primitives

Files:

  • ledger/trie_test.go
🧬 Code graph analysis (1)
ledger/trie_test.go (1)
ledger/trie.go (1)
  • Payload (250-258)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (37)
  • GitHub Check: Lint (./insecure/)
  • GitHub Check: Lint (./integration/)
  • GitHub Check: Lint (./)
  • GitHub Check: Unit Tests (engine/execution/computation)
  • GitHub Check: Unit Tests (network)
  • GitHub Check: Unit Tests (module)
  • GitHub Check: Unit Tests (state)
  • GitHub Check: Unit Tests (engine/collection)
  • GitHub Check: Unit Tests (network/p2p/connection)
  • GitHub Check: Unit Tests (network/p2p/scoring)
  • GitHub Check: Unit Tests (network/test/cohort2)
  • GitHub Check: Unit Tests (others)
  • GitHub Check: Unit Tests (network/alsp)
  • GitHub Check: Unit Tests (fvm)
  • GitHub Check: Unit Tests (network/p2p/node)
  • GitHub Check: Unit Tests (engine/verification)
  • GitHub Check: Unit Tests (engine/execution)
  • GitHub Check: Unit Tests (network/test/cohort1)
  • GitHub Check: Unit Tests (engine/execution/ingestion)
  • GitHub Check: Unit Tests (module/dkg)
  • GitHub Check: Unit Tests (consensus)
  • GitHub Check: Unit Tests (engine)
  • GitHub Check: Unit Tests (engine/common)
  • GitHub Check: Unit Tests (ledger)
  • GitHub Check: Unit Tests (engine/consensus)
  • GitHub Check: Unit Tests (engine/access)
  • GitHub Check: Unit Tests (utils)
  • GitHub Check: Unit Tests (network/p2p)
  • GitHub Check: Unit Tests (admin)
  • GitHub Check: Unit Tests (storage)
  • GitHub Check: Unit Tests (cmd)
  • GitHub Check: Integration Tests Others (integration)
  • GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/scoring)
  • GitHub Check: Unit Tests Insecure (insecure)
  • GitHub Check: Unit Tests Insecure (insecure/integration/functional/test/gossipsub/rpc_inspector)
  • GitHub Check: Docker Build
  • GitHub Check: analyze-code (go)


// Value returns payload value.
// CAUTION: do not modify returned value because it shares underlying data with payload value.
// CAUTION: to check wheather the payload value is empty, use len(payload.Value()) == 0,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have checked there is no incompatible usage of checking with payload.value == nil or payload.Value() == nil

@zhangchiqing zhangchiqing added this pull request to the merge queue Jan 12, 2026
Merged via the queue into master with commit c9e3915 Jan 12, 2026
117 of 118 checks passed
@zhangchiqing zhangchiqing deleted the leo/normalize-payload-value branch January 12, 2026 18:08
@coderabbitai coderabbitai bot mentioned this pull request Jan 14, 2026
@vishalchangrani
Copy link
Contributor

@zhangchiqing - This is breaking change as-in it requires an HCU to deploy but not a breaking change in the sense that it impacts the Access API, or how clients interact with the chain, correct?

@zhangchiqing
Copy link
Member Author

Right, It's a breaking change because it requires HCU to deploy, but it's not a breaking change for Access API, the old access client can still be able to retrieve the same data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants