Skip to content

Bound proof deserialization without format changes#1383

Open
quangvdao wants to merge 2 commits intoa16z:mainfrom
quangvdao:quang/upstream-bounded-proof-deserialization
Open

Bound proof deserialization without format changes#1383
quangvdao wants to merge 2 commits intoa16z:mainfrom
quangvdao:quang/upstream-bounded-proof-deserialization

Conversation

@quangvdao
Copy link
Copy Markdown
Contributor

Posted by Cursor assistant (model: GPT-5.4) on behalf of the user (Quang Dao) with approval.

Summary

  • add bounded manual deserialization helpers while preserving the existing wire format
  • bound attacker-controlled vectors across Jolt proof, sumcheck, BlindFold, Hyrax, and uni-poly types
  • reject duplicate claim keys, empty polynomials, and invalid MontU128Challenge values
  • add module-level regression coverage for malformed length prefixes and empty polynomial paths

Validation

  • cargo fmt -q
  • cargo clippy -p jolt-core --features host --message-format=short -q --all-targets -- -D warnings
  • cargo clippy -p jolt-core --features host,zk --message-format=short -q --all-targets -- -D warnings
  • cargo nextest run -p jolt-core claims_reject_ --cargo-quiet --features host
  • cargo nextest run -p jolt-core rejects_empty_unipoly_deserialization --cargo-quiet --features host
  • cargo nextest run -p jolt-core rejects_empty_compressed_unipoly_deserialization --cargo-quiet --features host
  • cargo nextest run -p jolt-core rejects_nonzero_lower_limbs_on_deserialize --cargo-quiet --features host
  • cargo nextest run -p jolt-core rejects_high_bits_outside_challenge_width --cargo-quiet --features host

Comment on lines +582 to +584
if self.coeffs_except_linear_term.len() == 1 {
return UniPoly::from_coeff(vec![self.coeffs_except_linear_term[0]]);
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Critical bug: Early return incorrectly produces degree-0 polynomial instead of degree-1. When coeffs_except_linear_term.len() == 1, the compressed polynomial contains only the constant term, but the decompressed polynomial must still include the linear term recovered from the hint.

What breaks: Verification will fail because the polynomial degree is wrong. A degree-1 polynomial gets decompressed as degree-0.

Fix:

if self.coeffs_except_linear_term.len() == 1 {
    let constant = self.coeffs_except_linear_term[0];
    let linear_term = *hint - constant - constant;
    return UniPoly::from_coeff(vec![constant, linear_term]);
}
Suggested change
if self.coeffs_except_linear_term.len() == 1 {
return UniPoly::from_coeff(vec![self.coeffs_except_linear_term[0]]);
}
if self.coeffs_except_linear_term.len() == 1 {
let constant = self.coeffs_except_linear_term[0];
let linear_term = *hint - constant - constant;
return UniPoly::from_coeff(vec![constant, linear_term]);
}

Spotted by Graphite

Fix in Graphite


Is this helpful? React 👍 or 👎 to let us know.

Copy link
Copy Markdown
Collaborator

@0xAndoroid 0xAndoroid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review with 10 findings across semantic consistency, bugs, tech debt, and security.

impl<F: JoltField> UniPoly<F> {
pub fn from_coeff(coeffs: Vec<F>) -> Self {
UniPoly { coeffs }
if coeffs.is_empty() {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Score: 85] divide_with_remainder remainder bypasses the new non-empty invariant

This from_coeff change establishes the invariant that coeffs is never empty. However, divide_with_remainder (line 267) returns the remainder directly without going through from_coeff:

Some((Self::from_coeff(quotient), remainder))

The pop loop at line 263 can empty remainder.coeffs when division is exact. quotient is correctly wrapped via from_coeff, but remainder is not. A subsequent call to remainder.degree() hits the new debug_assert! in debug and underflows (0usize - 1 = usize::MAX) in release.

Also: is_zero() (line 271) still checks self.coeffs.is_empty() — dead branch under the new invariant but load-bearing until this is fixed.

Fix at line 267:

Some((Self::from_coeff(quotient), Self::from_coeff(remainder.coeffs)))

let round_commitments =
Vec::<C::G1>::deserialize_with_mode(&mut reader, compress, validate)?;
let poly_degrees = Vec::<usize>::deserialize_with_mode(&mut reader, compress, validate)?;
deserialize_bounded_vec(&mut reader, compress, validate, MAX_SUMCHECK_ROUNDS)?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Score: 80] ZkSumcheckProof serialize/deserialize asymmetry

Every other struct in this PR pairs serialize_vec_with_len (write) with deserialize_bounded_vec (read). ZkSumcheckProof::serialize_with_mode (line 680-691) uses arkworks' default self.round_commitments.serialize_with_mode(...) for all three Vec fields, while deserialization here uses the new deserialize_bounded_vec.

Wire-compatible today (both write u64 length prefix), but inconsistent with the pattern in every other manual impl in this PR. The serialized_size (line 693-697) also uses arkworks default instead of serialized_vec_with_len_size.

Fix: Update the serialize side to use serialize_vec_with_len and serialized_vec_with_len_size for all three fields, matching the rest of the PR.

let output_claims_commitments =
Vec::<C::G1>::deserialize_with_mode(reader, compress, validate)?;
deserialize_bounded_vec(reader, compress, validate, MAX_OPENING_CLAIMS)?;
Ok(Self {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Score: 75] Missing check() validation guard in ZkSumcheckProof::deserialize_with_mode

Every other manual CanonicalDeserialize impl in this PR calls:

if validate == Validate::Yes {
    proof.check()?;
}

after construction. This is the only struct that omits it — it returns Ok(Self { ... }) directly. The struct-level Valid::check() (line 703-707) is never invoked during deserialization.

Fix:

let proof = Self {
    round_commitments,
    poly_degrees,
    output_claims_commitments,
    _marker: PhantomData,
};
if validate == ark_serialize::Validate::Yes {
    proof.check()?;
}
Ok(proof)

compress,
validate,
)?,
joint_opening_proof: PCS::Proof::deserialize_with_mode(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Score: 70] Dory opening proof (PCS::Proof) has unbounded Vec deserialization

This is the one proof component that completely escapes the new bounding infrastructure. In dory-pcs, ArkDoryProof::deserialize_with_mode reads a u32 num_rounds from the wire and allocates Vec::with_capacity(num_rounds) for first_messages and second_messages without any upper bound. A malicious prover can set num_rounds = u32::MAX, forcing multi-GB allocation before any verification.

Since this lives in an external crate, it can't be fixed here. But consider:

  1. A post-deserialization bounds check on proof.joint_opening_proof fields
  2. Filing an issue on dory-pcs to add bounded deserialization
  3. Adding a note in the commit/PR description about this gap

)?,
#[cfg(not(feature = "zk"))]
opening_claims: Claims::deserialize_with_mode(&mut reader, compress, validate)?,
trace_length: usize::deserialize_with_mode(&mut reader, compress, validate)?,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Score: 65] trace_length and ram_K deserialized as raw usize with no bounds

trace_length = 0 causes a panic via assert_ne!(self, 0) in Math::log_2 during verification — a trivially exploitable remote crash. Very large values (e.g., 1 << 50) cause excessive allocations in UniformSpartanKey::new.

Consider adding bounds in Valid::check() or here:

const MAX_TRACE_LENGTH: usize = 1 << 32;
if self.trace_length == 0 || self.trace_length > MAX_TRACE_LENGTH {
    return Err(SerializationError::InvalidData);
}

Ok(values)
}

pub fn deserialize_bounded_u32_len<R: Read>(
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Score: 50] deserialize_bounded_u32_len is dead code

Defined but never called anywhere in the codebase. All deserialization sites use deserialize_bounded_vec or deserialize_bounded_len (both read usize). This will trigger a dead_code clippy warning.

Remove it, or if planned for future use, annotate with #[allow(dead_code)].

pub const MAX_OPENING_CLAIMS: usize = 1 << 16;
pub const MAX_SUMCHECK_ROUNDS: usize = 1 << 16;
pub const MAX_UNIPOLY_COEFFS: usize = 1 << 16;
pub const MAX_BLINDFOLD_VECTOR_LEN: usize = 1 << 20;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Score: 50] MAX_BLINDFOLD_VECTOR_LEN = 1 << 20 is ~1000x larger than needed

1M elements × 32 bytes = 32MB per vector. This constant bounds 11 Vec fields across BlindFoldProof (6 fields), RelaxedR1CSInstance (5 fields), and RelaxedR1CSWitness (4 fields) — a malicious proof can force ~352MB+ total allocation. Realistic BlindFold proofs have row counts proportional to sumcheck rounds (a few hundred).

Consider tightening to 1 << 16 (65K) which is still generous. Also: the constants lack documentation explaining how each bound was derived — future developers won't know which bound to use for new fields.

pub fn degree(&self) -> usize {
self.coeffs_except_linear_term.len()
debug_assert!(!self.coeffs_except_linear_term.is_empty());
if self.coeffs_except_linear_term.len() == 1 {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Score: 25] CompressedUniPoly::degree() returns 0 for len==1, underreporting degree-1 polynomials

When a degree-1 poly [c, b] is compressed, result is [c] (len=1). The new degree() returns 0, but the original had degree 1. A malicious compressed poly with 1 coeff trivially passes degree() > degree_bound checks.

In practice sumcheck rounds always have degree >= 2, so this can't trigger from an honest prover. But it weakens the defense-in-depth this PR is adding. The old behavior (coeffs_except_linear_term.len(), returning 1) was more conservative. Consider documenting this limitation.

@@ -36,6 +36,9 @@ pub struct MontU128Challenge<F: JoltField> {
// Custom serialization: serialize as [u64; 4] for compatibility with field element format
impl<F: JoltField> Valid for MontU128Challenge<F> {
fn check(&self) -> Result<(), SerializationError> {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[Score: 25] >> 61 check and >> 3 mask express the same constraint differently

Validation: (self.high >> 61) != 0
Construction: u64::MAX >> 3 (masks top 3 bits)

Both enforce the top 3 bits of high are zero (61 usable bits), but expressed two different ways. A shared constant would make the equivalence explicit:

const CHALLENGE_HIGH_MASK: u64 = u64::MAX >> 3;
// in check(): self.high > CHALLENGE_HIGH_MASK

Nit — the validation is correct for the 125-bit challenge width (64-bit low + 61-bit high).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants