[v25.2.x] kafka: add kafka_produce_batch_validation and rework timestamp validation (MANUAL BACKPORT)#27599

Merged
WillemKauf merged 5 commits into redpanda-data:v25.2.x from
WillemKauf:kafka_batch_validation_backport
Sep 17, 2025

@WillemKauf
Contributor

Manual backport of #27529. The cherry-pick had conflicts because #27419 is not present in v25.2.x.

Because v25.2.x lacks the above PR and message.timestamp.{before/after}.max.ms, we still rely on alerts (log_message_timestamp_alert_{before/after}_ms) rather than outright rejecting messages.

Fixes #27573

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.2.x
  • v25.1.x
  • v24.3.x

Release Notes

Features

  • Adds a new cluster config kafka_produce_batch_validation, which controls the level of validation performed on batches in the Redpanda produce path.

```
Cluster config name: `kafka_produce_batch_validation`

Description:

Controls the amount of validation that happens for produced batches in
the Kafka API. See the below descriptions for the behavior of each of these modes.

Type enum:

* `legacy`: Keeps the exact behavior of Redpanda 25.2.x as it is today:
  minimal validation on the produce path, consisting of a crc check and
  iteration over uncompressed records.

* `relaxed`: The new default mode. Does full validation of uncompressed
  batches, including setting the max_timestamp. For compressed batches,
  it does only crc validation EXCEPT if the max_timestamp is missing. In
  the missing max_timestamp case we decompress the batch to compute the
  max_timestamp and do full validation.

* `strict`: Always does full validation of compressed and uncompressed batches.
  This mode should be used in environments where producing clients are not trusted.

Default: `relaxed`
```
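To make the three modes easier to compare, here is a minimal sketch of the decision described above. It is not the code from this PR: `validation_mode`, `validation_action`, `batch_info` and `choose_action` are hypothetical names chosen for illustration, and the real logic in `produce_validation.{cc/h}` may be structured differently.

```cpp
// Hypothetical types for illustration only; not the identifiers used in Redpanda.
enum class validation_mode { legacy, relaxed, strict };

enum class validation_action {
    legacy_minimal,  // crc + uncompressed record iteration (25.2.x behavior)
    crc_only,        // compressed batch that already carries max_timestamp
    full_validation, // full checks; may require decompressing the batch
};

struct batch_info {
    bool compressed;
    bool has_max_timestamp;
};

// Maps a validation mode and the batch properties to the work performed,
// following the mode descriptions in the release note above.
constexpr validation_action
choose_action(validation_mode mode, const batch_info& b) {
    switch (mode) {
    case validation_mode::legacy:
        return validation_action::legacy_minimal;
    case validation_mode::strict:
        return validation_action::full_validation;
    case validation_mode::relaxed:
        if (!b.compressed) {
            return validation_action::full_validation;
        }
        // Compressed: only decompress when max_timestamp has to be computed.
        return b.has_max_timestamp ? validation_action::crc_only
                                   : validation_action::full_validation;
    }
    return validation_action::full_validation; // not reached for valid modes
}
```

In this sketch the only case where `relaxed` pays the decompression cost is a compressed batch with no `max_timestamp`, which matches the description of the mode.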

(cherry picked from commit 146f1b1)
@WillemKauf WillemKauf requested review from a team as code owners September 16, 2025 21:07
@WillemKauf WillemKauf requested review from andrewhsu and removed request for a team September 16, 2025 21:07
{.batch = *batch,
.timestamp_type = cfg_ctx.timestamp_type,
.message_timestamp_alert_before_ms
= config::shard_local_cfg().log_message_timestamp_alert_before_ms,

note: this PR is using alerts rather than outright rejection of batches with message.timestamp.{before/after}.max.ms
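
For readers unfamiliar with the alert-based approach, a rough sketch of what an alert check could look like follows. This is an assumption-laden illustration, not the code in this PR: `timestamp_alert` and `check_timestamp` are hypothetical names, the exact comparison semantics are assumed, and an empty optional stands in for a disabled threshold.

```cpp
#include <chrono>
#include <optional>

// Hypothetical result type: the batch is still accepted in every case,
// the caller would only log when the result is not `none`.
enum class timestamp_alert { none, too_old, too_new };

// Compares a batch timestamp against the broker clock using the two
// thresholds (modelled on log_message_timestamp_alert_{before/after}_ms).
// Assumption: an unset threshold disables the corresponding check.
inline timestamp_alert check_timestamp(
  std::chrono::milliseconds batch_ts,
  std::chrono::milliseconds broker_now,
  std::optional<std::chrono::milliseconds> alert_before,
  std::optional<std::chrono::milliseconds> alert_after) {
    if (alert_before && batch_ts < broker_now - *alert_before) {
        return timestamp_alert::too_old;
    }
    if (alert_after && batch_ts > broker_now + *alert_after) {
        return timestamp_alert::too_new;
    }
    return timestamp_alert::none;
}
```

The difference from message.timestamp.{before/after}.max.ms is in what the caller does with the result: here it would only log, whereas the rejection-based approach would fail the produce request.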

@WillemKauf WillemKauf force-pushed the kafka_batch_validation_backport branch 3 times, most recently from 6fb2bed to 26ecc26 Compare September 16, 2025 21:33
bharathv and others added 4 commits September 16, 2025 18:02
This commit reworks and adds several new batch/record level validation functions
to the Kafka produce path in `produce_validation.{cc/h}`.

`redpanda` cannot leave `max_timestamp` unset in record batches - several
subsystems (timequeries, retention, archival) depend heavily on the `max_timestamp`
of batches being properly set.

Set the `max_timestamp` manually by iterating over the records in the
record batch, taking the maximum `timestamp_delta` and adding it to the
batch's `first_timestamp` (which _must_ be set by clients, as a hard rule).
This may incur additional cost for compressed batches, and is gated by the
level of `kafka_produce_batch_validation`.
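
The computation described above can be sketched as follows. This is a simplified illustration, not the implementation in this commit: `record` and `compute_max_timestamp` are hypothetical names, records are assumed to be already decoded (and decompressed if needed), and timestamps are plain millisecond integers.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical, simplified record: only the field needed here.
struct record {
    std::int64_t timestamp_delta; // offset from the batch's first_timestamp
};

// Returns first_timestamp plus the largest timestamp_delta in the batch.
// Assumption: an empty record list yields first_timestamp unchanged.
inline std::int64_t compute_max_timestamp(
  std::int64_t first_timestamp, const std::vector<record>& records) {
    if (records.empty()) {
        return first_timestamp;
    }
    const auto it = std::max_element(
      records.begin(), records.end(), [](const record& a, const record& b) {
          return a.timestamp_delta < b.timestamp_delta;
      });
    return first_timestamp + it->timestamp_delta;
}
```

For example, a batch with `first_timestamp` 1000 and deltas 0, 5 and 3 would get a `max_timestamp` of 1005. The cost for compressed batches comes from having to decompress before the records can be iterated at all.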

Log lines are added to alert the client when a batch is accepted without a
maximum timestamp set (`kafka_produce_batch_validation == legacy`), or when an
expensive decompression operation is performed in the produce path
(`kafka_produce_batch_validation == relaxed`), letting them know to update
their client to set the `max_timestamp`.

(cherry picked from commit 7f148e9)
Tests that records/batches produced with specific timestamps, `redpanda`
validation modes and `compression.type`s behave the same as they do with
a Kafka broker.

(cherry picked from commit 58731ae)
This test case uses an old version of `sarama` which does NOT properly
set the max_timestamp in a batch.

This version is useful for testing redpanda behavior with improper timestamps
and different validation modes.

(cherry picked from commit c87baf0)
@WillemKauf WillemKauf force-pushed the kafka_batch_validation_backport branch from 26ecc26 to 35eb4b0 Compare September 16, 2025 22:02
WillemKauf commented Sep 16, 2025

Please let this be the last force push.

Lots of missing stuff from v25.2 (pre model::batch_compression, a cherry-picked commit for utility in produce consume utils, other differences in kafka testing produce/consume layer)

@WillemKauf WillemKauf merged commit 8af65a7 into redpanda-data:v25.2.x Sep 17, 2025
18 checks passed