
config: add support for message.timestamp.{before/after}.max.ms #27419

Merged
WillemKauf merged 8 commits into redpanda-data:dev from WillemKauf:kafka_time_properties
Sep 2, 2025

Conversation

@WillemKauf
Contributor

@WillemKauf WillemKauf commented Aug 29, 2025

Implements KIP-937 (Improve Message Timestamp Validation).

Introduces message.timestamp.{before/after}.max.ms and log_message_timestamp_{before/after}_max_ms at the topic and cluster level.

These can be used to reject records produced with timestamps outside the set bounds by returning Kafka error code 32 (INVALID_TIMESTAMP).
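For illustration, the new bounds might be applied like so. This is a hedged sketch: the topic and cluster property names come from this PR, but the topic name, values, and exact rpk flag syntax are assumptions, not taken from the change itself.

```shell
# Hypothetical topic: reject records stamped more than 1h in the past
# or more than 5min in the future.
rpk topic create my-topic \
  -c message.timestamp.before.max.ms=3600000 \
  -c message.timestamp.after.max.ms=300000

# Cluster-level defaults (property names introduced by this PR):
rpk cluster config set log_message_timestamp_before_max_ms 3600000
rpk cluster config set log_message_timestamp_after_max_ms 300000
```

A produce request whose record timestamps fall outside these windows would then be rejected with INVALID_TIMESTAMP (error code 32).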

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v25.2.x
  • v25.1.x
  • v24.3.x

Release Notes

Features

  • Adds support for KIP-937 by implementing message.timestamp.{before/after}.max.ms.
  • Deprecates log_message_timestamp_alert_{before/after}_ms cluster properties.

@WillemKauf WillemKauf requested a review from a team as a code owner August 29, 2025 00:02
@WillemKauf WillemKauf force-pushed the kafka_time_properties branch 3 times, most recently from 259c469 to c3bd8b5, on August 29, 2025 00:07
@rockwotj rockwotj self-requested a review August 29, 2025 00:11
@WillemKauf WillemKauf force-pushed the kafka_time_properties branch 2 times, most recently from 78e73a3 to d38c385, on August 29, 2025 02:34
@vbotbuildovich
Collaborator

vbotbuildovich commented Aug 29, 2025

Retry command for Build#71583

please wait until all jobs are finished before running the slash command

/ci-repeat 1
tests/rptest/tests/retention_policy_test.py::BogusTimestampTest.test_bogus_timestamps@{"mixed_timestamps":true,"use_broker_timestamps":false}
tests/rptest/tests/retention_policy_test.py::BogusTimestampTest.test_bogus_timestamps@{"mixed_timestamps":true,"use_broker_timestamps":true}
tests/rptest/tests/retention_policy_test.py::BogusTimestampTest.test_bogus_timestamps@{"mixed_timestamps":false,"use_broker_timestamps":false}
tests/rptest/tests/retention_policy_test.py::BogusTimestampTest.test_bogus_timestamps@{"mixed_timestamps":false,"use_broker_timestamps":true}
tests/rptest/tests/compatibility/kafka_streams_test.py::KafkaStreamsSessionWindow.test_kafka_streams

@vbotbuildovich
Collaborator

vbotbuildovich commented Aug 29, 2025

CI test results

test results on build#71583
test_class test_method test_arguments test_kind job_url test_status passed reason
KafkaStreamsSessionWindow test_kafka_streams null integration https://buildkite.com/redpanda/redpanda/builds/71583#0198f3cb-96c5-4af5-9987-524f78dd0b93 FAIL 0/21 The test has failed across all retries
KafkaStreamsSessionWindow test_kafka_streams null integration https://buildkite.com/redpanda/redpanda/builds/71583#0198f3cd-2b65-4142-b267-4ab9a2f78c94 FAIL 0/21 The test has failed across all retries
BogusTimestampTest test_bogus_timestamps {"mixed_timestamps": false, "use_broker_timestamps": false} integration https://buildkite.com/redpanda/redpanda/builds/71583#0198f3cb-96c4-4b4d-97d3-f1b6294dbddb FAIL 0/21 The test has failed across all retries
BogusTimestampTest test_bogus_timestamps {"mixed_timestamps": false, "use_broker_timestamps": false} integration https://buildkite.com/redpanda/redpanda/builds/71583#0198f3cd-2b65-4b54-8361-feb77bda5753 FAIL 0/21 The test has failed across all retries
BogusTimestampTest test_bogus_timestamps {"mixed_timestamps": false, "use_broker_timestamps": true} integration https://buildkite.com/redpanda/redpanda/builds/71583#0198f3cb-96c5-4af5-9987-524f78dd0b93 FAIL 0/21 The test has failed across all retries
BogusTimestampTest test_bogus_timestamps {"mixed_timestamps": false, "use_broker_timestamps": true} integration https://buildkite.com/redpanda/redpanda/builds/71583#0198f3cd-2b65-4142-b267-4ab9a2f78c94 FAIL 0/21 The test has failed across all retries
BogusTimestampTest test_bogus_timestamps {"mixed_timestamps": true, "use_broker_timestamps": false} integration https://buildkite.com/redpanda/redpanda/builds/71583#0198f3cb-96be-4c38-9586-77501cea24d5 FAIL 0/21 The test has failed across all retries
BogusTimestampTest test_bogus_timestamps {"mixed_timestamps": true, "use_broker_timestamps": false} integration https://buildkite.com/redpanda/redpanda/builds/71583#0198f3cd-2b5f-4696-bbd0-e2789096dec7 FAIL 0/21 The test has failed across all retries
BogusTimestampTest test_bogus_timestamps {"mixed_timestamps": true, "use_broker_timestamps": true} integration https://buildkite.com/redpanda/redpanda/builds/71583#0198f3cb-96bf-4220-b0da-3eb6987c0546 FAIL 0/21 The test has failed across all retries
BogusTimestampTest test_bogus_timestamps {"mixed_timestamps": true, "use_broker_timestamps": true} integration https://buildkite.com/redpanda/redpanda/builds/71583#0198f3cd-2b60-4e1d-b037-f39be9a5a603 FAIL 0/21 The test has failed across all retries
test results on build#71645
test_class test_method test_arguments test_kind job_url test_status passed reason
RandomNodeOperationsTest test_node_operations {"cloud_storage_type": 1, "compaction_mode": "sliding_window", "enable_failures": false, "mixed_versions": true, "with_iceberg": false} integration https://buildkite.com/redpanda/redpanda/builds/71645#0198f7dd-7cd2-4cb0-be78-28f42a0c41aa FLAKY 20/21 upstream reliability is '99.13419913419914'. current run reliability is '95.23809523809523'. drift is 3.8961 and the allowed drift is set to 50. The test should PASS

Contributor

@rockwotj rockwotj left a comment


Nice, just a few small bits on the validation itself (also CI failures)

return std::nullopt;
batch.set_max_timestamp(
model::timestamp_type::append_time, broker_time);
}
Contributor


Missing return

Contributor Author


Oops, this disappeared, thanks

Comment on lines +231 to +240
auto broker_timepoint = model::duration_since_epoch(broker_time);

// reject if first_timestamp is too far in the past
auto first_timepoint = model::duration_since_epoch(header.first_timestamp);
if (
broker_timepoint > first_timepoint
&& std::chrono::duration_cast<std::chrono::milliseconds>(
broker_timepoint - first_timepoint)
> message_timestamp_before_max_ms) {
Contributor


Remind me why this check can't be:

auto min_timepoint = broker_time - message_timestamp_before_max_ms;
if (first_timestamp < min_timepoint) {
  // error
}

Is it something about overflow?

Contributor Author

@WillemKauf WillemKauf Aug 29, 2025


Yes, message_timestamp_before_max_ms is a bounded property that defaults to serde::max_serializable_ms (9223372036854), so subtracting this value from broker time would be an underflow (until Friday, April 11, 2262 11:47:16.854 PM 😄 )

The checks here are safe to under/overflow in their current implementation.

Contributor


ugh I miss absl::Time that has saturating logic instead...

Comment on lines +263 to +270
auto max_timepoint = model::duration_since_epoch(header.max_timestamp);
if (
broker_timepoint < max_timepoint
&& std::chrono::duration_cast<std::chrono::milliseconds>(
max_timepoint - broker_timepoint)
> message_timestamp_after_max_ms) {
Contributor


ditto, why can't this be:

auto max_timepoint = broker_timepoint + message_timestamp_after_max_ms;
if (header_max_timestamp > max_timepoint) {
  // error
}

Contributor Author


Potential overflow, better to subtract.

@WillemKauf WillemKauf force-pushed the kafka_time_properties branch 2 times, most recently from 9616654 to ec487f6, on August 29, 2025 16:20
@WillemKauf WillemKauf requested a review from rockwotj August 29, 2025 16:34
rockwotj
rockwotj previously approved these changes Aug 29, 2025
Contributor

@rockwotj rockwotj left a comment


LGTM

@rockwotj
Contributor

Also please mention the property deprecation in the release notes

@WillemKauf
Contributor Author

Also please mention the property deprecation in the release notes

Done, I'll wait for a docs review of the cluster property description as well before merging

WillemKauf and others added 8 commits August 29, 2025 16:08
…time

Some clients (looking at you Sarama!) don't set a max_timestamp for
batches when being produced. In Apache Kafka, all incoming batches have
the max_timestamp batch header field set BY THE BROKER. Code references:

Here for uncompressed batches:
https://github.com/apache/kafka/blob/e124d3975bdb3a9ec85eee2fba7a1b0a6967d3a6/storage/src/main/java/org/apache/kafka/storage/internals/log/LogValidator.java#L275

Here for compressed batches:
https://github.com/apache/kafka/blob/e124d3975bdb3a9ec85eee2fba7a1b0a6967d3a6/storage/src/main/java/org/apache/kafka/storage/internals/log/LogValidator.java#L404

When the timestamp_type is log_append then we do set the max_timestamp
in the broker:
https://github.com/redpanda-data/redpanda/blob/be699729fbbb0b48cc684d4417383ad1320103b7/src/v/kafka/server/handlers/produce.cc#L309

But don't do anything in the create_time case. This has the effect of
breaking timequeries, which only look at the max_timestamp in many
places of the storage layer (which is similar to the Kafka behavior, but
they have the max_timestamp set correctly).

However, historically we've not decompressed batches on the produce path
(which is technically a validation bug). Since that has many perf
implications, and doing that decompression would presumably need at
least a cluster config flag, we only partially fix this bug when the
batch is uncompressed. For compressed batches we should really just
do the right thing and validate the batch (but maybe behind a flag?).
We now have topic properties that will actively reject records,
so these alerts are unneeded and would only serve to confuse users
due to a mismatched configuration between the alert and the other
properties which reject the records.
…ax_ms`

This commit properly enforces these configs within the produce path.

There is also a subtle reworking of `validate_batch_timestamps()`,
though there are no other functional changes in the reworking.
Adds a variety of test cases that produce records with timestamps in the past,
future, and present, to assert that `message.timestamp.{before/after}.max.ms`
are being correctly enforced.
…s()`

Kafka records can be produced with timestamps up to `int64_t::max()`.

Before, validation checks for timestamps in the produce path were
implemented via the following:

```
auto broker_time = model::timestamp::now();
auto broker_timepoint = model::duration_since_epoch(broker_time);
auto max_timepoint = model::duration_since_epoch(header.max_timestamp);
if (broker_timepoint < max_timepoint
  && std::chrono::duration_cast<std::chrono::milliseconds>(
       max_timepoint - broker_timepoint)
       > message_timestamp_after_max_ms) {
   ...
}
```

where `model::duration_since_epoch()` is defined as:

```
inline timestamp_clock::duration duration_since_epoch(timestamp ts) {
    return std::chrono::duration_cast<timestamp_clock::duration>(
      std::chrono::milliseconds{ts.value()});
}
```

where `timestamp_clock::duration` is `std::chrono::system_clock::duration`,
which can be `microseconds` or `nanoseconds`. Uh oh.

So, in the case that the `max_timepoint` is near `int64_t::max()`, we
effectively `duration_cast` a `std::chrono::milliseconds` type to a
`std::chrono::microseconds` type (multiplying underlying value by 1000),
subtract two values, and then attempt to cast back to a
`std::chrono::milliseconds` type, which is only guaranteed to be a signed
integer of at least 45 bits [1].

This can lead to overflow. Fix the issue by keeping everything as a
`model::timestamp`, with `int64_t` comparisons and no opportunity
for narrowing conversions.

[1]: https://en.cppreference.com/w/cpp/chrono/duration.html
@WillemKauf
Contributor Author

Force push to:

  • Fix cluster property description per docs review
  • Add a new commit that better handles under/overflow in validate_batch_timestamps()

@rockwotj please take a look, I can squash the commit in with the original change if you'd like

@WillemKauf WillemKauf requested a review from rockwotj August 29, 2025 21:26
> message_timestamp_before_max_ms) {
broker_time > header.first_timestamp
&& (broker_time - header.first_timestamp)
> model::timestamp(message_timestamp_before_max_ms.count())) {
Contributor Author


This is now an int64_t comparison. Using .count() is fine since model::timestamp uses int64_t and expects milliseconds.

Contributor Author


duration_since_epoch() is a pretty dangerous function, I think.

Contributor


Agreed, I am actually going to remove it after this PR goes in.

Contributor

@rockwotj rockwotj left a comment


LGTM

Contributor


Any tests we should add for 99c7a2b?

Contributor Author

@WillemKauf WillemKauf Sep 2, 2025


I'll think about adding some lower-level Kafka tests here; it would be easier to ensure we aren't getting overflows in a fixture test than in Ducktape. I'll look at a quick follow-up here.

Contributor

@rockwotj rockwotj left a comment


LGTM

@WillemKauf WillemKauf merged commit 55adcf1 into redpanda-data:dev Sep 2, 2025
18 checks passed