kafka: set max_timestamp if missing and using timestamp_type::create_time #25200
Status: Closed
rockwotj wants to merge 1 commit into redpanda-data:dev
Conversation
…time Some clients (looking at you, Sarama!) don't set a max_timestamp for batches when producing. In Apache Kafka, all incoming batches have the max_timestamp batch header field set BY THE BROKER. Code references:

Here for uncompressed batches: https://github.com/apache/kafka/blob/e124d3975bdb3a9ec85eee2fba7a1b0a6967d3a6/storage/src/main/java/org/apache/kafka/storage/internals/log/LogValidator.java#L275
Here for compressed batches: https://github.com/apache/kafka/blob/e124d3975bdb3a9ec85eee2fba7a1b0a6967d3a6/storage/src/main/java/org/apache/kafka/storage/internals/log/LogValidator.java#L404

When the timestamp_type is log_append we do set the max_timestamp in the broker: https://github.com/redpanda-data/redpanda/blob/be699729fbbb0b48cc684d4417383ad1320103b7/src/v/kafka/server/handlers/produce.cc#L309

But we don't do anything in the create_time case. This breaks timequeries, which in many places of the storage layer only look at the max_timestamp (similar to the Kafka behavior, except there max_timestamp is set correctly).

However, historically we've not decompressed batches on the produce path (which is technically a validation bug). Since decompressing there has many performance implications and would presumably need at least a cluster config flag, we only partially fix this bug, for the case where the batch is uncompressed. For compressed batches we should really just do the right thing and validate the batch (but maybe behind a flag?).
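The uncompressed-batch fix described above can be sketched roughly as follows. All types and names here are simplified stand-ins, not the actual Redpanda code: for a create_time batch whose header carries the missing-timestamp sentinel (-1) and that is not compressed, scan the records and write the maximum record timestamp back into the header.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for the real batch types.
enum class timestamp_type { create_time, log_append_time };
constexpr int64_t missing_timestamp = -1; // sentinel left by clients that skip the field

struct record { int64_t timestamp_ms; };

struct batch {
    timestamp_type ts_type;
    int64_t max_timestamp = missing_timestamp;
    bool compressed = false;
    std::vector<record> records;
};

// Fill in max_timestamp on the broker when the client left it unset.
// Only uncompressed create_time batches are handled here, since
// decompressing on the produce path has performance implications.
void maybe_set_max_timestamp(batch& b, int64_t broker_now_ms) {
    if (b.ts_type == timestamp_type::log_append_time) {
        b.max_timestamp = broker_now_ms; // already done today
        return;
    }
    if (b.max_timestamp != missing_timestamp || b.compressed) {
        return; // client set it, or we can't cheaply inspect the records
    }
    for (const auto& r : b.records) {
        b.max_timestamp = std::max(b.max_timestamp, r.timestamp_ms);
    }
}
```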
rockwotj (Contributor, Author): Will add a test and send out for review.

rockwotj (Contributor, Author): Closing this, we fixed the client that was not setting max_timestamp, so this is lower priority for now.
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request on Aug 26, 2025
…ion_ms`
The motivating case for `broker_time_based_retention` was the
fact that records with bad timestamps produced in the future
could lead to time-based retention being stuck indefinitely [1].
However, using _only_ the `broker_ts` can lead to unexpected
behavior when e.g. replicating data from an existing cluster
using MM2, as the timestamps of the Kafka records themselves are
correctly preserved, but internally, `redpanda` data structures
are not.
To avoid the potentially curious behavior of a divergence in
retention enforcement, take the minimum of the record's timestamp
as written and the current broker time. This will achieve the
original goal of preventing future timestamps from blocking
retention enforcement, while also avoiding any unexpected
behavior with past record timestamps.
We also need to deal with the case where a client may have left
a batch's `max_timestamp` unset, in which case it is marked with `{-1}` [2].
Clamp any non-positive values for `max_ts` to `broker_ts` in this case.
[1]:
* redpanda-data#9820
* redpanda-data#12991
[2]:
* redpanda-data#25200
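The clamping rule in the commit message above can be sketched as a small helper (a hypothetical function, not the actual Redpanda code): take the minimum of the batch's max timestamp and the broker timestamp, treating the unset sentinel (-1, or any non-positive value) as "use the broker timestamp".

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper mirroring the commit message: pick the timestamp
// used for time-based retention. A non-positive max_ts means the client
// left the batch header unset, so fall back to broker_ts; otherwise
// clamp to the broker time so future timestamps can't block retention
// while correctly preserved past timestamps still apply.
int64_t effective_retention_ts(int64_t max_ts, int64_t broker_ts) {
    if (max_ts <= 0) {
        return broker_ts; // unset (-1) or bogus non-positive timestamp
    }
    return std::min(max_ts, broker_ts);
}
```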
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request on Aug 28, 2025
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request on Sep 9, 2025
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request on Sep 9, 2025
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request on Sep 11, 2025
Note that we only set the max_timestamp if the client doesn't set it, which deviates from Apache Kafka's behavior. I could be convinced to always set it, but then we'd do extra CRC computations if there was a client bug.
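One reason always rewriting max_timestamp isn't free: in the Kafka record batch format (magic v2), the batch CRC-32C covers everything from the attributes field to the end of the batch, and max_timestamp sits inside that region, so mutating it forces the broker to recompute the checksum. A minimal sketch (bitwise CRC-32C; the field offset here is hypothetical, not the real layout):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32C (Castagnoli), the checksum Kafka uses for record
// batches. Real brokers use table-driven or hardware-accelerated versions.
uint32_t crc32c(const uint8_t* data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= data[i];
        for (int b = 0; b < 8; ++b) {
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}

// In the v2 batch format the CRC covers the attributes field onward,
// which includes max_timestamp. Rewriting the field in the broker
// therefore means the stored CRC must be recomputed as well.
// max_ts_offset is a hypothetical offset within the covered region.
uint32_t recompute_batch_crc(std::vector<uint8_t>& covered_region,
                             size_t max_ts_offset, int64_t new_max_ts) {
    // Write the new timestamp big-endian at its offset.
    for (int i = 0; i < 8; ++i) {
        covered_region[max_ts_offset + i] =
            static_cast<uint8_t>(new_max_ts >> (56 - 8 * i));
    }
    return crc32c(covered_region.data(), covered_region.size());
}
```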
Backports Required

Release Notes

Bug Fixes

* Fixed timequeries when `message.timestamp.type=CreateTime` is set and the producer leaves the `max_timestamp` batch header as the missing timestamp (-1).