Time-based retention with broker time #12991
Conversation
force-push: 15051ae to e1d4a27
issue is #12839
/ci-repeat 1
force-push: 87f907b to 4850223
force-push: addressed review comments, fixed a missing broker_timestamp after self-compaction
/// this function is the codification of the table above
constexpr auto compute_retention_ms(
I'm curious how this can be constexpr if everything it takes as inputs are not known at compile time
A force of habit when I write a method in the header. In the first version it was truly constexpr, but then I wired in feature_table and configuration, so now it's extremely unlikely it will ever be constexpr. I'll change it.
namespace storage {

// clang-format off
this comment is amazing. should this be surfaced in docs? cc: @Feediver1
// this struct is meant to be a local copy of the feature
// broker_time_based_retention and configuration property
// storage_ignore_timestamps_in_future_secs
struct time_based_retention_cfg {
Do we need to follow the same logic in the cloud storage?
The last discussion result was that it was too high-risk. It would require a new field in cstore and a change to the retention behaviour for cloud (so there would need to be some orchestration during an upgrade). Applying storage_ignore_timestamps_in_future_secs would also require some non-zero effort.
Part of the follow-up is to document cloud vs. local retention to have a discussion about this.
Lazin
left a comment
LGTM. Curious about the cloud storage impact; also, that model::timestamp::now should probably be replaced with seastar::lowres_system_clock::now + conversion to model::timestamp.
adding a commit for this + a static_assert to ensure that the underlying datatype is stable
force-push: 4850223 to f8e6467
force-push to fix merge conflicts. Since I had to do it, I added a last commit to use ss::lowres_system_clock
to broker_timestamp. It's done either by introducing sleeps and keeping track of time, to predict what time-based retention will do, or by checking that segment_index::broker_timestamp is preserved across compaction
a benign abandoned failed future, in case of abort_requested_exception
force-push: 6e384a8 to e6443ee
The last rebase was to fix the merge conflict for a spot in feature_table. CI is green
…ion_ms`
The motivating case for `broker_time_based_retention` was the
fact that records with bad timestamps produced in the future
could lead to time-based retention being stuck indefinitely [1].
However, using _only_ the `broker_ts` can lead to unexpected
behavior when e.g. replicating data from an existing cluster
using MM2, as the timestamps of the Kafka records themselves are
correctly preserved, but internally, `redpanda` data structures
are not.
To avoid the potentially curious behavior of a divergence in
retention enforcement, take the minimum of the record's timestamp
as written and the current broker time. This will achieve the
original goal of preventing future timestamps from blocking
retention enforcement, while also avoiding any unexpected
behavior with past record timestamps.
We also need to deal with the case where a client may have left
a batch's `max_timestamp` unset, in which case it is marked with `{-1}` [2].
Clamp any non-positive values for `max_ts` to `broker_ts` in this case.
[1]:
* redpanda-data#9820
* redpanda-data#12991
[2]:
* redpanda-data#25200
The motivating case for `broker_time_based_retention` was the
fact that records with bad timestamps produced in the future
could lead to time-based retention being stuck indefinitely [1].
However, using the `broker_ts` can lead to unexpected behavior
when e.g. replicating data from an existing cluster using MM2,
as the timestamps of the Kafka records themselves are correctly
preserved, but internally, `redpanda` data structures are not.
To avoid the potentially curious behavior of a divergence in
retention enforcement, go back to using the `max_timestamp` for
batches whose timestamps have been validated and unconditionally
set in the produce path (see:
`v/kafka/server/handlers/produce_validation.cc`) as of `v25.3.1`.
Of course, users can opt out of safety from `max_timestamp=-1`
with `legacy` validation mode, in which case we may still want
to use the broker time for retention enforcement rather than
garbage collecting everything based on `max_timestamp=-1`.
[1]:
* redpanda-data#9820
* redpanda-data#12991
This PR enables time-based retention to operate on broker time.

Previously, `max_timestamp` would be used for time-based retention. When a new batch is indexed, the current timestamp is recorded in `index_state` and used when computing `retention_ts` of a segment. For backward compatibility, the field is optional, and the old mechanism of `storage_ignore_timestamps_in_future_secs` + `max_timestamp` is retained.

The `broker_timestamp` is always recorded for new segments, but its use is gated behind the feature `broker_time_based_retention`. Since this PR changes the behavior for some legitimate workloads that use timestamps in the past, this feature is on only for new clusters. For an upgraded cluster it needs to be activated via the admin API.

In some cases, when the index is lost and needs to be regenerated, the `broker_timestamp` is not set. A future PR will explore the impact of recording the broker timestamp in `model::record_batch_header` directly, to cover the case of re-indexing.

Fixes #12934 #12992
Unrelated to the main PR, but Fixes #13617
Backports Required
Release Notes
Features
`storage_ignore_timestamps_in_future_secs` is retained to deal with bad segments produced before v23.3