time based retention with broker time #12991

Merged
andijcr merged 20 commits into redpanda-data:dev from andijcr:feat/index_state_broker_time
Sep 28, 2023
Conversation

@andijcr
Contributor

@andijcr andijcr commented Aug 24, 2023

This PR enables time-based retention to operate on broker time.

Previously, max_timestamp was used for time-based retention.

When a new batch is indexed, the current timestamp is recorded in index_state and used when computing retention_ts of a segment.

For backward compatibility, the field is optional, and the old mechanism of storage_ignore_timestamps_in_future_secs + max_timestamp is retained.

The broker_timestamp is always recorded for new segments, but its use is gated behind the feature broker_time_based_retention. Since this PR changes the behavior for some legitimate workloads that use timestamps in the past, the feature is enabled only for new clusters. For an upgraded cluster, it needs to be activated via the admin API.

In some cases, when the index is lost and needs to be regenerated, the broker_timestamp is not set. A future PR will explore the impact of recording the broker timestamp in model::record_batch_header directly, to cover the case of re-indexing.
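The selection logic described above can be sketched roughly as follows. This is an illustrative stand-in, not the actual compute_retention_ms from the PR: the types, field names, and parameters here are simplified assumptions (the real code takes the feature table and configuration as inputs).

```cpp
#include <cstdint>
#include <optional>

// Simplified stand-in for model::timestamp (milliseconds since the Unix epoch).
using timestamp_ms = int64_t;

// Hypothetical, simplified view of the index_state fields involved.
struct segment_index_state {
    timestamp_ms max_timestamp;                   // client-supplied, may lie
    std::optional<timestamp_ms> broker_timestamp; // recorded at index time; absent for old segments
};

// Prefer the broker timestamp when the feature is active and the field was
// recorded; otherwise fall back to the old mechanism, where a max_timestamp
// too far in the future (beyond storage_ignore_timestamps_in_future_secs)
// is ignored and clamped to "now".
timestamp_ms retention_ts(
  const segment_index_state& st,
  bool broker_time_based_retention_active,
  timestamp_ms now,
  timestamp_ms ignore_in_future_ms) {
    if (broker_time_based_retention_active && st.broker_timestamp) {
        return *st.broker_timestamp;
    }
    if (st.max_timestamp > now + ignore_in_future_ms) {
        return now;
    }
    return st.max_timestamp;
}
```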

Fixes #12934 #12992

Unrelated to the main PR, but fixes #13617

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.2.x
  • v23.1.x
  • v22.3.x

Release Notes

Features

  • Time-based retention uses the broker timestamp for new data. This reduces the risk of retention not removing segments when a misbehaving client produces messages with a timestamp in the future.
    • The configuration option storage_ignore_timestamps_in_future_secs is retained to deal with bad segments produced before v23.3.
    • This changes the behavior for messages with a timestamp in the past. Before, retention would use this timestamp to delete data; now, the retention window starts when the message arrives at the broker.

@andijcr andijcr force-pushed the feat/index_state_broker_time branch 3 times, most recently from 15051ae to e1d4a27 Compare August 30, 2023 12:21
@andijcr andijcr marked this pull request as ready for review August 30, 2023 13:07
@andijcr
Contributor Author

andijcr commented Aug 30, 2023

issue is #12839

@andijcr
Contributor Author

andijcr commented Aug 31, 2023

/ci-repeat 1

1 similar comment
@andijcr
Contributor Author

andijcr commented Aug 31, 2023

/ci-repeat 1

@andijcr andijcr force-pushed the feat/index_state_broker_time branch 2 times, most recently from 87f907b to 4850223 Compare September 5, 2023 16:50
@andijcr
Contributor Author

andijcr commented Sep 5, 2023

force push: addressed review comments, fixed a missing broker_timestamp after self-compaction

}

/// this function is the codification of the table above
constexpr auto compute_retention_ms(
Contributor

I'm curious how this can be constexpr if everything it takes as inputs are not known at compile time

Contributor Author

a force of habit when I write a method in the header. In the first version, it was truly constexpr but then I wired in feature_table and configuration so now it's extremely unlikely that it will ever be constexpr. I'll change it.
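For context on the thread above, a minimal illustration (not project code): constexpr only permits compile-time evaluation when every argument is a constant expression; the same function remains callable at run time, which is why the qualifier becomes meaningless once the function reads runtime state such as a feature table.

```cpp
#include <cassert>

// constexpr *permits* compile-time evaluation; it does not require it.
constexpr int twice(int x) { return x * 2; }

// Evaluated at compile time, since the argument is a constant expression.
static_assert(twice(21) == 42, "compile-time evaluation");
```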


namespace storage {

// clang-format off
Contributor

this comment is amazing. should this be surfaced in docs? cc: @Feediver1

@andijcr andijcr requested a review from andrwng September 12, 2023 09:04
// this struct is meant to be a local copy of the feature
// broker_time_based_retention and configuration property
// storage_ignore_timestamps_in_future_secs
struct time_based_retention_cfg {
Contributor

Do we need to follow the same logic in the cloud storage?

Contributor Author

The last discussion concluded it was too high risk: it would require a new field in the cstore and a change to the retention behaviour for cloud (so there should be some orchestration during an upgrade). Applying storage_ignore_timestamps_in_future_secs would also require some non-zero effort.

Part of the follow-up is to document cloud vs local retention, to have a discussion about this.

Lazin
Lazin previously approved these changes Sep 12, 2023
Contributor

@Lazin Lazin left a comment

LGTM: curious about the cloud storage impact, also, that model::timestamp::now should probably be replaced with seastar::lowres_system_clock::now + conversion to model::timestamp.

@andijcr
Contributor Author

andijcr commented Sep 12, 2023

LGTM: curious about the cloud storage impact, also, that model::timestamp::now should probably be replaced with seastar::lowres_system_clock::now + conversion to model::timestamp.

adding a commit for this + static assert to ensure that the underlying datatype is stable
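A sketch of what that change looks like, under assumptions: the real code uses seastar::lowres_system_clock and model::timestamp, which are stood in for here by std::chrono::system_clock and a hypothetical simplified struct.

```cpp
#include <chrono>
#include <cstdint>
#include <type_traits>

// Simplified stand-in for model::timestamp: milliseconds since the Unix epoch.
struct timestamp {
    int64_t value;
};

// Take wall-clock time from a system clock (seastar::lowres_system_clock in
// the real code) and convert it to the timestamp representation.
inline timestamp broker_now() {
    using namespace std::chrono;
    int64_t ms = duration_cast<milliseconds>(
      system_clock::now().time_since_epoch()).count();
    return timestamp{ms};
}

// Guard against the underlying datatype changing silently, in the spirit of
// the static_assert mentioned above.
static_assert(std::is_same_v<decltype(timestamp::value), int64_t>);
```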

@andijcr
Contributor Author

andijcr commented Sep 15, 2023

force push to fix merge conflicts.

Since I had to force push anyway, I added a last commit to use ss::lowres_system_clock
f8e6467

@Lazin
@StephanDollberg

@andijcr andijcr requested a review from Lazin September 15, 2023 09:49
@andijcr andijcr force-pushed the feat/index_state_broker_time branch from 6e384a8 to e6443ee Compare September 28, 2023 10:30
@andijcr
Contributor Author

andijcr commented Sep 28, 2023

Last rebase was to fix the merge conflict for a spot in feature_table. CI is green.

@andijcr andijcr merged commit 0a0a5d5 into redpanda-data:dev Sep 28, 2023
@andijcr andijcr deleted the feat/index_state_broker_time branch September 28, 2023 15:37
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Aug 26, 2025
…ion_ms`

The motivating case for `broker_time_based_retention` was the
fact that records with bad timestamps produced in the future
could lead to time-based retention being stuck indefinitely [1].

However, using _only_ the `broker_ts` can lead to unexpected
behavior when e.g. replicating data from an existing cluster
using MM2, as the timestamps of the Kafka records themselves are
correctly preserved, but internally, `redpanda` data structures
are not.

To avoid the potentially curious behavior of a divergence in
retention enforcement, take the minimum of the record's timestamp
as written and the current broker time. This will achieve the
original goal of preventing future timestamps from blocking
retention enforcement, while also avoiding any unexpected
behavior with past record timestamps.

[1]:
* redpanda-data#9820
* redpanda-data#12991
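The adjustment described in this commit message can be sketched as follows (names are illustrative assumptions, not the actual redpanda code): taking the minimum means a future client timestamp cannot block retention, while a genuine past timestamp, such as MM2-mirrored data, is still honored.

```cpp
#include <algorithm>
#include <cstdint>

using timestamp_ms = int64_t;

// Use min(record max_timestamp, broker time) as the retention timestamp.
timestamp_ms effective_retention_ts(timestamp_ms max_ts, timestamp_ms broker_ts) {
    return std::min(max_ts, broker_ts);
}
```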
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Aug 26, 2025
…ion_ms`

The motivating case for `broker_time_based_retention` was the
fact that records with bad timestamps produced in the future
could lead to time-based retention being stuck indefinitely [1].

However, using _only_ the `broker_ts` can lead to unexpected
behavior when e.g. replicating data from an existing cluster
using MM2, as the timestamps of the Kafka records themselves are
correctly preserved, but internally, `redpanda` data structures
are not.

To avoid the potentially curious behavior of a divergence in
retention enforcement, take the minimum of the record's timestamp
as written and the current broker time. This will achieve the
original goal of preventing future timestamps from blocking
retention enforcement, while also avoiding any unexpected
behavior with past record timestamps.

We also need to deal with the case where a client may have left
a batch's `max_timestamp` unset, in which case it is marked with `{-1}` [2].
Clamp any non-positive values for `max_ts` to `broker_ts` in this case.

[1]:
* redpanda-data#9820
* redpanda-data#12991

[2]:
* redpanda-data#25200
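The additional clamp for the unset sentinel can be sketched like this (again an illustrative stand-in, not the actual redpanda code):

```cpp
#include <algorithm>
#include <cstdint>

using timestamp_ms = int64_t;

// A non-positive max_timestamp (the unset sentinel {-1}) is clamped to the
// broker time before the minimum is taken, so unset batches are retained
// relative to when they arrived rather than being garbage collected at once.
timestamp_ms effective_retention_ts(timestamp_ms max_ts, timestamp_ms broker_ts) {
    if (max_ts <= 0) {
        max_ts = broker_ts; // client left max_timestamp unset
    }
    return std::min(max_ts, broker_ts);
}
```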
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Sep 17, 2025
The motivating case for `broker_time_based_retention` was the
fact that records with bad timestamps produced in the future
could lead to time-based retention being stuck indefinitely [1].

However, using the `broker_ts` can lead to unexpected
behavior when e.g. replicating data from an existing cluster
using MM2, as the timestamps of the Kafka records themselves are
correctly preserved, but internally, `redpanda` data structures
are not.

To avoid the potentially curious behavior of a divergence in
retention enforcement, go back to using the `max_timestamp`
for batches whose timestamps have been validated and unconditionally
set in the produce path (see: `v/kafka/protocol/kafka_batch_adapter.cc`)
as of `v25.3.1`.

[1]:
* redpanda-data#9820
* redpanda-data#12991
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Sep 17, 2025
The motivating case for `broker_time_based_retention` was the
fact that records with bad timestamps produced in the future
could lead to time-based retention being stuck indefinitely [1].

However, using the `broker_ts` can lead to unexpected
behavior when e.g. replicating data from an existing cluster
using MM2, as the timestamps of the Kafka records themselves are
correctly preserved, but internally, `redpanda` data structures
are not.

To avoid the potentially curious behavior of a divergence in
retention enforcement, go back to using the `max_timestamp`
for batches whose timestamps have been validated and unconditionally
set in the produce path (see: `v/kafka/server/handlers/produce_validation.cc`)
as of `v25.3.1`.

[1]:
* redpanda-data#9820
* redpanda-data#12991
WillemKauf added a commit to WillemKauf/redpanda that referenced this pull request Sep 18, 2025
The motivating case for `broker_time_based_retention` was the
fact that records with bad timestamps produced in the future
could lead to time-based retention being stuck indefinitely [1].

However, using the `broker_ts` can lead to unexpected
behavior when e.g. replicating data from an existing cluster
using MM2, as the timestamps of the Kafka records themselves are
correctly preserved, but internally, `redpanda` data structures
are not.

To avoid the potentially curious behavior of a divergence in
retention enforcement, go back to using the `max_timestamp`
for batches whose timestamps have been validated and unconditionally
set in the produce path (see: `v/kafka/server/handlers/produce_validation.cc`)
as of `v25.3.1`.

Of course, users can opt out of safety from `max_timestamp=-1` with `legacy`
validation mode, in which case we may still want to use the broker time for
retention enforcement rather than garbage collecting everything based on
`max_timestamp=-1`.

[1]:
* redpanda-data#9820
* redpanda-data#12991

Development

Successfully merging this pull request may close these issues.

CI Failure (Exceptional future ignored) in cluster_cloud_metadata_rpfixture
record broker time in storage::index_state

9 participants