Skip to content

storage: optionally ignore timestamps in the future for retention#10028

Merged
jcsp merged 3 commits intoredpanda-data:devfrom
jcsp:issue-9820-timestamp-filter
Apr 19, 2023
Merged

storage: optionally ignore timestamps in the future for retention#10028
jcsp merged 3 commits intoredpanda-data:devfrom
jcsp:issue-9820-timestamp-filter

Conversation

@jcsp
Copy link
Copy Markdown
Contributor

@jcsp jcsp commented Apr 13, 2023

This is an off-by-default behavior to enable systems where a user has sent in a dramatically wrong timestamp to ask Redpanda to ignore timestamps in the future, and do its best to infer an alternative timestamp for use in retention.

Fixes #9820

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v23.1.x
  • v22.3.x
  • v22.2.x

Release Notes

Improvements

  • Added storage_ignore_timestamps_in_future_sec cluster configuration property (default null). If set to non-null, then timestamps more than this many seconds in the future will be ignored by Redpanda when considering whether a segment is old enough to garbage collect.

@jcsp jcsp added kind/enhance New feature or request area/storage labels Apr 13, 2023
@jcsp
Copy link
Copy Markdown
Contributor Author

jcsp commented Apr 13, 2023

@jcsp jcsp marked this pull request as ready for review April 13, 2023 13:34
@jcsp jcsp requested review from VladLazar and andrwng April 13, 2023 13:35
Copy link
Copy Markdown
Contributor

@andrwng andrwng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM overall, just a question about toggling the escape hatch

_retention_timestamp = t;
}
model::timestamp retention_timestamp() const {
return _retention_timestamp.value_or(_state.max_timestamp);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also gate this on storage_ignore_timestamps_in_future_sec, in case we switch back from ignoring timestamps and no longer want to rely on _retention_timestamp?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had mainly been thinking of this as a one shot thing, but you're right that it's straightforward to make it toggleable by adding a check here, so I've gone ahead and done that.

VladLazar
VladLazar previously approved these changes Apr 18, 2023
Copy link
Copy Markdown
Contributor

@VladLazar VladLazar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me. We should probably write down some instructions on how to detect scenarios where bogus timestamps are blocking retention and how to use this new config to resolve the situation.

Probably something along the lines:

  1. Look for "segment with bogus retention timestamps" logs
  2. From the log lines above, determine how far into the future those timestamps are
  3. Set storage_ignore_timestamps_in_future_sec to something that will allow retention to proceed
  4. Unset storage_ignore_timestamps_in_future_sec if the bogus timestamps was due to a transient client issue (perhaps the test should be extended to do this)

# remove anything because the timestamps are too far in the future.
with self.redpanda.monitor_log(self.redpanda.nodes[0]) as mon:
# Time period much larger than what we set log_compaction_interval_ms to
sleep(10)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this sleep actually necessary? The wait until below implies that retention has kicked in.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly not, but I wanted to have a better chance of the housekeeping having entirely completed, rather than proceeding as soon as it has seen one segment that emits the bogus timestamp message.

jcsp added 3 commits April 19, 2023 11:51
`segment_index` now has a retention_timestamp overlap
that overrides use of the index's max_timestamp as
the timestamp for retention purposes.

This override is set in disk_log_impl::retention_adjust_timestamps
if the storage_ignore_timestamps_in_future_sec configuration is
enabled and the segment's max timestamp is out of bounds.

There is no change to behavior by default: this only kicks in
if the new configuration property has been set.

Fixes redpanda-data#9820
@jcsp jcsp force-pushed the issue-9820-timestamp-filter branch from b5eb0ea to ba1b0ad Compare April 19, 2023 10:51
@jcsp jcsp merged commit 177cbe4 into redpanda-data:dev Apr 19, 2023
@jcsp jcsp deleted the issue-9820-timestamp-filter branch April 19, 2023 18:54
@jcsp
Copy link
Copy Markdown
Contributor Author

jcsp commented Apr 19, 2023

/backport v23.1.x

@jcsp
Copy link
Copy Markdown
Contributor Author

jcsp commented Apr 19, 2023

/backport v22.3.x

@vbotbuildovich
Copy link
Copy Markdown
Collaborator

Failed to run cherry-pick command. I executed the below command:

git cherry-pick -x 1f893a914033e8b5951e3a100d1f086f2203d155 a75c2208d3a9c11d966abe871a71685aeee80846 ba1b0ad59c72ea98d6e61dc3ead60c0bccdd12a1

Workflow run logs.

@vbotbuildovich
Copy link
Copy Markdown
Collaborator

Failed to run cherry-pick command. I executed the below command:

git cherry-pick -x 1f893a914033e8b5951e3a100d1f086f2203d155 a75c2208d3a9c11d966abe871a71685aeee80846 ba1b0ad59c72ea98d6e61dc3ead60c0bccdd12a1

Workflow run logs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

storage: escape hatch to ignore timestamps in future during retention

4 participants