Skip to content

feat: Implement MultipartUploadBackend on TieredStorage#458

Open
lcian wants to merge 9 commits into
mainfrom
lcian/feat/multipart-upload-tiered
Open

feat: Implement MultipartUploadBackend on TieredStorage#458
lcian wants to merge 9 commits into
mainfrom
lcian/feat/multipart-upload-tiered

Conversation

@lcian
Copy link
Copy Markdown
Member

@lcian lcian commented May 4, 2026

Last implementation of MultipartUploadBackend in objectstore-service.
Now TieredStorage requires a MultipartUploadBackend in its long_term slot.
All methods just pretty much forward to long_term, with the exception of complete_multipart which needs to use the Changelog for consistency and deal with the tombstones, with logic very similar to put_object except that we need an additional metadata read call.

Close FS-339

@lcian lcian requested a review from a team as a code owner May 4, 2026 15:36
@lcian lcian changed the title feat(gcs): implement MultipartUploadBackend using GCS XML API feat: Implement MultipartUploadBackend on TieredStorage May 4, 2026
@lcian lcian changed the base branch from main to lcian/feat/multipart-upload-other-backends May 4, 2026 15:37
@codecov
Copy link
Copy Markdown

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 87.91946% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 86.95%. Comparing base (24b35be) to head (c21fe73).

Files with missing lines Patch % Lines
objectstore-service/src/backend/testing.rs 0.00% 20 Missing ⚠️
objectstore-service/src/backend/mod.rs 0.00% 8 Missing ⚠️
objectstore-test/src/server.rs 0.00% 5 Missing ⚠️
objectstore-service/src/backend/tiered.rs 98.85% 3 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #458      +/-   ##
==========================================
+ Coverage   86.94%   86.95%   +0.01%     
==========================================
  Files          77       77              
  Lines       11394    11685     +291     
==========================================
+ Hits         9906    10161     +255     
- Misses       1488     1524      +36     
Components Coverage Δ
Rust Backend 91.73% <89.41%> (-0.08%) ⬇️
Rust Client 80.10% <ø> (ø)
Python Client 86.36% <ø> (ø)

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread objectstore-service/src/backend/tiered.rs Outdated
@lcian

This comment was marked as outdated.

@lcian lcian marked this pull request as draft May 4, 2026 15:50
@linear-code
Copy link
Copy Markdown

linear-code Bot commented May 5, 2026

Comment thread objectstore-service/src/backend/tiered.rs
fn try_into(self) -> Result<UploadId, Self::Error> {
use base64::Engine;
let json =
serde_json::to_vec(&self).map_err(|e| Error::serde("encoding multipart token", e))?;
Copy link
Copy Markdown
Member Author

@lcian lcian May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there's a smarter way to concatenate the 2 with less size overhead than JSON...
But any separator could theoretically also appear in the ID returned by the inner backend.

@lcian lcian marked this pull request as ready for review May 7, 2026 08:54
@lcian

This comment was marked as off-topic.

@lcian lcian requested a review from matt-codecov May 7, 2026 08:55
Comment thread objectstore-service/src/backend/tiered.rs
.await?
.ok_or_else(|| {
Error::generic("completed multipart object not found in long-term storage")
})?;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transient metadata read failure causes irrecoverable upload loss

Medium Severity

After the inner complete_multipart succeeds (consuming the multipart upload irreversibly), the guard is advanced to Written, and then get_metadata is called. If this call fails (transient network error or returns None), the ? propagates an error and the guard drops in Written phase. The cleanup logic then deletes the successfully assembled object from long-term storage. Unlike put_long_term — where the metadata is available as a parameter and no extra call exists between the LT write and CAS commit — here the client cannot retry complete_multipart because the upload has been consumed by the inner backend. They must re-upload all parts from scratch, which can be very costly for large uploads. Encoding the expiration_policy in the TieredUploadId at initiation time would eliminate this failure point entirely.

Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit 409be20. Configure here.

Copy link
Copy Markdown
Member Author

@lcian lcian May 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's very unlikely that this fails. I don't want to encode more information in the token.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another option is to move the inner.long_term.get_metadata() call earlier in the function before we create our change guard. that way if it fails we can at least 5XX and the user can retry before we've made any persistent changes

looking at the change guard impl

async fn cleanup(self) {
let current = match self.phase {
// For `Recovered`, we must first check the state of the tombstone.
ChangePhase::Recovered => self.read_tombstone().await,
ChangePhase::Recorded => self.change.old.clone(),
// For `Written`, the CAS outcome is unknown — read HV to determine it.
ChangePhase::Written => self.read_tombstone().await,
ChangePhase::Lost => self.change.old.clone(),
ChangePhase::Updated => self.change.new.clone(),
ChangePhase::Completed => return, // unreachable
};
if current != self.change.old
&& let Some(ref old) = self.change.old
{
self.cleanup_lt(old).await;
}
if current != self.change.new
&& let Some(ref new) = self.change.new
{
self.cleanup_lt(new).await;
}
self.cleanup_log().await;
self.mark_completed();
}

thinking through the remaining possible errors and how the change guard would handle them:

  • if inner.long_term.complete_multipart() returns an error:
    • the guard will drop in the Recorded without cleaning anything up. the user can retry, unless the underlying backend is in an unretryable state
  • if inner.high_volume.compare_and_write() returns an error and didn't persist anything:
    • the guard will drop in the Written state. it will fetch the current tombstone ID, see that our new ID doesn't match, and delete the new ID from LT storage.
    • the user has to redo the whole multipart upload, but the system should be sane. we need to make sure the error communicates this
    • this is the same issue as what the bot describes, but this instance is unavoidable
  • if inner.high_volume.compare_and_write() returns an error but persisted the tombstone:
    • the guard will drop in the Written state. it will fetch the current tombstone ID, see that our new ID does match, and exit without cleaning anything up.

incidentally i see our change log doesn't yet have a durable backend implemented. i guess that's what FS-310 is still open for

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

incidentally i see our change log doesn't yet have a durable backend implemented. i guess that's what FS-310 is still open for

Exactly.

another option is to move the inner.long_term.get_metadata() call earlier in the function before we create our change guard. that way if it fails we can at least 5XX and the user can retry before we've made any persistent changes

We cannot call get_metadata yet because the object doesn't exist. S3 and GCS don't have APIs to retrieve the metadata for an incomplete multipart upload.
That's why we need to materialize the object first. And therefore, we need to create the guard beforehand, to avoid leaving that blob orphaned.

The alternative would be to include the expiration policy in the UploadId we give back to the user, but that introduces all kinds of conditions you have to handle, what if the user passes a different expiration policy at creation/completion? Also, we would need to validate the expiration policy in two places, etc. etc. -- in my opinion the first call should be the source of truth for this.

But you're right that this warrants some more thought.
With the current implementation, the client can basically never retry complete_multipart effectivly, because the dropped ChangeGuard will probably delete the new blob even if it completed successfully 🤔

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core issue is that ChangeGuard::Recorded assumes that every LT write uses a unique key, while in this case we would retry multiple times on the same key (revision).

Comment thread objectstore-service/src/backend/tiered.rs Outdated
Copy link
Copy Markdown
Contributor

@matt-codecov matt-codecov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stamp w comments

use tokio_util::task::task_tracker::TaskTrackerToken;

use crate::backend::common::{Backend, HighVolumeBackend, TieredMetadata};
use crate::backend::common::{HighVolumeBackend, MultipartUploadBackend, TieredMetadata};
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is every long-term storage backend required to impl MultipartUploadBackend? should we rename it LongTermBackend trait to parallel the HighVolumeBackend and make sure both have a trait bound requiring that Backend be implemented too?

Copy link
Copy Markdown
Member Author

@lcian lcian May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes exactly, it's implemented as follows:

pub trait MultipartUploadBackend: Backend + fmt::Debug + Send + Sync + 'static {

So, it already has a trait bound on Backend.

I guess a more abstracted name for it could be LongTermBackend, yeah. Let me change it in a follow-up.

Comment thread objectstore-service/src/backend/mod.rs
.await?;

if error.is_some() {
return Ok(error);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is Ok appropriate here? is this an endpoint that returns 200 OK always and you have to read the response body to interpret errors?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's intended, that's also how S3/GCS implement this and I mentioned it in the design doc.
That's because the complete call can take a while (I guess it's assembling the object in the backend), so S3 takes exactly this approach of a 200 and error in the body, therefore the most general solution is for us to mirror it.
Otherwise, the client might very easily run into a timeout if we don't return a response/headers within X s.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But probably this warrants a comment explaining how it works.

.await?
.ok_or_else(|| {
Error::generic("completed multipart object not found in long-term storage")
})?;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another option is to move the inner.long_term.get_metadata() call earlier in the function before we create our change guard. that way if it fails we can at least 5XX and the user can retry before we've made any persistent changes

looking at the change guard impl

async fn cleanup(self) {
let current = match self.phase {
// For `Recovered`, we must first check the state of the tombstone.
ChangePhase::Recovered => self.read_tombstone().await,
ChangePhase::Recorded => self.change.old.clone(),
// For `Written`, the CAS outcome is unknown — read HV to determine it.
ChangePhase::Written => self.read_tombstone().await,
ChangePhase::Lost => self.change.old.clone(),
ChangePhase::Updated => self.change.new.clone(),
ChangePhase::Completed => return, // unreachable
};
if current != self.change.old
&& let Some(ref old) = self.change.old
{
self.cleanup_lt(old).await;
}
if current != self.change.new
&& let Some(ref new) = self.change.new
{
self.cleanup_lt(new).await;
}
self.cleanup_log().await;
self.mark_completed();
}

thinking through the remaining possible errors and how the change guard would handle them:

  • if inner.long_term.complete_multipart() returns an error:
    • the guard will drop in the Recorded without cleaning anything up. the user can retry, unless the underlying backend is in an unretryable state
  • if inner.high_volume.compare_and_write() returns an error and didn't persist anything:
    • the guard will drop in the Written state. it will fetch the current tombstone ID, see that our new ID doesn't match, and delete the new ID from LT storage.
    • the user has to redo the whole multipart upload, but the system should be sane. we need to make sure the error communicates this
    • this is the same issue as what the bot describes, but this instance is unavoidable
  • if inner.high_volume.compare_and_write() returns an error but persisted the tombstone:
    • the guard will drop in the Written state. it will fetch the current tombstone ID, see that our new ID does match, and exit without cleaning anything up.

incidentally i see our change log doesn't yet have a durable backend implemented. i guess that's what FS-310 is still open for

@lcian lcian force-pushed the lcian/feat/multipart-upload-other-backends branch from e8a193d to 803c658 Compare May 12, 2026 10:43
Base automatically changed from lcian/feat/multipart-upload-other-backends to main May 12, 2026 11:22
lcian added 8 commits May 12, 2026 13:30
Move ChangeGuard creation before get_metadata so that if the metadata
read fails after complete_multipart succeeds, the guard cleans up the
assembled blob. Previously, Manual-expiration objects could be orphaned
permanently. Also fix redundant rustdoc link.
@lcian lcian force-pushed the lcian/feat/multipart-upload-tiered branch from 409be20 to d6eb979 Compare May 12, 2026 11:32
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 2 total unresolved issues (including 1 from previous review).

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit d6eb979. Configure here.


if error.is_some() {
return Ok(error);
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cleanup guard can race with retries on fixed revision key

Medium Severity

The change guard is recorded before calling complete_multipart on the long-term backend. If completion fails (body error or transport error), the guard drops in Recorded phase and spawns a background cleanup that deletes the physical key via delete_object. Unlike put_long_term — where each call generates a fresh revision key via new_long_term_revision — here the physical key is fixed (baked into the TieredUploadId at initiation time). If a client retries complete_multipart with the same upload_id and the retry succeeds before the old cleanup task executes delete_object, the cleanup deletes the newly completed object, leaving the subsequently written HV tombstone pointing to a non-existent LT blob (data loss).

Additional Locations (1)
Fix in Cursor Fix in Web

Reviewed by Cursor Bugbot for commit d6eb979. Configure here.

@lcian
Copy link
Copy Markdown
Member Author

lcian commented May 15, 2026

This is pending resolution of #458 (comment)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants