Skip to content

capture version_id after upload#524

Open
shcheklein wants to merge 2 commits intofsspec:mainfrom
shcheklein:fix-capture-version-on-write
Open

capture version_id after upload#524
shcheklein wants to merge 2 commits intofsspec:mainfrom
shcheklein:fix-capture-version-on-write

Conversation

@shcheklein
Copy link
Contributor

Similar to fsspec/gcsfs#718

To be able to refer to a file properly (in case multiple uploads happening simultaneously to the same path), we want to grab generation from the response. S3 already does this.

@shcheklein
Copy link
Contributor Author

@martindurant do you know by chance who is maintaining the project and could release this change?

@kyleknap
Copy link
Collaborator

kyleknap commented Dec 11, 2025

@shcheklein thanks for contribution! Populating the updated version_id after a completed upload seems reasonable to me especially since s3fs is already doing this and you have a similar change out for gcsfs.

I'll have more cycles next week to take a deeper review but some initial things we probably want to address now are:

  1. Let's populate the version_id only if the upstream file system is version_aware. Looks like s3fs has similar logic
  2. It would be great if we can get test cases added to make sure we the version_id is getting set for the different upload branches where it is being set now for version aware filesystems. It would also be good to have a test case too where we don't set the version_id if the filesystem is not version aware.
  3. Add a small changelog entry here for the enhancement.

@shcheklein shcheklein force-pushed the fix-capture-version-on-write branch 2 times, most recently from 4eaa094 to 70c345d Compare December 21, 2025 01:40
@shcheklein
Copy link
Contributor Author

Hi @kyleknap , thanks for the feedback. I've addressed all the items.

Azurite doesn't support versioning - thus we use mocks in tests. I think that should be fine.

@kyleknap
Copy link
Collaborator

kyleknap commented Jan 5, 2026

Thanks @shcheklein! I agree mocks suffice in this case. I'll take a deeper look at the PR this week.

@kyleknap kyleknap self-assigned this Jan 5, 2026
Copy link
Collaborator

@kyleknap kyleknap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good! I pulled down the PR and tried it out and it was working well. I just had one question related to append blobs that I ran into.

adlfs/spec.py Outdated
Comment on lines +2285 to +2286
if self.fs.version_aware:
self.version_id = response.get("version_id")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For append blobs, were you able to get version id's returned for the upload blob method call? Trying it out myself, I'm not sure if an x-ms-version-id header is even returned back when a PUT is called on an append blob. So the version_id was consistently None even though for block blobs it was being returned back. That being said for HEAD calls on the append blob, the x-ms-version-id was present.

I'm thinking if the API is not returning version id's for PUTs on append blobs, let's remove this logic for now to avoid code paths that won't be hit and avoid having tests implying this should work. We can always add it in if version id's are returned in the future.

@kyleknap
Copy link
Collaborator

Hi @shcheklein. Have you gotten a chance to look at the comment that I left? If you do not have the bandwidth to look or make updates, we can make the remaining updates as well to get this merged too. Let us know.

@shcheklein
Copy link
Contributor Author

@kyleknap apologies for the delay. It is a good catch with append. I wanted actually to double check the workflow in that case - e.g. does it update the version? do we capture it at the end, etc. I'll get to it soon, but I'm also totally fine if someone can also check it to be sure. Also, fine to remove it completely and merge.

@shcheklein shcheklein force-pushed the fix-capture-version-on-write branch from 921d237 to 36ded67 Compare February 27, 2026 00:37
@shcheklein
Copy link
Contributor Author

@kyleknap PTAL again when you have a minute. Double checked the append workflow. Unfortunately it seems there is no reliable way to get the version ID of an object being updated (only if it was created first time). Any extra API call leads to a potential race. Also append it seems doesn't create new versions at all (?). I think we have to skip it pretty much for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants