Append to Metadata and merge upon deserialization #602
Conversation
I'm not the right person to review this - I haven't worked on any of this code. I also still don't like this design, since it's unnecessarily reliant on implementation details of the platform.
I added you since I thought you'd have opinions on the implementation strategy. My thinking is that it's simpler to implement, most of it needs to be implemented for the other strategy anyway, we want a working solution sooner, and it's not likely to break at all for any of our current use cases (and if it does, it'll panic, not silently corrupt things). By the time we need something more robust, we may already have moved on from the current implementation. That being said, I can work on the other implementation next, on top of this.
Why is this urgent; is this blocking someone? Generally, we want to avoid temporary solutions because there is a risk that they become permanent. How much time do you think you'll need to address Stuart's concerns?
I don't like that we're prioritizing ease of implementation over robustness. Even if the overall lifting feature is not supported on macOS, being platform agnostic is preferable IMHO.
About 90% of this solution is also necessary for the separate-files one, so to me it makes sense to merge this one first, and then work on the other solution as an improvement. This way we keep the PRs smaller and more self-contained, and also get a working solution merged sooner. There are also tons of other places where we prioritize a simpler, more narrow solution first to get things in working order before working on a better, more general solution.
Yes, this adds new portability questions that we previously didn't have to worry about. The Linux atomic-write limit is high enough for our purposes, but the fact that there's a limit at all suggests that arbitrary limits are allowed by the spec. Now we have to wonder: what does the limit look like on macOS, Windows, or FreeBSD? Are there certain kernel configurations that would reduce the limit? Can different filesystems have different limits? And so on. The approach using separate metadata files avoids all these questions, and it doesn't sound like it will be too much more complex (though I admit I don't understand all the requirements here).
I'm going to implement that more robust solution; I just want to merge this first. Almost all of this code is necessary for the other solution, too, and it fixes things well enough in the meantime that we can test
Could we use file locking to synchronize the updates to the metadata file? I found the |
We could try that, too. A file-locking solution would also reuse 90% of the changes in this PR, so I still think we should merge this one first (which is itself a good reason to: this PR is a common base from which we can implement either separate files that are merged, or file locking).
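A minimal, stdlib-only sketch of what file-based mutual exclusion could look like. Note this uses a create-new lock file rather than flock-style advisory locking (the crate mentioned above is elided from the thread, so this is a stand-in technique, not necessarily what was proposed); all names and timings are illustrative:

```rust
use std::fs::OpenOptions;
use std::{thread, time::Duration};

/// Run `f` while holding a crude filesystem lock: exclusively create a
/// lock file (O_EXCL semantics via `create_new`), run, then delete it.
/// `lock_path` is illustrative; a real implementation would also need a
/// timeout or stale-lock recovery if a process crashes while holding it.
fn with_lock<T>(lock_path: &str, f: impl FnOnce() -> T) -> std::io::Result<T> {
    // Spin until we are the process that created the lock file.
    loop {
        match OpenOptions::new().write(true).create_new(true).open(lock_path) {
            Ok(_) => break,
            Err(e) if e.kind() == std::io::ErrorKind::AlreadyExists => {
                thread::sleep(Duration::from_millis(10));
            }
            Err(e) => return Err(e),
        }
    }
    let result = f();
    std::fs::remove_file(lock_path)?; // release the lock
    Ok(result)
}
```

With this, each wrapper invocation could do a plain read-modify-write of the metadata file inside `with_lock`, with no reliance on atomic appends. The main design cost is stale locks after a crash, which flock-style locks avoid because the kernel releases them when the process exits.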
I'm going to extract the majority of this PR that deals with reading multiple |
Fixes #586.
Atomically append to the `Metadata` file (error if not written all at once) and then merge upon deserialization. See #586 (comment) for a discussion of solutions, of which this is one.
For each `rustc` wrapper call, we `open` in append mode, serialize to a `Vec<u8>`, and then do a single `write` (not `write_all`). If the `write` doesn't write all of its data, error, as only single `write`s are atomic.\* This will leave consecutive `Metadata`s in the file, so when reading, we need to keep deserializing `Metadata`s until we reach the end of the file, and then merge the `Metadata`s into one.

\* This should be fine unless there is a signal that interrupts the `write` (or are there other cases the full `write` wouldn't go through?), or if the write is > 2 GB:

from write(2):
This was easier to implement than the other option (write to separate files and then read them in, merge, and write the final one), but it may be more brittle if `write`s don't write all at once (`write` is atomic on almost all platforms and filesystems, excepting old NFS versions). It currently works to instrument and run `lighttpd`.