Append to Metadata and merge upon deserialization #602
Conversation
I'm not the right person to review this - I haven't worked on any of this code. I also still don't like this design, since it's unnecessarily reliant on implementation details of the platform.
I added you since I thought you'd have opinions on the implementation strategy. My thinking is that it's simpler to implement, most of it needs to be implemented for the other strategy anyway, we want a working solution sooner, and it's not likely to break at all for any of our current use cases (and if it does, it'll panic, not silently corrupt things). By the time we need something more robust, we may already have moved on from the current implementation. That being said, I can work on the other implementation next, on top of this.
Why is this urgent; is this blocking someone? Generally, we want to avoid temporary solutions because there is a risk that they become permanent. How much time do you think you'll need to address Stuart's concerns?
I don't like that we're prioritizing ease of implementation over robustness. Even if the overall lifting feature is not supported on macOS, being platform agnostic is preferable IMHO.
About 90% of this solution is also necessary for the separate-files one, so to me it makes sense to merge this one first, and then work on the other solution as an improvement. This way we keep the PRs smaller and more self-contained, and also get a working solution merged sooner. There are also tons of other places where we prioritize a simpler, more narrow solution first to get things in working order before working on a better, more general solution.
Yes, this adds new portability questions that we previously didn't have to worry about. The Linux atomic-write limit is high enough for our purposes, but the fact that there's a limit at all suggests that arbitrary limits are allowed by the spec. Now we have to wonder: what does the limit look like on macOS, Windows, or FreeBSD? Are there certain kernel configurations that would reduce the limit? Can different filesystems have different limits? And so on. The approach using separate metadata files avoids all these questions, and it doesn't sound like it will be too much more complex (though I admit I don't understand all the requirements here).
I'm going to implement that more robust solution; I just want to merge this first. Almost all of this code is necessary for the other solution, too, and it fixes things well enough in the meantime that we can test
Could we use file locking to synchronize the updates to the metadata file? I found the |
We could try that, too. A file-locking solution would also reuse 90% of the changes in this PR, so I still think we should merge this one first (which is itself a good reason to: this PR is a common base from which we can implement either separate files that are merged, or file locking).
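A minimal, stdlib-only sketch of what file-based mutual exclusion could look like. Note this uses a create-new lock file rather than flock-style advisory locking (the crate mentioned above is elided from the thread, so this is a stand-in technique, not necessarily what was proposed); all names and timings are illustrative:

```rust
use std::fs::OpenOptions;
use std::{thread, time::Duration};

/// Run `f` while holding a crude filesystem lock: exclusively create a
/// lock file (O_EXCL semantics via `create_new`), run, then delete it.
/// `lock_path` is illustrative; a real implementation would also need a
/// timeout or stale-lock recovery if a process crashes while holding it.
fn with_lock<T>(lock_path: &str, f: impl FnOnce() -> T) -> std::io::Result<T> {
    // Spin until we are the process that created the lock file.
    loop {
        match OpenOptions::new().write(true).create_new(true).open(lock_path) {
            Ok(_) => break,
            Err(e) if e.kind() == std::io::ErrorKind::AlreadyExists => {
                thread::sleep(Duration::from_millis(10));
            }
            Err(e) => return Err(e),
        }
    }
    let result = f();
    std::fs::remove_file(lock_path)?; // release the lock
    Ok(result)
}
```

With this, each wrapper invocation could do a plain read-modify-write of the metadata file inside `with_lock`, with no reliance on atomic appends. The main design cost is stale locks after a crash, which flock-style locks avoid because the kernel releases them when the process exits.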
I'm going to extract the majority of this PR that deals with reading multiple |
Fixes #586.
Atomically append to the `Metadata` file (error if not written all at once) and then merge upon deserialization. See #586 (comment) for a discussion of solutions, of which this is one.
For each `rustc` wrapper call, we `open` in append mode, serialize to a `Vec<u8>`, and then do a single `write` (not `write_all`). If the `write` doesn't write all of its data, error, as only single `write`s are atomic.\* This will leave consecutive `Metadata`s in the file, so when reading, we need to keep deserializing `Metadata`s until we reach the end of the file, and then merge the `Metadata`s into one.

\* This should be fine unless there is a signal that interrupts the `write` (or are there other cases the full `write` wouldn't go through?), or if the write is > 2 GB:

from write(2):
This was easier to implement than the other option (write to separate files and then read them in, merge, and write the final one), but it may be more brittle if `write`s don't write all at once (`write` is atomic on almost all platforms and filesystems, excepting old NFS versions). It currently works to instrument and run `lighttpd`.