It's wiggle time!#1202
Conversation
* Bump version to 0.48.0 * Re-enable `byteorder`'s default features. The code uses `WriteBytesExt` which depends on the `std` feature being enabled. So for now, just enable `std`.
Subscribe to Label ActionThis issue or pull request has been labeled: "w", "a", "s", "i" To subscribe or unsubscribe from this label, edit the |
1963b36 to
52a9467
Compare
| self.0 | ||
| } | ||
| } | ||
|
|
There was a problem hiding this comment.
can you just use the Into<u32> implementation?
There was a problem hiding this comment.
I agree it'd be good to have one way to do this, rather than two. Could I suggest preferring .inner() over .into()? Most of the code in wasi-common should treat these as opaque, so it'd be nice for the places where we do need to peek inside to stand out.
There was a problem hiding this comment.
Oh, this must be a remnant of my hack to not stifle progress with wasi-common while some required changes to wiggle were in progress. This change is no longer part of this PR, so please ignore it.
|
OK, I'm marking this as ready for review. Here's the summary of changes: |
sunfishcode
left a comment
There was a problem hiding this comment.
Very exciting! Here's a first round of comments:
| fd_pool: FdPool, | ||
| entries: HashMap<wasi::__wasi_fd_t, Entry>, | ||
| fd_pool: RefCell<FdPool>, | ||
| entries: RefCell<HashMap<types::Fd, Entry>>, |
There was a problem hiding this comment.
fd_pool and entries need to stay in sync with each other. How awkward would it be to have a single RefCell that holds a struct containing both?
There was a problem hiding this comment.
Ah, excellent observation!
There was a problem hiding this comment.
Ok, I've now added an aux struct called EntryTable which holds both fd_pool and entries, and inside WasiCtx it's behind a single RefCell thus both working in sync.
| return Err(Errno::Badf); | ||
| } | ||
| Ok(Ref::map(self.entries.borrow(), |entries| { | ||
| entries.get(&fd).unwrap() |
There was a problem hiding this comment.
This isn't strictly incorrect (at least while we only have one thread), but it feels awkward. Would it work to do something like this? A little verbose still, but it only does one borrow and one entries lookup.
let maybe = Ref::map(self.entries.borrow(), |entries| {
entries.get(&fd)
});
if maybe.is_none() {
return Err(Errno::BadF);
}
Ref::map(maybe, |maybe_entry| maybe_entry.unwrap())
}?
There was a problem hiding this comment.
Awesome, this is exactly what I wanted to do and couldn't figure out how. Thanks! :-D
There was a problem hiding this comment.
Ok, this proves to be more difficult than I originally envisioned. If I follow your suggestion, than we end up with a lifetime mismatch since the first maybe will contain weirdness of the like &Option<&Entry>, and the compiler cannot assign approriate lifetimes to the final Ref<'_, Entry>. Perhaps when we revisit the use of RefCell in wasi-common altogether, this will no longer be an issue?
There was a problem hiding this comment.
Ok, if it gets awkward, then I agree it makes sense to leave the code as-is for now.
| } | ||
| Ok(RefMut::map(self.entries.borrow_mut(), |entries| { | ||
| entries.get_mut(&fd).unwrap() | ||
| })) |
There was a problem hiding this comment.
And similar here? If so, we may not even need the contains_key method, which would be nice because it can be tricky to use without look-before-you-leap hazards.
| pub fn create_file<P: AsRef<Path>>(&mut self, path: P) -> io::Result<File> { | ||
| let path = path.as_ref(); | ||
| let mut fd = 0; | ||
| let mut fd = types::Fd::from(0); |
There was a problem hiding this comment.
Instead of from, what would you think of naming this method from_raw, to emphasize that this call is not something one should do often.
There was a problem hiding this comment.
Right, I've actually tweaked this only so that it builds. I was going to cc you about the fs module and how we'd want to handle things here :-)
There was a problem hiding this comment.
@pchickey What do you think of having a from_raw method for handles in wiggle?
| mut argv: GuestPtr<'b, GuestPtr<'b, u8>>, | ||
| mut argv_buf: GuestPtr<'b, u8>, | ||
| ) -> Result<()> { | ||
| trace!("args_get(argv_ptr={:?}, argv_buf={:?})", argv, argv_buf); |
There was a problem hiding this comment.
Is there a path to where these trace! calls could be auto-generated by wiggle?
There was a problem hiding this comment.
Excellent question! Thanks for bringing it up! @pchickey and I had some thoughts about this, and as far as I can tell, he would rather avoid adding logging to wiggle (which is where we'd want this to end up actually). I haven't put that much thought to it just yet, hence why I left the traces in wasi-common, but I'm open to suggestions here!
There was a problem hiding this comment.
Could there be a log feature of wiggle which generates log statements? That way wasmtime could turn it on but if it's not needed in lucet it could be disabled?
There was a problem hiding this comment.
Oh, I believe what @alexcrichton suggests sounds like the solution to this problem. @pchickey @sunfishcode what do you guys reckon? Oh, and until that lands in wiggle, my suggestion would be to leave the traces as-is in wasi-common. What do you think?
There was a problem hiding this comment.
Oh, and I think we should do the same for wasi-common as well. We've got a couple log calls here and there for debugging and what not, and I believe that @pchickey would like to have a version stripped off all altogether for lucet/xqd.
There was a problem hiding this comment.
Not suggesting we do it immediately in this PR, but we should have it in mind. :-)
| use std::ops::DerefMut; | ||
| use wiggle_runtime::{GuestBorrows, GuestPtr}; | ||
|
|
||
| impl<'a> WasiSnapshotPreview1 for WasiCtx { |
There was a problem hiding this comment.
Implementing this as a trait for WasiCtx is an interesting idea, and it is pretty fun to do ctx.fd_close as the code does above, but does this make it harder to share a WasiCtx between snapshot versions in the same wasm module?
There was a problem hiding this comment.
Excellent question! So I think it won't be that difficult. However, I suggest to leave that for a subsequent PR. I'm purposely migrated all impls into one module and directly into the trait impl so that we can have a better idea how to tackle the problem of multiple snapshots/polyfill. Does it make sense?
There was a problem hiding this comment.
I'd actually imagine that it makes it easier to share a WasiCtx between versions because we could implement both traits for one struct!
| ); | ||
|
|
||
| // Extract path as &str. | ||
| let mut bc = GuestBorrows::new(); |
There was a problem hiding this comment.
You got me, I was just following @alexcrichton's example in wiggle's tests :-} I'm happy to change it to something more meaningful, say borrows?
There was a problem hiding this comment.
its short for "borrow checker".
|
|
||
| // Extract path as &str. | ||
| let mut bc = GuestBorrows::new(); | ||
| let path = unsafe { &*path.as_raw(&mut bc)? }; |
There was a problem hiding this comment.
Can you say more about the unsafe here? I assume wiggle has done the bounds checking before we get here, is that right?
There was a problem hiding this comment.
That's correct, as_raw does the bounds checking. We need unsafe here since we're dereferencing a raw *mut _ pointer, and in this particular case, we're obtaining &str. The signature of this function here is:
fn as_raw(...) -> Result<*mut _, GuestError>This is unsafe since we cannot really track the references here other than by using GuestBorrows. More on that here.
|
@pchickey I've now re-used |
| // FIXME | ||
| // Much like in the case with `path_symlink`, we need to account for the | ||
| // fact where both `old_path` and `new_path` point the same memory location. | ||
| // Hence, separate scopes and `GuestBorrows` instances. | ||
| let resolved_old = { | ||
| let mut bc = GuestBorrows::new(); | ||
| let old_as_str = old_path.as_raw(&mut bc)?; | ||
| let old_path = unsafe { &*old_as_str }; | ||
|
|
||
| trace!(" | (old_path_ptr,old_path_len)='{}'", old_path); | ||
|
|
||
| let old_entry = unsafe { self.get_entry(old_fd)? }; | ||
| path::get( | ||
| &old_entry, | ||
| types::Rights::PATH_LINK_SOURCE, | ||
| types::Rights::empty(), | ||
| types::Lookupflags::empty(), | ||
| old_path, | ||
| false, | ||
| )? | ||
| }; | ||
| let resolved_new = { | ||
| let mut bc = GuestBorrows::new(); | ||
| let new_as_str = new_path.as_raw(&mut bc)?; | ||
| let new_path = unsafe { &*new_as_str }; | ||
|
|
||
| trace!(" | (new_path_ptr,new_path_len)='{}'", new_path); | ||
|
|
||
| let new_entry = unsafe { self.get_entry(new_fd)? }; | ||
| path::get( | ||
| &new_entry, | ||
| types::Rights::PATH_LINK_TARGET, | ||
| types::Rights::empty(), | ||
| types::Lookupflags::empty(), | ||
| new_path, | ||
| false, | ||
| )? | ||
| }; | ||
| path::link(resolved_old, resolved_new) |
There was a problem hiding this comment.
I'll try to beat everyone to the question here. I'm not sure how to handle this better with the current state of wiggle (which is not to say we can't modify wiggle to accommodate this case). So, I was forced to create two instances of GuestBorrows since it is possible for the two paths that are handed to us from guest to point at the same memory location (one use case being for instance symlink loops, etc.).
I was thinking, we could potentially check for that by comparing the pointers (if they implemented PartialEq), but it still feels clunky. Perhaps we should revisit the concept of immutability in wiggle? That is, differentiate immutable vs mutable borrows? In this case, I reckon it should be fine to borrow the same location immutably multiple times. Thoughts @sunfishcode @pchickey? @alexcrichton what do you reckon?
There was a problem hiding this comment.
This is a great question. One alternative to creating two different borrowed path strings is to clone two path strings from the wasm memory into Rust owned Strings. That can be done safely without any borrow checking, since only one clone can be happening at a time. I expect, since paths shouldn't be too big, that is the right design choice here, except for the possibility of a DoS attack by creating two 4gb Strings here.
There was a problem hiding this comment.
Differentiating between mutable and immutable borrows sounds like a good way to go.
For example, POSIX doesn't say anything about link's arguments aliasing or not, but most functions with output buffers, such as readlink are defined with restrict qualifiers, meaning the output buffer is expected to not overlap with any of the input buffers. This also echos the Rust's borrow checker: you can have many aliasing immutable references, but a mutable reference must be unaliased.
There was a problem hiding this comment.
I've thought about this and I'm fairly convinced the right thing, for now, is to keep this implementation and leave a nice comment that outlines the design decisions as you just did in this thread.
There was a problem hiding this comment.
I don't actually think that GuestBorrows is really needed here at all because there are no long-lived pointers, I think it's fine to do what this is currently doing, but I might recommend making it a bit more concise with the comment I made above
There was a problem hiding this comment.
Hmm, I don't think I understand what you mean here by GuestBorrows not needed here at all. I'm probably missing something but let me outline the things I do believe:
I agree that in this case, the borrow checking is vacuous. But there isn't presently another method besides GuestPtr::as_raw that allows us to get a &str out of this GuestPtr<str>. We could add a method to GuestPtr that returns an owned String, but I fear that is a DoS vector because the guest will control the size of that String, and therefore gets to allocate in Rust's heap. We haven't yet rigorously audited this codebase for DoS vectors, of course, but I'm wary of adding something I know I'll have to forbid actually using in production.
I do not think we should create a shortcut variant of GuestPtr::as_raw to omit borrow checking which is only safe if the user ensures borrows are always short-lived, because I could see that leading to bugs when someone less familiar with the safety properties updates the code. Maybe that is too paranoid?
There was a problem hiding this comment.
Exactly as @pchickey explained, obtaining a reference to a slice-type object in Rust from wiggle is currently only possible through as_raw method which requires a GuestBorrows object in scope.
There was a problem hiding this comment.
@pchickey er sorry I don't think I was being very clear! I don't think we should remove the GuestBorrows argument or use owned strings here, I mean that each of these is resolved independently and there's never any overlapping borrows so documenting fixmes and/or hacks isn't necessary because each path is getting sucked into an owned object anyway so borrows never last long enough for runtime borrow-checking to be that relevant.
| use wiggle_runtime::{GuestBorrows, GuestPtr}; | ||
|
|
||
| impl<'a> WasiSnapshotPreview1 for WasiCtx { | ||
| fn args_get<'b>( |
There was a problem hiding this comment.
@sunfishcode Note that we don't have unsafe's in methods' signatures. I was somewhat worried about this since we're still dealing with file descriptors and other OS handles, but perhaps the entry points to WASI syscalls are OK being safe especially given that WASI fd is now a handle type? What do you think?
There was a problem hiding this comment.
Crazy idea that I haven't thought all the way through yet: Would it work if we made Fd work like std::fs::File?
- Make
Fd::from_rawunsafe? , similar tofrom_raw_fdbeing unsafe. - Make
Fdnot implementCopy. ChangeFdarguments to&Fd. - Make
FdimplementDropand have it automatically close. This might be hard though. Is there a way in Rust to makedropprivate, so that you have to pass it back to the context?
There was a problem hiding this comment.
On further consideration, the Drop part makes this unworkable. I'm going to investigate other approaches.
There was a problem hiding this comment.
I like the idea but yeah, Drop will not work like this, unless we made the context object static/global. Having said that your suggestion aligns with my mental model of Fd 100%, so perhaps we could come up with something close to it but not necessarily one-to-one.
There was a problem hiding this comment.
Ok, thinking this through more carefully, maybe we can do this without the Drop part. So make handle types not implement Copy, pass handle arguments by reference rather than by value, and make the raw constructor unsafe. That's not the same as std::fs::File, but it still has some nice properties. We just need to be careful when calling the raw constructor, but there aren't many places where we need to do that outside of the generated wrappers. Does that sound feasible?
If you'd like to defer this to a later PR, that's fine too.
There was a problem hiding this comment.
Sounds good, and yeah, if we could defer to a subsequent PR, that would be perfect IMHO. There's already a substantial amount of tweaks I have to apply and I'm worried more and more will slip away. Is that OK?
alexcrichton
left a comment
There was a problem hiding this comment.
Looking great @kubkon!
My main interest here is to reduce the amount of unsafe as much as possible. I left a lot of comments along those lines, but let me know if anything stands out!
| // FIXME | ||
| // Much like in the case with `path_symlink`, we need to account for the | ||
| // fact where both `old_path` and `new_path` point the same memory location. | ||
| // Hence, separate scopes and `GuestBorrows` instances. | ||
| let resolved_old = { | ||
| let mut bc = GuestBorrows::new(); | ||
| let old_as_str = old_path.as_raw(&mut bc)?; | ||
| let old_path = unsafe { &*old_as_str }; | ||
|
|
||
| trace!(" | (old_path_ptr,old_path_len)='{}'", old_path); | ||
|
|
||
| let old_entry = unsafe { self.get_entry(old_fd)? }; | ||
| path::get( | ||
| &old_entry, | ||
| types::Rights::PATH_LINK_SOURCE, | ||
| types::Rights::empty(), | ||
| types::Lookupflags::empty(), | ||
| old_path, | ||
| false, | ||
| )? | ||
| }; | ||
| let resolved_new = { | ||
| let mut bc = GuestBorrows::new(); | ||
| let new_as_str = new_path.as_raw(&mut bc)?; | ||
| let new_path = unsafe { &*new_as_str }; | ||
|
|
||
| trace!(" | (new_path_ptr,new_path_len)='{}'", new_path); | ||
|
|
||
| let new_entry = unsafe { self.get_entry(new_fd)? }; | ||
| path::get( | ||
| &new_entry, | ||
| types::Rights::PATH_LINK_TARGET, | ||
| types::Rights::empty(), | ||
| types::Lookupflags::empty(), | ||
| new_path, | ||
| false, | ||
| )? | ||
| }; | ||
| path::link(resolved_old, resolved_new) |
There was a problem hiding this comment.
I don't actually think that GuestBorrows is really needed here at all because there are no long-lived pointers, I think it's fine to do what this is currently doing, but I might recommend making it a bit more concise with the comment I made above
|
Just a quick note -- there are a bunch of references to |
Thanks for the heads up! I have to admit, I've not paid too much attention to the docs yet. |
|
I'm curious what the story is for the old snapshot? It's already been somewhat painful when the two snapshots diverge in how they're defined idiomatically. Is the intention to migrate the old snapshot? Implement the old snapshot trait for the same |
So the current idea is to only migrate latest to |
sunfishcode
left a comment
There was a problem hiding this comment.
This is a big milestone, thanks for all your hard work here! This looks ready to land to me, and we can follow up as discussed in separate PRs.
| return Err(Errno::Badf); | ||
| } | ||
| Ok(Ref::map(self.entries.borrow(), |entries| { | ||
| entries.get(&fd).unwrap() |
There was a problem hiding this comment.
Ok, if it gets awkward, then I agree it makes sense to leave the code as-is for now.
This is a rather massive commit that introduces `wiggle` into the picture. We still use `wig`'s macro in `old` snapshot and to generate `wasmtime-wasi` glue, but everything else is now autogenerated by `wiggle`. In summary, thanks to `wiggle`, we no longer need to worry about serialising and deserialising to and from the guest memory, and all guest (WASI) types are now proper idiomatic Rust types. While we're here, in preparation for the ephemeral snapshot, I went ahead and reorganised the internal structure of the crate. Instead of modules like `hostcalls_impl` or `hostcalls_impl::fs`, the structure now resembles that in ephemeral with modules like `path`, `fd`, etc. Now, I'm not requiring we leave it like this, but I reckon it looks cleaner this way after all.
I've left the implementation of VirtualFs pretty much untouched as I don't feel that comfortable in changing the API too much. Having said that, I reckon `pread` and `pwrite` could be refactored out, and `preadv` and `pwritev` could be entirely rewritten using `seek` and `read_vectored` and `write_vectored`.
This commit adds aux struct `EntryTable` which is private to `WasiCtx` and is basically responsible for `Fd` alloc/dealloc as well as storing matching `Entry`s. This struct is entirely private to `WasiCtx` and as such as should remain transparent to `WasiCtx` users.
pchickey
left a comment
There was a problem hiding this comment.
Thank you @kubkon for all the outstanding work here and to @sunfishcode @alexcrichton for all of the excellent review feedback!
It's
wiggletime!This PR is the beginning of a journey which aims at replacing
wigcrate withwiggle. @pchickey and myself have been working onwigglefor over a month now, and we feel we got to the point where it could successfully be used inwasi-common,wasmtimeandlucet. We also agreed that if any functionality is still missing, we should iterate in-tree if possible. Anyhow, this is a draft PR so things will invariably change (a lot!), but I wanted to get it out there sooner rather than later so that I can get some meaningful feedback early on and work on addressing that when things are still plastic.