Add `GcRuntime` and `GcCompiler` traits; `i31ref` support by fitzgen · Pull Request #8196 · bytecodealliance/wasmtime

fitzgen · 2024-03-20T20:57:34Z

This is still a WIP. I think the architecture is all there, although certain things still need improvement, like the pooling allocator integration and the question of who allocates the memory used in a GC heap, but I think that can happen in follow-up PRs. The big thing is that there are still some tests failing, a bunch of new tests that need to be written, and at least one blocker (#8180) to be fixed. Also I need to rebase, which will be fun given all the churn related to tables recently. However, this is big enough that I think those things can happen in parallel with review of the main bits and the architecture.

The `GcRuntime` and `GcCompiler` Traits

This commit factors the details of the garbage collector out of the rest of the runtime and the compiler. It does this by introducing two new traits, very similar to a subset of those proposed in the Wasm GC RFC (https://github.com/bytecodealliance/rfcs/blob/main/accepted/wasm-gc.md#defining-the-pluggable-gc-interface), although not all equivalent functionality has been added yet because Wasmtime doesn't support, for example, GC structs yet:

1. The GcRuntime trait: This trait defines how to create new GC heaps, run collections within them, and execute the various GC barriers the collector requires.

   Rather than monomorphize all of Wasmtime on this trait, we use it as a dynamic trait object. This does imply some virtual call overhead and missing some inlining (and resulting post-inlining) optimization opportunities. However, it is much less disruptive to the existing embedder API, results in a cleaner embedder API anyways, and we don't believe that VM runtime/embedder code is on the hot path for working with the GC at this time anyways (that would be the actual Wasm code, which has inlined GC barriers and direct calls and all of that). In the future, once we have optimized enough of the GC that such code is ever hot, we have options we can investigate at that time to avoid these dynamic virtual calls, like only enabling one single collector at build time and then creating a static type alias like type TheOneGcImpl = ...; based on the compile time configuration, and using this type alias in the runtime rather than a dynamic trait object.

   The GcRuntime trait additionally defines a method to reset a GC heap, for use by the pooling allocator. This allows reuse of GC heaps across different stores. This integration is very rudimentary at the moment, and is missing all kinds of configuration knobs that we should have before deploying Wasm GC in production. This commit is large enough as it is already! Ideally, in the future, I'd like to make it so that GC heaps receive their memory region, rather than allocate/reserve it themselves, and let each slot in the pooling allocator's memory pool be either a linear memory or a GC heap. This would unask various capacity planning questions such as "what percent of memory capacity should we dedicate to linear memories vs GC heaps?". It also seems like basically all the same configuration knobs we have for linear memories apply equally to GC heaps (see also the "Indexed Heaps" section below).

2. The GcCompiler trait: This trait defines how to emit CLIF that implements GC barriers for various operations on GC-managed references. The Rust code calls into this trait dynamically via a trait object, but since it is customizing the CLIF that is generated for Wasm code, the Wasm code itself is not making dynamic, indirect calls for GC barriers. The GcCompiler implementation can inline the parts of a GC barrier that it believes should be inline, and leave out-of-line calls to rare slow paths.
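To make the trait-object choice concrete, here is a minimal sketch of the shape being described. It is not the code in this PR: the method names (`new_gc_heap`, `alloc`, `reset`, `object_count`), the `ToyCollector`, `ToyHeap`, and `Engine` types, and the split of methods between the two traits are all hypothetical simplifications. Only the names `GcRuntime` and `TheOneGcImpl` come from the description above.

```rust
/// Hypothetical, heavily simplified stand-in for the `GcRuntime` trait.
pub trait GcRuntime {
    /// Create a fresh GC heap for a store.
    fn new_gc_heap(&self) -> Box<dyn GcHeap>;
}

/// Hypothetical per-store heap interface (an assumption of this sketch).
pub trait GcHeap {
    /// Allocate an object and return its index within the heap.
    fn alloc(&mut self) -> u32;
    /// Reset the heap so a pooling allocator could reuse it for another store.
    fn reset(&mut self);
    /// Number of objects currently allocated.
    fn object_count(&self) -> usize;
}

/// A toy collector that never actually collects anything.
pub struct ToyCollector;

pub struct ToyHeap {
    objects: Vec<u32>,
}

impl GcRuntime for ToyCollector {
    fn new_gc_heap(&self) -> Box<dyn GcHeap> {
        Box::new(ToyHeap { objects: Vec::new() })
    }
}

impl GcHeap for ToyHeap {
    fn alloc(&mut self) -> u32 {
        let index = self.objects.len() as u32;
        self.objects.push(index);
        index
    }

    fn reset(&mut self) {
        self.objects.clear();
    }

    fn object_count(&self) -> usize {
        self.objects.len()
    }
}

/// The runtime holds the collector as a trait object, so the embedder-facing
/// type does not need a generic parameter for the collector.
pub struct Engine {
    collector: Box<dyn GcRuntime>,
}

impl Engine {
    pub fn new(collector: Box<dyn GcRuntime>) -> Self {
        Engine { collector }
    }

    /// A dynamic (virtual) call into whichever collector was configured.
    pub fn new_heap(&self) -> Box<dyn GcHeap> {
        self.collector.new_gc_heap()
    }
}

/// The static alternative: choose exactly one collector at build time and
/// refer to it through an alias instead of a trait object.
pub type TheOneGcImpl = ToyCollector;
```

With this shape, `Engine::new(Box::new(ToyCollector))` pays one virtual call per heap operation from host code, while swapping `Box<dyn GcRuntime>` for `TheOneGcImpl` later would remove that indirection without changing call sites much.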

All that said, there is still only a single implementation of each of these traits: the existing deferred reference-counting (DRC) collector. So there is a bunch of code motion in this commit as the DRC collector was further isolated from the rest of the runtime and moved to its own submodule. That said, this was not purely code motion (see "Indexed Heaps" below) so it is worth not simply skipping over the DRC collector's code in review.

Indexed Heaps

This commit does bake in a couple assumptions that must be shared across all collector implementations, such as a shared VMGcHeader that all objects allocated within a GC heap must begin with, but the most notable and far-reaching of these assumptions is that all collectors will use "indexed heaps".

What we are calling indexed heaps are basically the three following invariants:

1. All GC heaps will be a single contiguous region of memory, and all GC objects will be allocated within this region of memory. The collector may ask the system allocator for additional memory, e.g. to maintain its free lists, but GC objects themselves will never be allocated via malloc.
2. A pointer to a GC-managed object (i.e. a VMGcRef) is a 32-bit offset into the GC heap's contiguous region of memory. We never hold raw pointers to GC objects (although, of course, we have to compute them and use them temporarily when actually accessing objects). This means that deref'ing GC pointers is equivalent to deref'ing linear memory pointers: we need to add a base and we also check that the GC pointer/index is within the bounds of the GC heap. Furthermore, compressing 64-bit pointers into 32 bits is a fairly common technique among high-performance GC implementations¹ ² so we are in good company.
3. Anything stored inside the GC heap is untrusted. Even each GC reference that is an element of an (array (ref any)) is untrusted, and bounds checked on access. This means that, for example, we do not store the raw pointer to an externref's host object inside the GC heap. Instead an externref now stores an ID that can be used to index into a side table in the store that holds the actual Box<dyn Any> host object, and accessing that side table is always checked.
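The three invariants can be illustrated with a small sketch. Everything here is hypothetical (`GcIndex`, `ToyGcHeap`, `ExternRefTable`, and their methods are invented for illustration and are not the types in this PR); the point is only that every access goes through an index plus a bounds check, and that host objects live in a checked side table rather than inside the heap.

```rust
use std::any::Any;

/// Hypothetical 32-bit index into a GC heap (a stand-in for `VMGcRef`).
pub struct GcIndex(pub u32);

/// Invariant 1: one contiguous region holds every GC object.
pub struct ToyGcHeap {
    bytes: Vec<u8>,
}

impl ToyGcHeap {
    pub fn new(size: usize) -> Self {
        ToyGcHeap { bytes: vec![0; size] }
    }

    /// Invariants 2 and 3: the index is untrusted, so bounds check it before
    /// touching memory. `None` means the access was out of bounds.
    pub fn read_u32(&self, at: &GcIndex) -> Option<u32> {
        let start = at.0 as usize;
        let end = start.checked_add(4)?;
        let slice = self.bytes.get(start..end)?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(slice);
        Some(u32::from_le_bytes(buf))
    }

    pub fn write_u32(&mut self, at: &GcIndex, value: u32) -> Option<()> {
        let start = at.0 as usize;
        let end = start.checked_add(4)?;
        self.bytes
            .get_mut(start..end)?
            .copy_from_slice(&value.to_le_bytes());
        Some(())
    }
}

/// Side table owned by the store: the heap only ever stores the `u32` ID,
/// never a raw pointer to the host object.
pub struct ExternRefTable {
    hosts: Vec<Box<dyn Any>>,
}

impl ExternRefTable {
    pub fn new() -> Self {
        ExternRefTable { hosts: Vec::new() }
    }

    pub fn insert(&mut self, host: Box<dyn Any>) -> u32 {
        let id = self.hosts.len() as u32;
        self.hosts.push(host);
        id
    }

    /// Checked lookup: a corrupted ID yields `None`, not a wild pointer.
    pub fn get<T: Any>(&self, id: u32) -> Option<&T> {
        self.hosts.get(id as usize)?.downcast_ref::<T>()
    }
}
```

In this sketch a corrupted index or ID can at worst return the wrong in-heap value or `None`; it cannot reach memory outside the heap or the table.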

The good news with regard to all the bounds checking that this scheme implies is that we can use all the same virtual memory tricks that linear memories use to omit explicit bounds checks. Additionally, (2) means that GC objects are that much smaller (and therefore that much more cache friendly) because they are only holding onto 32-bit, rather than 64-bit, references to other GC objects. (We can, in the future, support GC heaps up to 16GiB in size without losing 32-bit GC pointers by taking advantage of VMGcHeader alignment and storing aligned indices rather than byte indices, while still leaving the bottom bit available for tagging as an i31ref discriminant. Should we ever need to support even larger GC heap capacities, we could go to full 64-bit references, but we would need explicit bounds checks.)
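The 16GiB figure can be checked with a little arithmetic. The sketch below assumes 8-byte object alignment and an encoding where a set low bit marks an i31ref; both are assumptions made for illustration, and the function and constant names are hypothetical. With one bit spent on the tag, 31 bits remain for an aligned index, and 2^31 indices times 8 bytes is 2^34 bytes, i.e. 16GiB.

```rust
/// Assumed alignment of every GC object in bytes (an assumption of this sketch).
pub const ALIGN: u64 = 8;
/// Assumed encoding: a set low bit means "i31ref", a clear low bit means "heap index".
pub const I31_TAG: u32 = 1;
/// 31 usable index bits times the alignment: 2^31 * 8 = 2^34 bytes.
pub const MAX_HEAP_BYTES: u64 = (1u64 << 31) * ALIGN;

/// Encode an aligned byte offset as a 32-bit reference with the tag bit clear.
pub fn encode_index(byte_offset: u64) -> Option<u32> {
    if byte_offset % ALIGN != 0 || byte_offset >= MAX_HEAP_BYTES {
        return None;
    }
    Some(((byte_offset / ALIGN) as u32) << 1)
}

/// Recover the byte offset, or `None` if the reference is really an i31ref.
pub fn decode_offset(raw: u32) -> Option<u64> {
    if raw & I31_TAG != 0 {
        return None;
    }
    Some(u64::from(raw >> 1) * ALIGN)
}

/// Pack a 31-bit integer into a reference; the top bit of `value` is dropped.
pub fn encode_i31(value: i32) -> u32 {
    ((value as u32) << 1) | I31_TAG
}

/// Unpack an i31ref with sign extension, or `None` for a heap index.
pub fn decode_i31(raw: u32) -> Option<i32> {
    if raw & I31_TAG == 0 {
        return None;
    }
    Some((raw as i32) >> 1)
}
```

Because an i31ref never points into the heap, code that sees the tag bit set can skip GC barriers for that value entirely.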

The biggest benefit of indexed heaps is that, because we are (explicitly or implicitly) bounds checking GC heap accesses, and because we are not otherwise trusting any data from inside the GC heap, we greatly reduce how badly things can go wrong in the face of collector bugs and GC heap corruption. We are essentially sandboxing the GC heap region, the same way that linear memory is a sandbox. GC bugs could lead to the guest program accessing the wrong GC object, or getting garbage data from within the GC heap. But only garbage data from within the GC heap, never outside it. The worst that could happen would be if we decided not to zero out GC heaps between reuse across stores (which is a valid trade-off to make, since zeroing a GC heap is a defense-in-depth technique similar to zeroing a Wasm stack and not semantically visible in the absence of GC bugs) and then a GC bug allowed the current Wasm guest to read old GC data from the old Wasm guest that previously used this GC heap. But again, it could never access host data.

Taken altogether, this allows for collector implementations that are nearly free from unsafe code, and unsafety can otherwise be targeted and limited in scope, such as interactions with JIT code. Most importantly, we do not have to maintain critical invariants across the whole system -- invariants which can't be nicely encapsulated or abstracted -- to preserve memory safety. Such holistic invariants that refuse encapsulation are otherwise generally a huge safety problem with GC implementations.

`VMGcRef` is NOT `Clone` or `Copy` Anymore

VMGcRef used to be Clone and Copy. It is not anymore. The motivation here was to be sure that I was actually calling GC barriers at all the correct places. I couldn't be sure before. Now, you can still explicitly copy a raw GC reference without running GC barriers if you need to and understand why that's okay (aka you are implementing the collector), but that is something you have to opt into explicitly by calling unchecked_copy. The default now is that you can't just copy the reference; instead you must call an explicit clone method (not the Clone trait, because we need to pass in the GC heap context to run the GC barriers), so it is hard to accidentally skip the barriers. This resulted in a pretty big amount of churn, but I am wayyyyyy more confident that the correct GC barriers are called at the correct times now than I was before.
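A minimal sketch of this pattern, assuming a toy reference-counting heap: `GcRef`, `ToyHeap`, `clone_gc_ref`, `drop_gc_ref`, and `ref_count` are hypothetical names, and only `unchecked_copy` is taken from the description above. The reference type deliberately derives neither `Clone` nor `Copy`, so the only ways to duplicate it are the barrier-running method that takes the heap, or the explicit escape hatch.

```rust
/// Toy heap with one reference count per object (hypothetical).
pub struct ToyHeap {
    ref_counts: Vec<u32>,
}

/// Hypothetical stand-in for `VMGcRef`: intentionally not `Clone` or `Copy`.
pub struct GcRef(u32);

impl GcRef {
    /// Escape hatch for collector internals: duplicates the raw index
    /// without running any barrier or touching the reference count.
    pub fn unchecked_copy(&self) -> GcRef {
        GcRef(self.0)
    }
}

impl ToyHeap {
    pub fn new() -> Self {
        ToyHeap { ref_counts: Vec::new() }
    }

    /// Allocate an object with an initial reference count of one.
    pub fn alloc(&mut self) -> GcRef {
        self.ref_counts.push(1);
        GcRef((self.ref_counts.len() - 1) as u32)
    }

    /// The barrier-running clone. It needs the heap, which is why it cannot
    /// be an implementation of the standard `Clone` trait.
    pub fn clone_gc_ref(&mut self, r: &GcRef) -> Option<GcRef> {
        let count = self.ref_counts.get_mut(r.0 as usize)?;
        *count += 1;
        Some(GcRef(r.0))
    }

    /// The matching drop barrier: consumes the reference and decrements.
    pub fn drop_gc_ref(&mut self, r: GcRef) {
        if let Some(count) = self.ref_counts.get_mut(r.0 as usize) {
            *count = count.saturating_sub(1);
        }
    }

    pub fn ref_count(&self, r: &GcRef) -> Option<u32> {
        self.ref_counts.get(r.0 as usize).copied()
    }
}
```

With this shape, writing `let b = a;` moves the reference rather than silently duplicating it, so a forgotten barrier shows up as a compile error instead of a miscounted object.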

`i31ref`

I started this commit by trying to add i31ref support. And it grew into the whole traits interface because I found that I needed to abstract GC barriers into helpers anyways to avoid running them for i31refs, so I figured that I might as well add the whole traits interface. In comparison, i31ref support is much easier and smaller than that other part! But it was also difficult to pull apart from this commit, sorry about that!

Overall, I know this is a very large commit. I am super happy to have some synchronous meetings to walk through this all, give an overview of the architecture, answer questions directly, etc... to make review easier!

¹ See "Compressed OOPs" in OpenJDK: https://wiki.openjdk.org/display/HotSpot/CompressedOops
² See V8's pointer compression: https://v8.dev/blog/pointer-compression

github-actions · 2024-03-20T21:44:23Z

Subscribe to Label Action

cc @fitzgen

Details

This issue or pull request has been labeled: "cranelift", "cranelift:wasm", "fuzzing"

Thus the following users have been cc'd because of the following labels:

fitzgen: fuzzing

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.


dicej

Looks great! I'm excited to see this move forward.

Please see a few inline comments and suggestions. The only one that might be a blocker is the cast from a pointer to a 64-bit value to a pointer to a 32-bit value due to endianness concerns. Based on our earlier conversation, sounds like you're planning to get rid of that anyway.

github-actions · 2024-04-02T21:44:34Z

Subscribe to Label Action

cc @peterhuene

Details

This issue or pull request has been labeled: "wasmtime:c-api"

Thus the following users have been cc'd because of the following labels:

peterhuene: wasmtime:c-api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.



fitzgen requested review from alexcrichton and dicej March 20, 2024 20:57

fitzgen requested review from a team as code owners March 20, 2024 20:57

fitzgen requested review from cfallin and removed request for a team March 20, 2024 20:57

github-actions Bot added the cranelift, cranelift:wasm, and fuzzing labels Mar 20, 2024

fitzgen removed the request for review from cfallin March 20, 2024 21:05

fitzgen force-pushed the i31ref branch from 9daa275 to 3c055c5 Compare March 22, 2024 14:01

alexcrichton reviewed Mar 22, 2024


Comment thread crates/runtime/src/table.rs Outdated

fitzgen force-pushed the i31ref branch 3 times, most recently from d162c8c to b8337ee Compare March 22, 2024 21:58

dicej approved these changes Mar 25, 2024


alexcrichton approved these changes Mar 25, 2024


Comment thread crates/wasmtime/src/runtime/types/matching.rs

Comment thread crates/wasmtime/src/runtime/store.rs Outdated

Comment thread crates/wasmtime/src/runtime/store.rs Outdated

Comment thread crates/wasmtime/src/runtime/gc/enabled/externref.rs Outdated

fitzgen force-pushed the i31ref branch 9 times, most recently from c929664 to fdf1159 Compare April 2, 2024 13:42

fitzgen force-pushed the i31ref branch from fdf1159 to b6df243 Compare April 2, 2024 19:04

fitzgen enabled auto-merge April 2, 2024 19:05

fitzgen force-pushed the i31ref branch 4 times, most recently from 8f45761 to 44c7308 Compare April 2, 2024 21:15

fitzgen requested a review from a team as a code owner April 2, 2024 21:15

fitzgen added this pull request to the merge queue Apr 2, 2024

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 2, 2024

fitzgen force-pushed the i31ref branch from 44c7308 to 9b87906 Compare April 2, 2024 21:41

fitzgen enabled auto-merge April 2, 2024 21:41

github-actions Bot added the wasmtime:c-api label Apr 2, 2024

fitzgen added this pull request to the merge queue Apr 2, 2024

github-merge-queue Bot removed this pull request from the merge queue due to failed status checks Apr 2, 2024

fitzgen force-pushed the i31ref branch 2 times, most recently from 544066f to 80eb2a2 Compare April 3, 2024 18:56

fitzgen enabled auto-merge April 3, 2024 18:56

fitzgen force-pushed the i31ref branch 2 times, most recently from 5ce2532 to 9fdf5db Compare April 3, 2024 23:04

fitzgen force-pushed the i31ref branch from 9fdf5db to aec8fe3 Compare April 3, 2024 23:58

fitzgen added this pull request to the merge queue Apr 4, 2024

Merged via the queue into bytecodealliance:main with commit 0fa1301 Apr 4, 2024

fitzgen deleted the i31ref branch April 4, 2024 01:03

cfallin mentioned this pull request Aug 14, 2024

Disable the trace-log feature of regalloc2 by default #9128

Merged

fitzgen merged 1 commit into bytecodealliance:main from fitzgen:i31ref
