VMContext: embed local Globals directly in the struct#6187
VMContext: embed local Globals directly in the struct#6187
Conversation
There was a problem hiding this comment.
Pull request overview
This PR changes the VMContext ABI so that local globals are stored inline as VMGlobalDefinition values inside VMContext, enabling faster access via the vmctx pointer and simplifying the pass-params-opt optimization to pass only a globals base pointer.
Changes:
- Store local globals inline in VMContext (adjust VMOffsets + instance/global plumbing accordingly).
- Update compiler backends (LLVM, Cranelift, Singlepass) and trampolines/ABI for the new globals access strategy and updated g0m0 parameter passing.
- Bump serialized metadata ABI version to reflect the VMContext layout change.
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/vm/src/vmcontext.rs | Updates offset/size test and removes obsolete TODO now that globals are inline. |
| lib/vm/src/instance/mod.rs | Switches local-global pointer math to inline VMGlobalDefinition storage; updates trampoline comment. |
| lib/vm/src/instance/allocator.rs | Exposes VMContext locations for inline global definitions during instantiation. |
| lib/vm/src/global.rs | Adds instance-backed global creation (VMGlobal::new_instance) writing into VMContext memory. |
| lib/types/src/vmoffsets.rs | Adds num_local_globals and changes local-global sizing/offset computations to inline definitions. |
| lib/types/src/serialize.rs | Bumps metadata ABI version for the breaking layout change. |
| lib/compiler/src/engine/tunables.rs | Creates VMContext-backed globals using allocator-provided definition locations; updates globals construction pipeline. |
| lib/compiler/src/engine/artifact.rs | Threads global definition locations from allocator into create_globals. |
| lib/compiler-singlepass/src/codegen.rs | Updates local-global addressing to direct VMContext memory instead of pointer indirection. |
| lib/compiler-llvm/src/translator/intrinsics.rs | Uses optional globals_base_ptr for faster local-global addressing when g0m0 opt is enabled. |
| lib/compiler-llvm/src/translator/code.rs | Adjusts g0m0 parameter strategy (globals base ptr + memory base ptr) and related call plumbing. |
| lib/compiler-llvm/src/trampoline/wasm.rs | Updates trampoline arg construction for new g0m0 scheme (no longer passes g0 value). |
| lib/compiler-llvm/src/abi/mod.rs | Removes g0 param handling; renames/adjusts memory base ptr param access. |
| lib/compiler-llvm/src/abi/x86_64_systemv.rs | Drops g0 param from local function signatures under g0m0 opt. |
| lib/compiler-llvm/src/abi/aarch64_systemv.rs | Drops g0 param from local function signatures under g0m0 opt. |
| lib/compiler-llvm/src/abi/riscv_systemv.rs | Drops g0 param from local function signatures under g0m0 opt. |
| lib/compiler-cranelift/src/func_environ.rs | Updates local-global addressing to use VMContext base + offset directly. |
| /// Allocate memory for just the globals of the current module, | ||
| /// with initializers applied. | ||
| #[allow(clippy::result_large_err)] | ||
| fn create_globals( | ||
| &self, | ||
| context: &mut StoreObjects, | ||
| module: &ModuleInfo, | ||
| vm_definition_locations: &[NonNull<VMGlobalDefinition>], | ||
| ) -> Result<PrimaryMap<LocalGlobalIndex, InternalStoreHandle<VMGlobal>>, LinkError> { |
There was a problem hiding this comment.
Tunables::create_globals is part of a public trait, and this change adds a new required parameter (vm_definition_locations). That is a breaking API change for any downstream code calling this helper; if maintaining semver compatibility is important here, consider keeping the old signature (e.g., as a wrapper that allocates host-owned globals) and introducing a new method for VMContext-backed globals, or otherwise ensure the crate’s versioning/changelog reflects the break.
There was a problem hiding this comment.
Yup, will need a new method!
There was a problem hiding this comment.
Technically yes, but I’d prefer to do a semantic version bump here. I’m a bit surprised that the low-level Tunables trait is part of the public API — are there actually downstream consumers relying on it? From my perspective, the top-level Artifact::instantiate API remains unchanged.
There was a problem hiding this comment.
Yes, for example: Edge relies on it for some customizations.
There was a problem hiding this comment.
Sure, but Edge is a little bit special in this area, it's a home use-case ;)
| #[allow(clippy::type_complexity)] | ||
| pub fn new( | ||
| module: &ModuleInfo, | ||
| ) -> ( | ||
| Self, | ||
| Vec<NonNull<VMMemoryDefinition>>, | ||
| Vec<NonNull<VMTableDefinition>>, | ||
| Vec<NonNull<VMGlobalDefinition>>, | ||
| ) { |
There was a problem hiding this comment.
InstanceAllocator::new is a public function and its return type changed (now returns an additional globals location vector). This is a breaking API change for downstream users; if API stability is a goal, consider introducing a new constructor or returning a struct with named fields to allow extension without breaking callers.
|
If this is safe, can we just drop the separation and always apply this? |
|
Globales can be exported from one module and imported from another. If changed in the first instance, the changes should be reflected on the second and third ones. Does this work well with the new strategy? |
Yes, absolutely! There's nothing fundamentally different - the only change is where is the memory blob for local (not imported) globals allocated. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
Before merging this one it would be great if we can get a bit of the performance analysis |
|
Sure, here they are. Note that we’re seeing a slight slowdown for pystone, but I believe it’s a worthwhile trade-off to ensure we’re operating in a safe space (i.e., applying a valid transformation), especially now that Python has started using shared libraries across multiple modules. Having a stack pointer being mapped to an argument is a risky business. |
|
What is the difference between Wasmer LLVM globals and Wasmer LLVM globals pass-params? Which one is this PR implementing? |
The former one is basically this PR, while the later include |
While investigating the
pass-params-optoptimization, I noticed that we can place local globals directly into the VMContext, allowing fast access to a global via the vmctx pointer.By doing this, we can simplify
pass-paramsto pass only a pointer to the global memory. This makes the optimization safe and preserves the performance benefit (we still pass one fewer argument).In a follow-up PR, I will rename the optimization and enable it by default. Plus, right now, we allocate 16B for all the global variables regardless of their type. We can do better!