Speed up registration of stack maps#2820
Conversation
Subscribe to Label Actioncc @peterhuene DetailsThis issue or pull request has been labeled: "wasmtime:api"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
| } | ||
|
|
||
| /// A trait used to abstract how stack maps are located. | ||
| pub trait StackMapLookup { |
There was a problem hiding this comment.
Question: why the trait over using a method on CompiledModule (similar to func_by_pc and func_info) and storing Arc<CompiledModule> vs the trait object?
There was a problem hiding this comment.
Only because the CompiledModule lives in the wasmtime_jit crate which this wasmtime_runtime crate doesn't have access to. Otherwise though there's definitely no need for dynamic trait dispatch here.
There was a problem hiding this comment.
Makes sense. I guess the alternative would be to use StackMapLookup via the VMContext, rather than a registry pointer, freeing the registry implementation to live in the wasmtime crate, but that's just trading dynamic-dispatch during module instantiation to dynamic-dispatch at externref gc.
There was a problem hiding this comment.
If we did move the registry implementation to wasmtime, I think this data structure could be a consolidated one replacing StoreFrameInfo too, basically just a "which CompiledModule should I be talking to given this pc?" which responds with an answer that we then use to lookup the appropriate information (gimmie frame info, gimmie trap info, gimmie stack maps, etc.)
There was a problem hiding this comment.
In fact, given externref gc being a relatively expensive (and hopefully very infrequent) operation and the cost of doing trait object dispatch per frame would be negligible, I would argue VMContext should have *const dyn StackMapLookup rather than *mut StackMapRegistry.
This frees us to remove StackMapRegistry and StoreFrameInfo in favor of a single CompiledModuleRegistry per store that lives in wasmtime which impls the runtime's StackMapLookup and can lookup frame and trap information as well.
Thoughts on this approach? Perhaps for another PR?
There was a problem hiding this comment.
Hm good points! I'll look to add this to this PR, seems reasonable to do here!
This commit fixes a perf issue I was seeing in some local benchmarking where registration of stack maps took a nontrivial amount of time for a module that didn't even use stack maps! The fix here is to largely just do the same thing as bytecodealliance#2811 which is to use the in-memory data structures of a `CompiledModule` rather than rebuilding different data structures when a module is registered with a `Store`. The `StackMapRegistry` type now takes a lookup object for a range of pcs, simplifying registration. Registration additionally doesn't even happen if a module is pre-determined to not have any stack maps inside of it. This trait object encapsulates all the logic that was previously used in the rebuilt data structures and also leverages conveniences like `CompiledModule::func_by_pc` added recently as well. With this all combined it means that modules which don't have stack maps now skip this registration step entirely and modules with stack maps should do much less work during registration as well.
4001e35 to
9f1aa64
Compare
|
This looks well handled by #2842, so closing in favor of that! |
This commit fixes a perf issue I was seeing in some local benchmarking
where registration of stack maps took a nontrivial amount of time for a
module that didn't even use stack maps! The fix here is to largely just
do the same thing as #2811 which is to use the in-memory data structures
of a
CompiledModulerather than rebuilding different data structureswhen a module is registered with a
Store.The
StackMapRegistrytype now takes a lookup object for a range ofpcs, simplifying registration. Registration additionally doesn't even
happen if a module is pre-determined to not have any stack maps inside
of it. This trait object encapsulates all the logic that was previously
used in the rebuilt data structures and also leverages conveniences like
CompiledModule::func_by_pcadded recently as well.With this all combined it means that modules which don't have stack maps
now skip this registration step entirely and modules with stack maps
should do much less work during registration as well.