Don't re-capture backtraces when propagating traps through host frames#5049
Conversation
…he stack when we trap
This fixes some accidentally quadratic code where we would re-capture a Wasm stack trace (takes `O(n)` time) every time we propagated a trap through a host frame back to Wasm (can happen `O(n)` times). And `O(n) * O(n) = O(n^2)`, of course. Whoops. After this commit, it trapping with a call stack that is `n` frames deep of Wasm-to-host-to-Wasm calls just captures a single backtrace and is therefore just a proper `O(n)` time operation, as it is intended to be. Now we explicitly track whether we need to capture a Wasm backtrace or not when raising a trap. This unfortunately isn't as straightforward as one might hope, however, because of the split between `wasmtime::Trap` and `wasmtime_runtime::Trap`. We need to decide whether or not to capture a Wasm backtrace inside `wasmtime_runtime` but in order to determine whether to do that or not we need to reflect on the `anyhow::Error` and see if it is a `wasmtime::Trap` that already has a backtrace or not. This can't be done the straightforward way because it would introduce a cyclic dependency between the `wasmtime` and `wasmtime-runtime` crates. We can't merge those two `Trap` types-- at least not without effectively merging the whole `wasmtime` and `wasmtime-runtime` crates together, which would be a good idea in a perfect world but would be a *ton* of ocean boiling from where we currently are -- because `wasmtime::Trap` does symbolication of stack traces which relies on module registration information data that resides inside the `wasmtime` crate and therefore can't be moved into `wasmtime-runtime`. We resolve this problem by adding a boolean to `wasmtime_runtime::raise_user_trap` that controls whether we should capture a Wasm backtrace or not, and then determine whether we need a backtrace or not at each of that function's call sites, which are in `wasmtime` and therefore can do the reflection to determine whether the user trap already has a backtrace or not. Phew! Fixes bytecodealliance#5037
|
Difference in benchmark results after the third commit: Details |
alexcrichton
left a comment
There was a problem hiding this comment.
Nice!
@Stebalien would you be up for testing this against the reproducer from #5037 and see if it resolves the issue? The quadratic behavior I think should be gone but there may be lurking inefficiences to still handle in the linear work to capture a backtrace so I think it'd be good to confirm one way or another.
Subscribe to Label Actioncc @peterhuene DetailsThis issue or pull request has been labeled: "wasmtime:api"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
|
Thanks for looking into this! Unfortunately, no difference. I tested it here: filecoin-project/ref-fvm@4242e72 (looks like I got the patches correct). I'll put a reproducer together by next week (have a few deadlines this week). |
Unfortunately we can't do
debug_assert_eq!(needs_backtrace, trap.inner.backtrace.get().is_some());
because `needs_backtrace` doesn't consider whether Wasm backtraces have been
disabled via config.
…user_trap` into one place
|
@Stebalien to be clear, enabling backtraces will always have some overhead that is proportional to how many frames are on the stack (capturing the stack is Are you seeing identical performance with this patch applied vs without? If so, how is the trap being triggered? Is it an (Reproducer would still be appreciated, once you have the cycles.) |
Within the margin of error, yes. Basically, this test is 300-400x slower with backtraces enabled.
The host is returning a
In other words:
And so on... Two steps backwards, one step forward, two steps backward, all the way to the top.
I'll also do a bit of profiling myself as well. |
To clarify, does this mean that for a stack of depth N, there are N traps? The "quadratic slowdown with backtraces" behavior is completely expected then, from first principles, I think: you're taking N traps, and they have cost 1, 2, 3, ... N frames/levels deep. I don't think there's anything we could do to make this not be quadratic, short of some sort of magical lazily-expanding constant-time stack capture (which we can't do because we don't keep the stack around after the trap fires). |
|
There's one trap at each level. If these stack traces cover the entire rust stack, not just the wasm module handling the trap, that would make sense. In that case, would the wasmtime team consider keeping the ability to disable backtraces? |
|
I think it would make sense to add an optional max number of frames captured in a backtrace, which can be set to zero to completely disable backtraces. Then we can limit the cost of a backtrace from O(frames) to some user-chosen acceptable constant overhead. Going to go ahead and merge this PR and file a new issue for that config option. I'll cc you on it, @Stebalien. |
|
This fixes some accidentally quadratic code where we would re-capture a Wasm
stack trace (takes
O(n)time) every time we propagated a trap through a hostframe back to Wasm (can happen
O(n)times). AndO(n) * O(n) = O(n^2), ofcourse. Whoops. After this commit, it trapping with a call stack that is
nframes deep of Wasm-to-host-to-Wasm calls just captures a single backtrace and
is therefore just a proper
O(n)time operation, as it is intended to be.Now we explicitly track whether we need to capture a Wasm backtrace or not when
raising a trap. This unfortunately isn't as straightforward as one might hope,
however, because of the split between
wasmtime::Trapandwasmtime_runtime::Trap. We need to decide whether or not to capture a Wasmbacktrace inside
wasmtime_runtimebut in order to determine whether to do thator not we need to reflect on the
anyhow::Errorand see if it is awasmtime::Trapthat already has a backtrace or not. This can't be done thestraightforward way because it would introduce a cyclic dependency between the
wasmtimeandwasmtime-runtimecrates. We can't merge those twoTraptypes-- at least not without effectively merging the whole
wasmtimeandwasmtime-runtimecrates together, which would be a good idea in a perfectworld but would be a ton of ocean boiling from where we currently are --
because
wasmtime::Trapdoes symbolication of stack traces which relies onmodule registration information data that resides inside the
wasmtimecrateand therefore can't be moved into
wasmtime-runtime. We resolve this problem byadding a boolean to
wasmtime_runtime::raise_user_trapthat controls whether weshould capture a Wasm backtrace or not, and then determine whether we need a
backtrace or not at each of that function's call sites, which are in
wasmtimeand therefore can do the reflection to determine whether the user trap already
has a backtrace or not. Phew!
Fixes #5037