Add support for generating perf maps for simple perf profiling#6030
Conversation
Subscribe to Label Actioncc @peterhuene DetailsThis issue or pull request has been labeled: "wasmtime:api", "wasmtime:c-api", "wasmtime:config", "wasmtime:docs"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
Label Messager: wasmtime:configIt looks like you are changing Wasmtime's configuration options. Make sure to
DetailsTo modify this label's message, edit the To add new label messages or remove existing label messages, edit the |
alexcrichton
left a comment
There was a problem hiding this comment.
Whoa I had no idea this was a supported format by perf! Is there some online documentation for this? Or is there a reason to use jitdump over this?
I've got some minor comments but otherwise this seems reasonable to me to add.
Jitdump can handle code memory being reused by multiple functions at different times. In addition it allows perf to show the actual instructions that executed rather than just the function names. It also supports recording unwind tables as opposed to requiring frame pointers for unwinding. And finally it allows recording line tables.
https://github.com/torvalds/linux/blob/master/tools/perf/Documentation/jit-interface.txt |
|
I think the performance issues of the jitdump support are because we emit a record for each individual function at instead of emitting a single record covering the entire text section of the module as intended. |
Nice find @bjorn3! Although I'm not deep enough into the jitdump support to know how to fix this properly and quickly, and perfmap support is a great start for our use case. |
alexcrichton
left a comment
There was a problem hiding this comment.
Thanks for the info and the link @bjorn3! I'm not sure what I expected in the way of documentation but... that I suppose at least does what it says on the tin! Makes sense we'll definitely want to keep the jitdump interface for "more robust" usage.
instead of emitting a single record covering the entire text section of the module as intended.
Oh I didn't realize that it was possible to do this, but reading over the documentation it looks like JIT_CODE_LOAD only supports one symbol? Do you know off the top of your head how the whole text section would be loaded? I'd love to fix this myself to avoid generating thousands of tiny *.so files on all perf inject calls.
| for (idx, func, len) in module.trampolines() { | ||
| let (addr, len) = (func as usize as *const u8, len); | ||
| let name = format!("wasm::trampoline[{}]", idx.index()); | ||
| let _ = file.write_all(Self::make_line(&name, addr, len).as_bytes()); |
There was a problem hiding this comment.
Ah I missed this earlier, but I think we'll want ot do something other than ignoring errors here ideally. If an error happens it should probably "close" the file and terminate all future writing to it I suspect? (along with perhaps a warning message printed?)
There was a problem hiding this comment.
Good point! I've added log messages and early-returns. Looks like a BufWriter will automatically flush in its Drop impl, so we can avoid doing it explicitly in most early-returns.
| ) { | ||
| let mut file = PERFMAP_FILE.lock().unwrap(); | ||
| let file = file.as_mut().unwrap(); | ||
| let _ = file.write_all(Self::make_line(name, addr, size).as_bytes()); |
There was a problem hiding this comment.
I think this may want a flush at the end too?
Also with a BufWriter I think you could pass that into make_line to avoid the intermediate string allocation (e.g. write directly to the buffer). Although not required of course, that's ok to defer to if it's ever actually an issue.
There was a problem hiding this comment.
Turns out File is also a Write implementator, so could have make_line take &mut dyn Write and use it here without having a BufWriter (since there's only one write in this function, it didn't seem worth having an extra buffer).
Right, I should have read the docs more carefully. My bad. All we can do I guess is to make |
|
Wow, |
|
|
I should have addressed all the feedback, thanks for the review! |
This adds support for a simpler way to profile with
perf, using perf maps files. Whilejitdumpsupport is pretty effective and complete when you have access to DWARF metadata, theperf-injectstep may take lots of time, and it is not optimized at all when modules contain lots of functions. (In our embedding, I've stoppedperf-injectafter it ran for 5 hours and didn't complete, while creating 175K.soobjects.) Simplistic support for perf maps enable basic profiling: caller/callee trees, top/bottom time analysis, flamegraphs etc., and works very nicely when the wasm modules have many functions.I've also tweaked the documentation and tried to make the feature available in the C API too.
I haven't put this behind a feature toggle because this profiling agent does very little, so it's always compiled when running Linux. For consistency I could add one, if you prefer.