cranelift: Optimize __multi3-style multiplications #8653
LLVM's `__multi3` function works by splitting a wide multiplication into several narrower ones. This optimization recognizes the algebraic identities involved and merges them back into the original wide multiply.

This is not yet done but illustrates how part of the optimization can work, at least.

Currently, the lower half of the result is optimized into a single `imul` instruction, but most of the intermediate values that are optimized away there are still used in computing the upper half, so elaboration brings them back later.

Fixes bytecodealliance#4077
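For readers unfamiliar with the pattern, here is a minimal Rust sketch of the kind of split described above. It is not LLVM's actual `__multi3` source and not code from this PR; the function names `mul_wide_split` and `multi3_sketch` are hypothetical. It assumes the usual schoolbook decomposition: a 64x64 to 128-bit product is assembled from four 32x32 partial products, and the 128-bit product is then built from 64-bit halves.

```rust
/// Hypothetical sketch: 64x64 -> 128-bit unsigned multiply built only from
/// 64-bit operations, by splitting each operand into 32-bit halves.
/// Returns (low 64 bits, high 64 bits).
fn mul_wide_split(a: u64, b: u64) -> (u64, u64) {
    const MASK: u64 = 0xFFFF_FFFF;
    let (a_lo, a_hi) = (a & MASK, a >> 32);
    let (b_lo, b_hi) = (b & MASK, b >> 32);

    // Four narrow partial products; each fits in 64 bits.
    let ll = a_lo * b_lo;
    let lh = a_lo * b_hi;
    let hl = a_hi * b_lo;
    let hh = a_hi * b_hi;

    // Propagate carries through the middle 32-bit columns.
    // Neither addition can overflow 64 bits.
    let t = hl + (ll >> 32);
    let u = lh + (t & MASK);

    // The low half is exactly `a.wrapping_mul(b)`: the part that can be
    // recognized as a single multiply instruction.
    let lo = (u << 32) | (ll & MASK);
    // The high half reuses the same intermediates (`t`, `u`, `hh`).
    let hi = hh + (t >> 32) + (u >> 32);
    (lo, hi)
}

/// Hypothetical sketch: 128x128 -> 128-bit wrapping multiply on operands
/// passed as (low, high) 64-bit halves. Returns (low, high).
fn multi3_sketch(a_lo: u64, a_hi: u64, b_lo: u64, b_hi: u64) -> (u64, u64) {
    let (lo, carry) = mul_wide_split(a_lo, b_lo);
    // Cross terms only affect the upper 64 bits; anything above bit 127 wraps.
    let hi = carry
        .wrapping_add(a_hi.wrapping_mul(b_lo))
        .wrapping_add(a_lo.wrapping_mul(b_hi));
    (lo, hi)
}
```

In this sketch `lo` collapses to one wrapping multiply, while `hi` still depends on `t`, `u`, and `hh`. That matches the situation the description mentions: the intermediates removed from the lower half are still needed for the upper half.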
I don't know if it'd help at all, but cc https://github.com/bytecodealliance/wasmtime/pull/7719/files#diff-2041f67049d5ac3d8f62ea91d3cb45cdb8608d5f5cdab988731ae2addf90ef01 which will convert larger |
Yeah, this definitely felt similar to that, so I placed my new rules next to those Thinking about this comment, though, led me to realize that once we can do that, ISLE is capable enough today to transform a sequence like this: into something like this: If our mid-end rules supported producing instructions with multiple results, the On x86 at least, an |
It might not be an improvement by itself, but if we recognize some other i128 ops we can probably start applying some optimization rules at the i128 level which would be really neat. For example, we could now recognize |
Good point! Except we currently can't deal with 128-bit constants, so rules like The other thing we can't currently do is notice that this sequence has a redundant multiply: |
Since there's no 128-bit |
Ooh, I hadn't looked at those carefully. That's awesome that |
For extend we don't need to wait on wasmtime/cranelift/codegen/src/prelude_opt.isle Lines 113 to 116 in f1fe2af which is a special extractor added in #7710 for exactly the "there might be an (Though I'd love to have or patterns to stop needing custom extractors for things like this!) EDIT: landed in #8686 |
This is a relatively old PR at this point and with the wasm wide-arithmetic proposal in the works that should in theory supplant the need for this in the long-term. If folks are interested in pushing on this in the nearer-term though I'm happy to reopen. |