cranelift: Optimize select_spectre_guard, carefully#8139
Conversation
|
Perhaps @cfallin should review this? I was talking about it with him yesterday. |
|
Sure, happy to take a look. |
cfallin
left a comment
There was a problem hiding this comment.
This rewrite looks good to me; thanks for thinking through and discussing this yesterday!
A few things I think we'll want:
- Could we have an egraph filetest showing each case?
- I think it would be good to document (or update documentation about) the semantics of
select_spectre_guardand the corresponding justification, a little more than the comments here. In particular:- The instruction definition's description still says the op is guaranteed not to be removed. That's not true anymore (in fact that's the point of this PR!); can we update to say something about not picking the wrong input even under a mis-speculated branch (as you do in the comment in the ISLE rewrites)?
- Let's note in the ISLE rewrites that removing
select_spectre_guard, if semantically correct for the functional result, always meets this speculation condition: by removing an op we are only ever going to remove computation that could be speculated, we're never adding new speculation. - Finally, can we add a comment in around the instruction description noting that the op is pure because our reasoning about speculation safety tells us it's always safe to remove if functionally correct, and safe to merge with another functionally identical op.
Thanks!
Subscribe to Label ActionDetailsThis issue or pull request has been labeled: "cranelift", "cranelift:meta", "isle"Thus the following users have been cc'd because of the following labels:
To subscribe or unsubscribe from this label, edit the |
Currently, every access to a table element does a bounds-check with a conditional branch to a block that explicitly traps. Instead, when SPECTRE mitigations are enabled, let's change the address computation to return a null pointer for out-of-bounds accesses, and then allow the subsequent load or store to trap. This is less code in that case since we can reuse instructions we needed anyway. In addition, when the table has constant size and the element index is a constant and mid-end optimization is enabled, this allows the bounds-check to be constant folded away. Later, bytecodealliance#8139 will let us optimize away the select_spectre_guard instruction in this case too. Once we also implement bytecodealliance#8160, `tests/disas/typed-funcrefs.wat` should be almost as fast as native indirect function calls.
Currently, every access to a table element does a bounds-check with a conditional branch to a block that explicitly traps. Instead, when Spectre mitigations are enabled, let's change the address computation to return a null pointer for out-of-bounds accesses, and then allow the subsequent load or store to trap. This is less code in that case since we can reuse instructions we needed anyway. We return the MemFlags that the memory access should use, in addition to the address it should access. That way we don't record trap metadata on memory access instructions which can't actually trap due to being preceded by a `trapnz`-based bounds check, when Spectre mitigations are disabled. In addition, when the table has constant size and the element index is a constant and mid-end optimization is enabled, this allows the bounds-check to be constant folded away. Later, bytecodealliance#8139 will let us optimize away the select_spectre_guard instruction in this case too. Once we also implement bytecodealliance#8160, `tests/disas/typed-funcrefs.wat` should be almost as fast as native indirect function calls.
Currently, every access to a table element does a bounds-check with a conditional branch to a block that explicitly traps. Instead, when Spectre mitigations are enabled, let's change the address computation to return a null pointer for out-of-bounds accesses, and then allow the subsequent load or store to trap. This is less code in that case since we can reuse instructions we needed anyway. We return the MemFlags that the memory access should use, in addition to the address it should access. That way we don't record trap metadata on memory access instructions which can't actually trap due to being preceded by a `trapnz`-based bounds check, when Spectre mitigations are disabled. In addition, when the table has constant size and the element index is a constant and mid-end optimization is enabled, this allows the bounds-check to be constant folded away. Later, #8139 will let us optimize away the select_spectre_guard instruction in this case too. Once we also implement #8160, `tests/disas/typed-funcrefs.wat` should be almost as fast as native indirect function calls.
This commit makes two changes to our treatment of `select_spectre_guard`. First, stop annotating this instruction as having any side effects. We only care that if its value result is used, then it's computed without branching on the condition input. We don't otherwise care when the value is computed, or if it's computed at all. Second, introduce some carefully selected ISLE egraph rewrites for this instruction. These particular rewrites are those where we can statically determine which SSA value will be the result of the instruction. Since there is no actual choice involved, there's no way to accidentally introduce speculation on the condition input.
6281c5c to
e70e51b
Compare
|
I think I've addressed all your comments about documentation, Chris, and I'd appreciate feedback on that aspect. I still need to write filetests exercising these new rules, but at least now we can see that this has a positive effect on the typed-funcref tests, as expected. |
cfallin
left a comment
There was a problem hiding this comment.
New doc comments look great -- thanks!
This commit makes two changes to our treatment of
select_spectre_guard.First, stop annotating this instruction as having any side effects. We only care that if its value result is used, then it's computed without branching on the condition input. We don't otherwise care when the value is computed, or if it's computed at all.
Second, introduce some carefully selected ISLE egraph rewrites for this instruction. These particular rewrites are those where we can statically determine which SSA value will be the result of the instruction. Since there is no actual choice involved, there's no way to accidentally introduce a branch on the condition input.