Multi-register value support: framework for Values wider than machine registers.#2538
Conversation
|
CI error seems unrelated (perhaps caused by recent Rust 1.49 release?): |
4aef54e to
6f07907
Compare
bnjbvr
left a comment
There was a problem hiding this comment.
Thanks, this is a lot of work! Quite mechanical in the lower.rs files, which is satisfying.
I'm a tiny bit worried about the overhead (compile-time) cost this will add to the "general" case, where we're in fact using one real reg for each SSA value (i.e. not using i128 on x64, etc.). In this case, it seems there will be a lot of additional loops over a ValueRegs of size 1. Maybe the branch predictors will be happy to deal with those, but the loop setups themselves could incur a visible cost too. Instead of me wild-guessing: do you have any rough performance numbers (esp. count of instructions, branch cache misses) before/after this patch, for the single-realreg-per-value case?
Related to one proposal/question I formulated below, and if the performance overhead justifies it, here's a design question: instead of having all the machinery be constructed on top of ValueRegs, could we split helpers in two ways:
- have one variant for ValueRegs, specially used to handle the multi-regs-per-value case
- have one variant for single RealRegs, since this is likely to be a frequent case?
This means we'd need then to consider single regs vs multi-regs-per-value differently during lowering, but my understanding is that this should be the case already. E.g. for add i128, we'd use a different open-coded sequence of instructions that the non-i128 add wouldn't use. Thoughts?
If the above suggestion doesn't make sense, and the overhead is negligible or if it can be countered with minimal effort: approving as is. Otherwise, if it makes sense or the PR significantly changes, I'd be happy to take another look! Thanks.
| let tmp1 = ctx.alloc_tmp(RegClass::I64, types::I64); | ||
| let tmp2 = ctx.alloc_tmp(RegClass::I64, types::I64); | ||
| let cst = ctx.alloc_tmp(RegClass::I64, types::I64); | ||
| let tmp1 = ctx.alloc_tmp(types::I64).only_reg().unwrap(); |
There was a problem hiding this comment.
Design thought: would it make sense to have a alloc_singlereg_tmp helper that avoids the constructing of an Option and unwrapping it, and returns only a single reg without using the ValueRegs construct? This line being repeated so many times makes me wary of the penalty cost of this multi-regs-per-value machinery for the general case where it's unused...
ea25756 to
d67876e
Compare
|
Thanks for the detailed review! One top-level point about special-casing the one-reg case vs. generalizing with In other words, it seems to be in the measurement noise. I sort-of expected this given that (on the 64-bit platform case) we turned a 32-bit Given that tiny/no overhead, I would strongly prefer to not special-case the one-register case, as it adds significant complexity; IMHO we're starting to get a slight case of "too much nested-if disease" in some places and I'd really prefer to keep just one code path where we can :-) Anyway, lots of updates here so PTAL again if you like! |
… regs. This will allow for support for `I128` values everywhere, and `I64` values on 32-bit targets (e.g., ARM32 and x86-32). It does not alter the machine backends to build such support; it just adds the framework for the MachInst backends to *reason* about a `Value` residing in more than one register.
d67876e to
6eea015
Compare
|
Just spoke with @bnjbvr and will go ahead and merge this with existing approval (thanks!). |
This will allow for support for
I128values everywhere, andI64values on 32-bit targets (e.g., ARM32 and x86-32). It does not alter the
machine backends to build such support; it just adds the framework for
the MachInst backends to reason about a
Valueresiding in more thanone register.
This is a finalized version of the framework part of the draft PR (#2504); I128
operator implementations will come in a followup PR.