Multi-register value support: framework for Values wider than machine registers. by cfallin · Pull Request #2538 · bytecodealliance/wasmtime

cfallin · 2021-01-04T06:06:06Z

This will allow for support for `I128` values everywhere, and `I64`
values on 32-bit targets (e.g., ARM32 and x86-32). It does not alter the
machine backends to build such support; it just adds the framework for
the MachInst backends to *reason* about a `Value` residing in more than
one register.

This is a finalized version of the framework part of the draft PR (#2504); I128
operator implementations will come in a followup PR.
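For readers new to the idea, a minimal sketch of how many machine registers a value of a given width would occupy. It assumes uniform register widths and a plain ceiling division; `regs_needed` is a hypothetical name for illustration, not a function in this PR.

```rust
/// Number of machine registers needed to hold a value of `value_bits`
/// when every register is `reg_bits` wide (ceiling division).
/// Hypothetical helper for illustration only.
fn regs_needed(value_bits: u32, reg_bits: u32) -> u32 {
    assert!(reg_bits > 0, "register width must be nonzero");
    (value_bits + reg_bits - 1) / reg_bits
}
```

Under these assumptions, `regs_needed(128, 64)` and `regs_needed(64, 32)` both give 2, which are the two cases named in the description, while `regs_needed(64, 64)` gives 1.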

cfallin · 2021-01-04T06:31:46Z

CI error seems unrelated (perhaps caused by recent Rust 1.49 release?):

 note: rust-lld: error while loading shared libraries: libLLVM-11-rust-1.49.0-stable.so: cannot open shared object file: No such file or directory

bnjbvr

Thanks, this is a lot of work! Quite mechanical in the lower.rs files, which is satisfying.
I'm a tiny bit worried about the overhead (compile-time) cost this will add to the "general" case, where we're in fact using one real reg for each SSA value (i.e. not using i128 on x64, etc.). In this case, it seems there will be a lot of additional loops over a ValueRegs of size 1. Maybe the branch predictors will be happy to deal with those, but the loop setups themselves could incur a visible cost too. Instead of me wild-guessing: do you have any rough performance numbers (esp. count of instructions, branch cache misses) before/after this patch, for the single-realreg-per-value case?

Related to one proposal/question I formulated below, and if the performance overhead justifies it, here's a design question: instead of having all the machinery be constructed on top of ValueRegs, could we split helpers in two ways:

- have one variant for ValueRegs, specially used to handle the multi-regs-per-value case
- have one variant for single RealRegs, since this is likely to be a frequent case?

This means we'd need then to consider single regs vs multi-regs-per-value differently during lowering, but my understanding is that this should be the case already. E.g. for add i128, we'd use a different open-coded sequence of instructions that the non-i128 add wouldn't use. Thoughts?
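To illustrate why a 128-bit add needs its own sequence, here is a rough sketch of the arithmetic on two 64-bit halves: add the low halves, then add the high halves plus the carry. This models the arithmetic only and is not the lowering code from this PR or its follow-up; `add128`, `split`, and `join` are hypothetical names.

```rust
/// Add two 128-bit values, each given as (low, high) 64-bit halves.
/// Mirrors an add followed by an add-with-carry; wraps on overflow.
fn add128(a: (u64, u64), b: (u64, u64)) -> (u64, u64) {
    // Low halves: keep the carry-out.
    let (lo, carry) = a.0.overflowing_add(b.0);
    // High halves: add the carry-in, wrapping like the hardware would.
    let hi = a.1.wrapping_add(b.1).wrapping_add(carry as u64);
    (lo, hi)
}

/// Split a native u128 into (low, high) halves, for checking the sketch.
fn split(x: u128) -> (u64, u64) {
    (x as u64, (x >> 64) as u64)
}

/// Reassemble (low, high) halves into a native u128.
fn join(p: (u64, u64)) -> u128 {
    (p.0 as u128) | ((p.1 as u128) << 64)
}
```

The result can be checked against native `u128` arithmetic: `join(add128(split(x), split(y)))` should equal `x.wrapping_add(y)`.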

If the above suggestion doesn't make sense, and the overhead is negligible or if it can be countered with minimal effort: approving as is. Otherwise, if it makes sense or the PR significantly changes, I'd be happy to take another look! Thanks.

bnjbvr · 2021-01-05T15:15:45Z

-                let tmp1 = ctx.alloc_tmp(RegClass::I64, types::I64);
-                let tmp2 = ctx.alloc_tmp(RegClass::I64, types::I64);
-                let cst = ctx.alloc_tmp(RegClass::I64, types::I64);
+                let tmp1 = ctx.alloc_tmp(types::I64).only_reg().unwrap();


Design thought: would it make sense to have an alloc_singlereg_tmp helper that avoids constructing an Option and unwrapping it, and returns a single reg without going through the ValueRegs construct? This line is repeated so many times that it makes me wary of the cost of this multi-regs-per-value machinery in the general case, where it's unused...
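A toy sketch of the two call shapes being compared. Everything here is a simplified stand-in: `Reg`, `ValueRegs`, and `Ctx` are not Cranelift's real types, `alloc_tmp` takes a bit width instead of a type, 64-bit registers are assumed, and `alloc_singlereg_tmp` is the hypothetical helper from the comment.

```rust
/// Hypothetical stand-in for a virtual register; not Cranelift's real `Reg`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Reg(u32);

/// Simplified stand-in for `ValueRegs`: one or two registers per value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ValueRegs {
    parts: [Option<Reg>; 2],
}

impl ValueRegs {
    /// Returns the register only if the value occupies exactly one.
    fn only_reg(self) -> Option<Reg> {
        match self.parts {
            [Some(r), None] => Some(r),
            _ => None,
        }
    }
}

/// Toy lowering context that hands out fresh register numbers.
struct Ctx {
    next: u32,
}

impl Ctx {
    fn fresh(&mut self) -> Reg {
        let r = Reg(self.next);
        self.next += 1;
        r
    }

    /// General path: allocate enough registers for `bits`, assuming
    /// 64-bit machine registers (an assumption of this sketch).
    fn alloc_tmp(&mut self, bits: u32) -> ValueRegs {
        let lo = self.fresh();
        let hi = if bits > 64 { Some(self.fresh()) } else { None };
        ValueRegs { parts: [Some(lo), hi] }
    }

    /// Hypothetical single-register helper along the lines proposed above.
    fn alloc_singlereg_tmp(&mut self) -> Reg {
        self.fresh()
    }
}
```

In this sketch, `ctx.alloc_tmp(64).only_reg().unwrap()` and `ctx.alloc_singlereg_tmp()` both yield one fresh register; the difference is only whether the caller goes through the multi-register wrapper.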

cfallin · 2021-01-06T01:43:23Z

Thanks for the detailed review!

One top-level point about special-casing the one-reg case vs. generalizing with `ValueRegs` everywhere: I did a quick measurement of compilation instruction count before/after this PR (`clif-util wasm --target x86_64 bz2.wasm`), as measured by `perf stat`:

Before:
     1,960,194,985      instructions              #    1.99  insn per cycle
     1,956,941,087      instructions              #    1.98  insn per cycle
     1,944,661,945      instructions              #    1.95  insn per cycle

After:
     1,951,673,951      instructions              #    1.98  insn per cycle
     1,951,074,339      instructions              #    1.98  insn per cycle
     1,945,951,877      instructions              #    1.98  insn per cycle

In other words, it seems to be in the measurement noise. I sort-of expected this given that (on the 64-bit platform case) we turned a 32-bit Reg into a 64-bit (Reg, Reg) (basically) with a bunch of nonzero checks of the top half and almost-never-taken branches.
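A rough sketch of the representation described here, assuming a 32-bit register index and a reserved sentinel value for the unused upper slot. The names and the choice of sentinel are assumptions for illustration; the actual definition lives in `cranelift/codegen/src/machinst/valueregs.rs`.

```rust
/// Hypothetical 32-bit register handle; not Cranelift's real `Reg`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Reg(u32);

impl Reg {
    /// Sentinel marking an unused slot (an assumption of this sketch).
    const INVALID: Reg = Reg(u32::MAX);

    fn is_valid(self) -> bool {
        self != Reg::INVALID
    }
}

/// Up to two registers per value, stored inline with no heap allocation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ValueRegs {
    parts: [Reg; 2],
}

impl ValueRegs {
    /// A value that fits in a single register; the upper slot is unused.
    fn one(r: Reg) -> Self {
        ValueRegs { parts: [r, Reg::INVALID] }
    }

    /// A value split across a low and a high register.
    fn two(lo: Reg, hi: Reg) -> Self {
        ValueRegs { parts: [lo, hi] }
    }

    /// Number of registers in use: a check of each slot against the sentinel.
    fn len(self) -> usize {
        self.parts.iter().filter(|r| r.is_valid()).count()
    }

    /// The single register, if exactly one slot is in use.
    fn only_reg(self) -> Option<Reg> {
        if self.len() == 1 { Some(self.parts[0]) } else { None }
    }
}
```

With this layout, `std::mem::size_of::<ValueRegs>()` is 8 bytes against 4 for a bare `Reg`, and the common single-register path costs only a comparison of the upper slot against the sentinel.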

Given that tiny/no overhead, I would strongly prefer to not special-case the one-register case, as it adds significant complexity; IMHO we're starting to get a slight case of "too much nested-if disease" in some places and I'd really prefer to keep just one code path where we can :-)

Anyway, lots of updates here so PTAL again if you like!


cfallin · 2021-01-06T17:58:36Z

Just spoke with @bnjbvr and will go ahead and merge this with existing approval (thanks!).

cfallin requested review from bnjbvr and julian-seward1 January 4, 2021 06:06

This was referenced Jan 4, 2021

Support for I128 operations in x64 backend. #2539

Merged

Add ELF TLS support in new x64 backend. #2540

Merged

x64 and aarch64: allow StructArgument and StructReturn args. #2541

Merged

cfallin mentioned this pull request Jan 4, 2021

Draft: I128 support (partial) on x64. #2504

Closed

bjorn3 mentioned this pull request Jan 4, 2021

Implement 128bit legalizations for AArch64 #1553

Closed

bjorn3 reviewed Jan 4, 2021

Comment thread cranelift/codegen/src/machinst/valueregs.rs Outdated

cfallin force-pushed the multi-reg-framework branch 2 times, most recently from 4aef54e to 6f07907 Compare January 4, 2021 20:59

bjorn3 approved these changes Jan 4, 2021

bnjbvr approved these changes Jan 5, 2021

cfallin force-pushed the multi-reg-framework branch 2 times, most recently from ea25756 to d67876e Compare January 6, 2021 01:30

cfallin force-pushed the multi-reg-framework branch from d67876e to 6eea015 Compare January 6, 2021 01:45

cfallin merged commit f579d08 into bytecodealliance:main Jan 6, 2021

cfallin deleted the multi-reg-framework branch January 6, 2021 18:00


