Implement the relaxed SIMD proposal by alexcrichton · Pull Request #5892 · bytecodealliance/wasmtime

alexcrichton · 2023-02-27T23:11:44Z

This commit is an implementation of the [relaxed SIMD proposal][proposal] which is expected to soon move to phase 4. This is a relatively small proposal which adds 20 new instructions operating over the preexisting v128 type. The purpose of this proposal is to expose instructions that are not guaranteed to be deterministic across host platforms. For example the instructions added have different behavior for some inputs on x86_64 and aarch64. I believe the intention is that many real-world use cases for these instructions have inputs which "avoid the edge cases" to have the same behavior across platforms.

Additionally as part of the specification of the relaxed SIMD proposal, however, there are annotated deterministic semantics for each instruction. In other words while each instruction is allowed to have different semantics across platforms there is still one listed canonical set of semantics. This commit implements this knobs as a Config option in Wasmtime. By default all relaxed simd instructions use their host-defined semantics (the most efficient ones) but a new Config::relaxed_simd_deterministic knob can be used to alter the codegen and force deterministic semantics.

An example of this is that the f32x4.relaxed_madd instruction, a fused multiply and add instruction, can either be a fused operation or separate operations. By default Wasmtime will select the most efficient one, namely using two operations on x86_64 when the fma feature isn't available. If the relaxed_simd_deterministic option is enabled, however, then regardless of whether fma is available the fused semantics will be implemented as they're the deterministic semantics for this instruction.

This commit includes an implementation of these instructions for the x64 and the aarch64 backend. The aarch64 implementation ended up being trivial since all of its instructions match the deterministic semantics of the proposal. Only support for encoding the fmls instruction was added since it wasn't already implemented. On x86_64 the implementation is much more subtle and exploits various edge cases in the proposal (leading me to the conclusion that the relaxed simd proposal is basically the "let's acknowledge the nonstandard, but dominant, semantics of x86_64 proposal").

Implementation-wise this commit adds a number of new instructions to Cranelift all prefixed with arch_*. I've tried to document that the intended equivalence to a deterministic instruction and additionally document the edge cases (currently they've all got edge cases for x86_64 and none for aarch64, which I believe represents the current state of things). There may be better methods to do this, however, since this strategy does not seems scalable into the future if dozens-to-hundreds more instructions get proposed and implemented.

There are a number of more optimal AVX512-based lowerings for the x86_64 backend for these instructions, but I haven't implemented any of them in this PR. All the support for x86_64 is based on AVX and below.

Procedurally the spec tests were copied from the upstream proposal into the Wasmtime repository here since the WebAssembly/testsuite repo is somewhat inactive right now and it was the easiest way to integrate them. When the upstream repository is updated the local copies here can be deleted in favor of the upstream implementations.

Finally I have not done rigorous testing of this PR yet. I've tried to be meticulous with rationalizing the spec-defined semantics of these wasm instructions against the architecture-specific behavior of each instruction. This has resulted in a number of PRs to the upstream specification to align with the intended instructions or clarify some certain lowerings. For example x86_64's lowering of i16x8.relaxed_laneselect is not correct with respect to the current specification but there is an issue open for this upstream. There is unfortunately no great way to fuzz these instructions inherently due to their nondeterministic semantics. Without a differential fuzzer all we can really do is verify that they compile which doesn't really provide much of a guarantee. My hope is that this can be considered more carefully before any decision to enable this proposal by default.

cfallin · 2023-02-27T23:30:48Z

This is a substantial bit of work (regardless of the "small" proposal) -- thanks for this!

I have some thoughts on the Cranelift side below. To preface these thoughts: this is a good change, and we need to support this proposal. However, determinism has been a significant discussion point in CLIF and has significant implications for testing and verification, so I want to suggest that we might consider some tweaks to the approach. So:

Implementation-wise this commit adds a number of new instructions to Cranelift all prefixed with arch_*. I've tried to document that the intended equivalence to a deterministic instruction and additionally document the edge cases (currently they've all got edge cases for x86_64 and none for aarch64, which I believe represents the current state of things). There may be better methods to do this, however, since this strategy does not seems scalable into the future if dozens-to-hundreds more instructions get proposed and implemented.

Skimming the added instructions' definitions, my main concern is basically a mirror of the determinism concern at the Wasm standards discussion level. In CLIF, we've had a lot of discussion around determism as it relates to endianness (#3369 and subsequent discussions and PRs), which became especially relevant as we built the CLIF interpreter and started to fuzz against it. If we let CLIF semantics be partly defined by architecture-specific details, then we can't possibly define an interpreter behavior that matches all backends. Our solution in the endianness case was to reify endian as an explicit detail in the IR, and let loads/stores be tagged as such. Ultimately in that case one can support big+little endian on all machines with the help of bswaps (and s390x actually does, so Wasm lowerings work), but that's a separate and additional step no different from any other "support operand combinations" enhancement.

This is important for verification efforts too: we shouldn't have CLIF whose meaning depends on the compilation target. We could adopt a refinement semantics approach ("result may be one of the values in this set" and then compiled code refines those sets) but that's harder to reason about in general.

I think a similar approach as we had with endianness might work here: define a Cranelift setting that is something like "arch_* behavior", or more granular settings for each ("insertlane behavior", ...), set them as appropriate for the expected target, and implement the appropriate lowerings (and all lowerings in the interpreter).

There are some issues with that -- namely, we now need to plumb "expected native settings for target T" up to the CLIF producer, and the actual CLIF will now differ on x86-64 and aarch64 for programs that use these opcodes -- but the semantics are at least defined fully and machine-independently for any CLIF.

Thoughts? Also, the Cranelift meeting on Wednesday would probably be a good venue to get a wider set of thoughts on this :-)

cfallin · 2023-02-27T23:35:07Z

(An addendum to the above: in the endianness case we have a third endian, "native", which lets us avoid the plumbing problem of "which endian is faster / even supported" in the CLIF producer; but IMHO that's a bit of a wart, which I'd like to remove eventually for all of the reasons above, to fully capture the meaning in the IR. So I'd prefer we bite the bullet and plumb this through with settings and "machine info" of some sort.)

jameysharp · 2023-02-27T23:43:16Z

If we want to offer the option of using the deterministic semantics for the relaxed SIMD proposal anyway, what if we start by shipping only that version? Then we can support the new wasm instructions now, while taking some more time to think through how best to expose architecture-specific instructions in Cranelift.

github-actions · 2023-02-27T23:45:25Z

Subscribe to Label Action

cc @peterhuene

Details

This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:x64", "cranelift:meta", "cranelift:wasm", "wasmtime:api", "wasmtime:config"

Thus the following users have been cc'd because of the following labels:

peterhuene: wasmtime:api

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

Learn more.

github-actions · 2023-02-27T23:45:39Z

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:

If you added a new Config method, you wrote extensive documentation for
it.

Details

Our documentation should be of the following form:

Short, simple summary sentence.

More details. These details can be multiple paragraphs. There should be
information about not just the method, but its parameters and results as
well.

Is this method fallible? If so, when can it return an error?

Can this method panic? If so, when does it panic?

# Example

Optional example here.

If you added a new Config method, or modified an existing one, you
ensured that this configuration is exercised by the fuzz targets.

Details

For example, if you expose a new strategy for allocating the next instance
slot inside the pooling allocator, you should ensure that at least one of our
fuzz targets exercises that new strategy.

Often, all that is required of you is to ensure that there is a knob for this
configuration option in wasmtime_fuzzing::Config (or one
of its nested structs).

Rarely, this may require authoring a new fuzz target to specifically test this
configuration. See our docs on fuzzing for more details.
If you are enabling a configuration option by default, make sure that it
has been fuzzed for at least two weeks before turning it on by default.

Details

To modify this label's message, edit the .github/label-messager/wasmtime-config.md file.

To add new label messages or remove existing label messages, edit the
.github/label-messager.json configuration file.

Learn more.

alexcrichton · 2023-02-27T23:54:14Z

Definitely happy to discuss more at a Cranelift meeting! Sorry should have clarified that in the PR description, I figured the precise method of implementing this would be contentious and want to emphasize that this current iteration is just one point in the design space and I'm happy to adjust as preferred. My motivation for working on this was I wanted to add the instructions to the Rust standard library, but doing that required having a runtime that implements them, and since Wasmtime is currently used to validate simd intrinsics for Rust I figured I may as well give it a stab to implement here. In that sense I'm in no rush to get something in, and I would prefer to not land anything unless there's a path forward to getting the fast lowerings of these instructions since I suspect it's not really that useful of a proposal if only the deterministic lowerings are offered.

One alternative is to basically just implement new x86_* instructions for CLIF which aren't implemented for any backend other than x86 (unless someone really really wants to do that) and are defined in terms of the actual x86 instructions. I don't know how scalable this is though in the sense that this PR only implements AArch64 and x86 lowerings for the relaxed simd instructions and while I don't believe any aarch64_* instructions would be necessary I don't know if CLIF would need to grow riscv64_* or s390x_* instructions.

Another alternative would be to take inspiration which has intrinsics of the form @llvm.x86.* where any backend could have any number of intrinsics which are well-defined for each backend. CLIF AFAIK isn't really set up for this right now, though, but would provide more easy extensibility while retaining "everything can in theory be implemented on every platform for every other platform simultaneously".

These alternatives, as noted, do indeed lift the complexity into the producer but that's not really the end of the world if the main producer here for these instructions is Wasmtime.

cfallin · 2023-02-28T00:42:20Z

One alternative is to basically just implement new x86_* instructions for CLIF which aren't implemented for any backend other than x86 (unless someone really really wants to do that) and are defined in terms of the actual x86 instructions. I don't know how scalable this is though in the sense that this PR only implements AArch64 and x86 lowerings for the relaxed simd instructions and while I don't believe any aarch64_* instructions would be necessary I don't know if CLIF would need to grow riscv64_* or s390x_* instructions.

I actually kind of like this too, weirdly enough: it's dead-simple, and it doesn't create instructions whose semantics are "parameterized" on a Cranelift setting. If anyone were perplexed by "x86" appearing in a machine-independent IR, we could expand it to x86like_insertlane or whatever -- the ISA name is the inspiration for the semantics but the semantics are machine-independent once we define this.

jameysharp · 2023-02-28T01:23:37Z

A WebAssembly module can't select which code path to use based on which wasm extensions are available in the host, right? In that case I think there is value in having any implementation at all of the relaxed SIMD instructions, even if for some reason we never get around to making them fast on x86, so we can at least run modules that use them. I also think it's good to merge the uncontroversial parts of this first while we sort out the path to making it fast, so later we're iterating on a smaller PR.

alexcrichton · 2023-02-28T23:32:59Z

I've split out #5895, #5896, and #5897, rebased this PR on top of those, and then split the bulk into two commits -- the first adds all the scaffolding/support/deterministic lowerings and the second adds all the x86-specific goop. I went ahead and switched to x86_* style instructions in CLIF, but I'd still like to discuss this in tomorrow's Cranelift meeting.

abrown

Makes sense to me; as I mentioned today it is unfortunate that there is any special handling needed for x86 (i.e., is_x86()) but it's not really clear what else can be done. Nice work pulling this together, @alexcrichton! I had some suggestions that may or may not be necessary, and it might be good for someone else to put eyes on this as well.

abrown · 2023-03-02T00:34:37Z

+        Operator::I16x8DotI8x16I7x16S => {
+            let (a, b) = pop2_with_bitcast(state, I8X16, builder);
+            state.push1(
+                if environ.relaxed_simd_deterministic() || !environ.is_x86() {


Maybe? (same below):

Suggested change

if environ.relaxed_simd_deterministic() || !environ.is_x86() {

if !environ.relaxed_simd_deterministic() || !environ.is_x86() {

This is currently intentional where if determinism is required it takes the first branch which has the deterministic lowering in terms of CLIF instruction semantics. The branch isn't taken though for x86 when speed is desired b/c x86 has a faster lowering.

If you have a suggestion though to restructure this to more obviously communicate this intent it's probably good to implement though since the current if is pretty subtle and I think easy to lose track of.

Ok, makes sense. In terms of communication, I guess the missing piece of information for me was that I couldn't remember off the top of my head which lowering had the deterministic semantics. Now, obviously, I should have trusted that the if environ.relaxed_simd_deterministic() side of the branch was the deterministic lowering but in this review I was starting to wonder which lowering was which. This may just be a review issue, but if you wanted to make it extra clear the deterministic lowering could get a comment like: // This is the deterministic lowering for ..., see https://github.com/WebAssembly/relaxed-simd/....

I think the spec isn't quite in the state where there's a URL to link to unfortunately. I don't think that the deterministic semantics are actually written down anywhere other than the spec draft currently. I'll try to remember to update things here though once that's update.

Otherwise though I went ahead and added comments to all the deterministic cases at least english-explaining what the intention was.

alexcrichton · 2023-03-06T15:19:00Z

@cfallin or @jameysharp did y'all want to take another pass over this? If not I can go ahead and hit the button too

cfallin

LGTM on the Cranelift bits; I'm happy with this approach. Thanks!

jameysharp

I'm happy with this, with a few small details fixed. Thanks!

jameysharp · 2023-03-06T22:30:12Z

+                       (v128.const i16x8 -32768 -32767 32767 0 0 0 0 0)
+                       (v128.const i16x8 -32768 -32768 32767 0 0 0 0 0))
+               ;; overflows, return either INT16_MIN or INT16_MAX
+               (v128.const i16x8 -1 -1 -1 -1 -1 -1 -1 -1))


This comment seems wrong since only one possible outcome is asserted.

Ah this is just copied from the upstream spec which I think is just a copy/paste artifact, but if you'd like I think they're pretty accepting of PRs.

This commit adds initial scaffolding and support for the Relaxed SIMD proposal for WebAssembly. Codegen support is supported on the x64 and AArch64 backends on this time. The purpose of this commit is to get all the boilerplate out of the way in terms of plumbing through a new feature, adding tests, etc. The tests are copied from the upstream repository at this time while the WebAssembly/testsuite repository hasn't been updated. A summary of changes made in this commit are: * Lowerings for all relaxed simd opcodes have been added, currently all exhibiting deterministic behavior. This means that few lowerings are optimal on the x86 backend, but on the AArch64 backend, for example, all lowerings should be optimal. * Support is added to codegen to, eventually, conditionally generate different code based on input codegen flags. This is intended to enable codegen to more efficient instructions on x86 by default, for example, while still allowing embedders to force architecture-independent semantics and behavior. One good example of this is the `f32x4.relaxed_fmadd` instruction which when deterministic forces the `fma` instruction, but otherwise if the backend doesn't have support for `fma` then intermediate operations are performed instead. * Lowerings of `iadd_pairwise` for `i16x8` and `i32x4` were added to the x86 backend as they're now exercised by the deterministic lowerings of relaxed simd instructions. * Sample codegen tests for added for x86 and aarch64 for some relaxed simd instructions. * Wasmtime embedder support for the relaxed-simd proposal and forcing determinism have been added to `Config` and the CLI. * Support has been added to the `*.wast` runtime execution for the `(either ...)` matcher used in the relaxed-simd proposal. * Tests for relaxed-simd are run both with a default `Engine` as well as a "force deterministic" `Engine` to test both configurations. * All tests from the upstream repository were copied into Wasmtime. These tests should be deleted when WebAssembly/testsuite is updated.

This commit builds on the prior commit and adds an array of `x86_*` instructions to Cranelift which have semantics that match their corresponding x86 equivalents. Translation for relaxed simd is then additionally updated to conditionally generate different CLIF for relaxed simd instructions depending on whether the target is x86 or not. This means that for AArch64 no changes are made but for x86 most relaxed instructions now lower to some x86-equivalent with slightly different semantics than the "deterministic" lowering.

This will be required to implement the `f32x4.relaxed_madd` instruction (and others) when an x86 host doesn't specify the `has_fma` feature.

Co-authored-by: Andrew Brown <andrew.brown@intel.com>

alexcrichton force-pushed the relaxed-simd-maybe branch from 7d4c0bf to 8c81699 Compare February 28, 2023 23:31

alexcrichton force-pushed the relaxed-simd-maybe branch from 72cd658 to 94f85d0 Compare March 1, 2023 20:23

abrown approved these changes Mar 2, 2023

View reviewed changes

alexcrichton mentioned this pull request Mar 2, 2023

Remove local copies of the relaxed-simd test suite #5914

Closed

cfallin approved these changes Mar 6, 2023

View reviewed changes

jameysharp approved these changes Mar 6, 2023

View reviewed changes

alexcrichton and others added 7 commits March 7, 2023 07:10

Add libcall support for fma to Wasmtime

55a6cc7

This will be required to implement the `f32x4.relaxed_madd` instruction (and others) when an x86 host doesn't specify the `has_fma` feature.

Ignore relaxed-simd tests on s390x and riscv64

58bb252

Enable relaxed-simd tests on s390x

1428282

Update cranelift/codegen/meta/src/shared/instructions.rs

e06e4c6

Co-authored-by: Andrew Brown <andrew.brown@intel.com>

Add a FIXME from review

64d6d5d

alexcrichton added 3 commits March 7, 2023 07:10

Add notes about deterministic semantics

2473c6f

Don't default has_native_fma to true

faead4b

Review comments and rebase fixes

d72e6ee

alexcrichton force-pushed the relaxed-simd-maybe branch from 2bba800 to d72e6ee Compare March 7, 2023 15:19

alexcrichton enabled auto-merge March 7, 2023 15:20

alexcrichton added this pull request to the merge queue Mar 7, 2023

Merged via the queue into bytecodealliance:main with commit 8bb183f Mar 7, 2023

alexcrichton deleted the relaxed-simd-maybe branch March 7, 2023 16:33

martindevans mentioned this pull request Apr 27, 2023

Expose new c-api functions for configuring relaxed SIMD #6292

Merged

martindevans mentioned this pull request May 23, 2023

Added Relaxed SIMD bytecodealliance/wasmtime-dotnet#256

Merged

	if environ.relaxed_simd_deterministic() \|\| !environ.is_x86() {
	if !environ.relaxed_simd_deterministic() \|\| !environ.is_x86() {

Conversation

alexcrichton commented Feb 27, 2023

Uh oh!

cfallin commented Feb 27, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cfallin commented Feb 27, 2023

Uh oh!

jameysharp commented Feb 27, 2023

Uh oh!

github-actions Bot commented Feb 27, 2023

Subscribe to Label Action

Uh oh!

github-actions Bot commented Feb 27, 2023

Label Messager: wasmtime:config

Uh oh!

alexcrichton commented Feb 27, 2023

Uh oh!

cfallin commented Feb 28, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jameysharp commented Feb 28, 2023

Uh oh!

alexcrichton commented Feb 28, 2023

Uh oh!

abrown left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

abrown Mar 2, 2023

Choose a reason for hiding this comment

Uh oh!

alexcrichton Mar 2, 2023

Choose a reason for hiding this comment

Uh oh!

abrown Mar 3, 2023

Choose a reason for hiding this comment

Uh oh!

alexcrichton Mar 6, 2023

Choose a reason for hiding this comment

Uh oh!

Uh oh!

alexcrichton commented Mar 6, 2023

Uh oh!

cfallin left a comment

Choose a reason for hiding this comment

Uh oh!

jameysharp left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

jameysharp Mar 6, 2023

Choose a reason for hiding this comment

Uh oh!

alexcrichton Mar 7, 2023

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

cfallin commented Feb 27, 2023 •

edited

Loading

cfallin commented Feb 28, 2023 •

edited

Loading