Type-confusion when storing i32 into a stack slot

> Ok I think I got everything working now! Since it was a very recent change, @cfallin can you confirm you saw [95f86be](https://github.com/bytecodealliance/wasmtime/commit/95f86be431557253e41baaef0da68ccf0b3bc0fd)? That was necessary to fix some segfaults that were cropping up on windows since the 4-byte return values were getting stored as 8-byte results. The comment there makes me suspicious though...

That was indeed a suspicious comment. I am not sure exactly what the original intent was but I double-checked just now that this `store` helper is not being used to codegen stores; those directly produce correctly-sized instructions. I suspect this originally came from spill/reload helpers in which case "always spill/reload the full reg" is a reasonable conservative approach but even then we have the invariant that upper bits (beyond a type's width) in a register are undefined and extended when needed by an op so this should be fine I think. Thanks for calling it out!

_Originally posted by @cfallin in https://github.com/bytecodealliance/wasmtime/issues/2806#issuecomment-814437743_

I guess I'm coming after the battle, but this particular change introduced a regression. See https://github.com/bytecodealliance/wasmtime/pull/2277 for context, which added an optimization for the following situation. Imagine we have the following CLIF:

```
v1 = iadd.i32 {...}
v2 = uextend.i64 v1
```

Thus v1 is 32-bits, and v2 is 64-bits. During lowering, we pattern-match this situation: in theory, the `iadd.i32` is compiled to an `addl`, which clears out the upper bits, so the zero-extension is spurious, and can be redefined as a plain copy of the input (a move!). So this might be lowered into the following vcode:

```
%v1 = addl ...
movq %v1, %v2
```

Now, from the point of view of register allocation, both vcode's values are using the `RegClass::i64`, so the two virtual registers can be coalesced and allocated to the same register. That's all right... unless the register is spilled

```
// say %v1 and %v2 are allocated to %r1
%r1 = addl ...
movl %r1, 0x20(%rsp) // spill to a stack slot as an i32, since %v1 is an i32
// ... later
movq 0x20(%rsp), %r1 // reload as an i64, since %v2 is an i64. The high 32-bits are undefined!
```

The line ending with `spill to a stack slot as an i32` results from the change Alex made. Before this, the value would be stored as an i64, so the high bits from the iaddl would be stored in the stack slot and correctly reloaded. The comment's wording didn't quite help in describing this situation. Also, there should have been a test case, so that part is on me.

So the issue is that register allocation kind of loses the precise type, and that a store to a stack slot should store the full width of the stack slot all the time, so there are no undefined bits when storing and reloading values within a given `RegClass` (here, `RegClass::I64`). Alternatively, we might remove the `uextend` elimination optimization, but that would increase register pressure and slow down the decoding pipeline a bit, so I'd rather go with the first solution.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Type-confusion when storing i32 into a stack slot #2839

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Type-confusion when storing i32 into a stack slot #2839

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions