x64: Wide operations used for some ALU ops instead of precisely-sized ones #10199

I wanted to make a dedicated issue to continue the discussion from https://github.com/bytecodealliance/wasmtime/pull/10110#discussion_r1937877372

Currently Wasmtime on x64 switches to using only 32/64-bit ALU ops [here](https://github.com/bytecodealliance/wasmtime/blob/a0338af84f66cb452fdf4b692d4facb5d052c12d/cranelift/codegen/src/isa/x64/inst.isle#L2061), ignoring 8/16-bit input types and using the wider operation instead. This leads to a pattern I personally find confusing: [sometimes the type is used and sometimes it isn't](https://github.com/bytecodealliance/wasmtime/blob/76654321ecab4725e9625ee9c4ec5c535887d224/cranelift/codegen/src/isa/x64/inst.isle#L2955-L2956). We already have to get everything right for sunk operands, since those are required to operate on the precise width, and I'm not sure what the benefit is of using a 32-bit instruction rather than an 8-bit instruction.

In the linked thread @cfallin mentions:

> I don't know about performance implications -- I do recall vaguely something about "partial-register stalls" if one updates only a part of the destination register but maybe it's fine if the instruction already depends on the register as an input. Assuming that's not a concern (@abrown confirm?) then this seems fine with me.

I know this came up with high-latency instructions like sqrt, where the problem we ran into was a false dependency between instructions, some of which operate on the full xmm width and some of which don't. I'm not sure whether this is a problem for (what I assume are) low-latency instructions like `and`. Additionally, I'm not sure whether smaller-than-64-bit-width instructions preserve the upper bits or sign/zero-extend (I couldn't figure it out from the docs).
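For reference on the upper-bits question, here is a minimal sketch of how a write to a 64-bit general-purpose register behaves at each operand size. It follows the rule in the Intel SDM's description of general-purpose registers in 64-bit mode: 8-bit and 16-bit results leave the remaining upper bits of the destination unmodified, while 32-bit results are zero-extended to 64 bits. The names `write_gpr` and `and_gpr` are hypothetical helpers for illustration and are not Cranelift APIs. The model ignores the legacy high-byte registers (`ah` and friends) and says nothing about performance.

```rust
/// Hypothetical model (not a Cranelift API): the value a 64-bit GPR holds after
/// an instruction with a `bits`-wide operand size writes `value` to it.
fn write_gpr(old: u64, value: u64, bits: u32) -> u64 {
    match bits {
        // 8-bit and 16-bit writes merge into the old value: upper bits are preserved.
        8 => (old & !0xff) | (value & 0xff),
        16 => (old & !0xffff) | (value & 0xffff),
        // 32-bit writes zero-extend: the upper 32 bits are cleared.
        32 => value & 0xffff_ffff,
        // 64-bit writes replace the whole register.
        64 => value,
        _ => panic!("unsupported operand size: {}", bits),
    }
}

/// Hypothetical model of `and dst, src` at the given operand size.
fn and_gpr(dst: u64, src: u64, bits: u32) -> u64 {
    write_gpr(dst, dst & src, bits)
}
```

With `dst = 0x1122_3344_5566_7788` and `src = 0x0f0f_0f0f_0f0f_0f0f`, the 8-bit form yields `0x1122_3344_5566_7708` (upper 56 bits kept) and the 32-bit form yields `0x0000_0000_0506_0708` (upper 32 bits zeroed).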

Otherwise, though, I would expect that even today, where we clamp at 32 bits, we still have this problem. That means we're already doing smaller-than-register-width operations, which have the theoretical possibility of creating false dependencies.

Basically I view the current state as a bit of a weird in-between of two worlds we could possibly be in:

1. Always use exact-width instructions.
2. Always use 64-bit-width instructions unless memory is involved, in which case be precise.

Personally I feel like we should lean towards (1) under the assumption it doesn't have bad performance.
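To make the comparison concrete, here is a rough sketch of the operand width each approach would pick. This is a simplified model of the behavior described in this issue, not the actual ISLE lowering rules: `Policy`, `op_bits`, and the `sunk_load` flag are all hypothetical names, and `Current` encodes the assumption that register-only operations clamp to at least 32 bits while sunk memory operands stay precise.

```rust
/// Hypothetical width-selection policies (illustration only, not Cranelift code).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Policy {
    /// Assumed model of today's behavior: clamp register ops to at least 32 bits.
    Current,
    /// Option 1: always use the exact width of the type.
    ExactWidth,
    /// Option 2: always use 64 bits unless a memory operand is involved.
    Always64,
}

/// Operand size in bits for an ALU op on a `ty_bits`-wide integer type.
/// `sunk_load` means one operand is a load folded into the instruction.
fn op_bits(policy: Policy, ty_bits: u32, sunk_load: bool) -> u32 {
    assert!(matches!(ty_bits, 8 | 16 | 32 | 64), "unsupported type width");
    if sunk_load {
        // A folded load must access exactly `ty_bits` of memory under every policy.
        return ty_bits;
    }
    match policy {
        Policy::Current => ty_bits.max(32),
        Policy::ExactWidth => ty_bits,
        Policy::Always64 => 64,
    }
}
```

For an 8-bit `and` on registers this gives 32 for `Current`, 8 for `ExactWidth`, and 64 for `Always64`. With a sunk load all three give 8, which is why only the register-only case differs between the options.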
