I wanted to open a dedicated issue to continue the discussion from #10110 (comment)
Currently Wasmtime on x64 switches to using only 32/64-bit ALU ops here, ignoring 8/16-bit types as input and using the wider operation instead. This leads to what I personally find a confusing pattern: sometimes the type is used and sometimes it isn't. We already have to get everything correct for sunk operands, since those are required to operate on the precise width, and I'm not sure what the benefit is of using a 32-bit instruction rather than an 8-bit instruction.
In the linked thread @cfallin mentions:
> I don't know about performance implications -- I do recall vaguely something about "partial-register stalls" if one updates only a part of the destination register but maybe it's fine if the instruction already depends on the register as an input. Assuming that's not a concern (@abrown confirm?) then this seems fine with me.
I know this came up with high-latency instructions like `sqrt`, where the problem we ran into was a false dependency between instructions, some of which operate on the full xmm width and some of which don't. I'm not sure if this is a problem for (what I assume are) low-latency instructions like `and`. Additionally, I'm not sure whether smaller-than-64-bit-width instructions preserve the upper bits or sign/zero-extend (I couldn't figure it out from the docs).
Otherwise though I would expect that even today where we clamp at 32 bits we still have this problem. That means we're already doing smaller-than-register-width operations which have the theoretical possibility of creating false dependencies.
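To make the upper-bits question concrete, here is a minimal sketch of how I understand x86-64 general-purpose register writes to behave: 8- and 16-bit results are merged into the old register value (upper bits preserved), while 32-bit results are zero-extended to 64 bits. The function names `write_gpr` and `and_op` are made up for illustration and are not Cranelift APIs.

```rust
/// Models writing a `width_bits`-wide result into a 64-bit GPR whose
/// previous contents were `old`. Illustrative helper, not a Cranelift API.
fn write_gpr(old: u64, result: u64, width_bits: u32) -> u64 {
    match width_bits {
        // 8-bit write: only the low byte changes, upper 56 bits come from `old`.
        8 => (old & !0xff) | (result & 0xff),
        // 16-bit write: only the low 16 bits change, upper 48 bits come from `old`.
        16 => (old & !0xffff) | (result & 0xffff),
        // 32-bit write: the upper 32 bits are zeroed, so `old` is not needed.
        32 => result & 0xffff_ffff,
        // 64-bit write: the whole register is replaced.
        64 => result,
        _ => panic!("unsupported operand width: {width_bits}"),
    }
}

/// Models `and dst, src` at the given operand width.
fn and_op(dst: u64, src: u64, width_bits: u32) -> u64 {
    write_gpr(dst, dst & src, width_bits)
}
```

In this model only the 8- and 16-bit cases read the old destination value in order to produce the new one. For a two-operand instruction like `and` the destination is already an input, which is the situation described in the quoted comment above.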
Basically I view the current state as a bit of a weird in-between of two worlds we could possibly be in:
1. Always use exact-width instructions
2. Always use 64-bit width instructions unless memory is related, then be precise.
Personally I feel like we should lean towards (1) under the assumption it doesn't have bad performance.
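As a rough sketch of the difference between the current behavior and the two options, the width selection could be written as three small policies. The `OperandSize` enum and the function names below are stand-ins invented for this example; they are not the actual lowering code.

```rust
/// Hypothetical stand-in for an instruction operand size.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum OperandSize {
    Size8,
    Size16,
    Size32,
    Size64,
}

/// Option (1): the instruction width always matches the type width.
fn exact_width(ty_bits: u32) -> OperandSize {
    match ty_bits {
        8 => OperandSize::Size8,
        16 => OperandSize::Size16,
        32 => OperandSize::Size32,
        64 => OperandSize::Size64,
        _ => panic!("unsupported type width: {ty_bits}"),
    }
}

/// Current in-between state as described above: 8/16-bit types are
/// widened to a 32-bit op, 32/64-bit types are used as-is.
fn clamped_width(ty_bits: u32) -> OperandSize {
    match ty_bits {
        8 | 16 | 32 => OperandSize::Size32,
        64 => OperandSize::Size64,
        _ => panic!("unsupported type width: {ty_bits}"),
    }
}

/// Option (2): use 64-bit ops everywhere, except when an operand is in
/// memory, where the access has to be exactly as wide as the type.
fn wide_unless_memory(ty_bits: u32, has_memory_operand: bool) -> OperandSize {
    if has_memory_operand {
        exact_width(ty_bits)
    } else {
        OperandSize::Size64
    }
}
```

Under these assumptions, an 8-bit `and` with register operands would be emitted as an 8-bit op under (1), a 32-bit op today, and a 64-bit op under (2); with a sunk memory operand all three would have to use the 8-bit form.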