Wasmtime recently had #4315 filed against it which discovered that there were two separate bugs in the SIMD implementation on x86_64. This discovery comes after "months of continuous oss-fuzzing" for the simd feature. I wanted to file an issue here with some investigation of why this happened because this theoretically should not happen.
Specifically, both bugs were faulty instruction lowerings. The first (fixed in #4318) corrupted an input register, which I think only causes issues if that input is then reused elsewhere (e.g. a constant reused somewhere else). I don't know precisely, but my impression was that triggering it required register pressure, a "big" function, and constants to all line up. I could see this specific bug being very difficult to discover via wasm-smith. The second bug (#4317), however, was a trivial bug in the `select` instruction which showed up with the smallest of tests for `select`. The fact that wasm-smith never discovered this is alarming to me.
Digging in, it appears to be a confluence of factors that makes wasm-smith basically unable to find these bugs:
- The `select` instruction requires 3 operands of specific types on the stack, which turns out to happen very rarely. I inserted a `panic!` whenever a `select` instruction was even considered a candidate, and it was rarely hit. Even more rarely is the instruction actually chosen to be emitted.
- The `i32` input to `select` is, I think, almost always nonzero at runtime. The specific bug only happened when the condition was 0, however. I think this is because a lot of `i32`s come from things like `i32.const`, which is practically never zero.
- Even if `select` is generated with `v128` inputs (which happens quite rarely), it's often never actually executed at runtime. The few test cases I found which generated this instruction immediately hit infinite recursion or an infinite loop, with the interesting instructions far away.
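To make the first two factors concrete, here is a small self-contained model. It is not wasm-smith's actual code: `ValType`, `select_is_candidate`, and `count_candidates` are hypothetical names, and the model assumes a uniformly random stack over five value types, which is a simplification of how operand stacks really evolve.

```rust
/// Simplified stand-in for the wasm value types (hypothetical, for illustration).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum ValType {
    I32,
    I64,
    F32,
    F64,
    V128,
}

const ALL: [ValType; 5] = [
    ValType::I32,
    ValType::I64,
    ValType::F32,
    ValType::F64,
    ValType::V128,
];

/// `select` needs `[t, t, i32]` on top of the operand stack:
/// two operands of the same type below an `i32` condition.
fn select_is_candidate(stack: &[ValType]) -> bool {
    match stack {
        [.., a, b, ValType::I32] => a == b,
        _ => false,
    }
}

/// Runtime semantics of `select`: the first operand if the condition is
/// nonzero, otherwise the second.
fn select<T>(a: T, b: T, cond: i32) -> T {
    if cond != 0 { a } else { b }
}

/// Enumerate every 3-deep stack and count how many allow `select` at all,
/// and how many allow it with `v128` operands.
fn count_candidates() -> (usize, usize, usize) {
    let (mut total, mut any, mut v128) = (0, 0, 0);
    for a in ALL {
        for b in ALL {
            for c in ALL {
                total += 1;
                if select_is_candidate(&[a, b, c]) {
                    any += 1;
                    if a == ValType::V128 {
                        v128 += 1;
                    }
                }
            }
        }
    }
    (total, any, v128)
}

fn main() {
    let (total, any, v128) = count_candidates();
    println!("{any}/{total} stacks allow select, {v128}/{total} with v128 operands");
    // Only a zero condition takes the second-operand path.
    println!("select(1, 2, 0) = {}", select(1, 2, 0));
}
```

Under these assumptions only 5 of 125 stacks make `select` a candidate, and only 1 of 125 does so with `v128` operands. On top of that, the second-operand path is only reached when the condition is exactly 0.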
I unfortunately don't know if there's really a "fix" for issues like this. We could throw a bunch more heuristics at wasm-smith but at some point we probably need a somewhat fundamental new strategy for fuzzing here to get significantly more coverage.
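As an example of the kind of heuristic meant here, one option would be to bias generated `i32` constants toward boundary values such as 0. The sketch below is an assumption for illustration only, not wasm-smith's API: `biased_i32_const`, the `INTERESTING` table, and the 1-in-4 split are all made up.

```rust
/// Boundary values that tend to hit edge cases (chosen for illustration).
const INTERESTING: [i32; 5] = [0, 1, -1, i32::MIN, i32::MAX];

/// Derive an `i32` constant from raw fuzz input bytes.
/// Missing bytes are treated as zero.
fn biased_i32_const(input: &[u8]) -> i32 {
    let byte = |i: usize| input.get(i).copied().unwrap_or(0);
    if byte(0) % 4 == 0 {
        // About a quarter of the time, pick a boundary value.
        INTERESTING[byte(1) as usize % INTERESTING.len()]
    } else {
        // Otherwise use the next four bytes as an arbitrary constant.
        i32::from_le_bytes([byte(1), byte(2), byte(3), byte(4)])
    }
}

fn main() {
    println!("{}", biased_i32_const(&[0, 0]));
    println!("{:#x}", biased_i32_const(&[1, 0x78, 0x56, 0x34, 0x12]));
}
```

With uniformly random input bytes this yields 0 roughly 5% of the time, compared with about 1 in 2^32 for a uniformly random `i32`. It does nothing for the other two factors, though, which is why heuristics like this only go so far.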