aarch64 add basic i128 bit ops #2959
v5, v6 = isplit v4
return v5, v6
}
; run: %ishl_i128_i8(0x01010101_01010101, 0x01010101_01010101, 2) == [0x04040404_04040404, 0x04040404_04040404]
@cfallin I pretty much copied the testing code from x64/shift-i128-run.clif and added a few more test cases where my implementation had issues.
Do you think it would be a good idea to merge run tests so that they could be multi-arch? I've been thinking about something along the lines of:
testfiles/runtests/i128-bitops.clif:
test run
target x86_64
target aarch64
target s390x
function %rest_of_the_tests() {}
I think we should be able to do this for all runtests? This way we could reuse all of these test cases for all arches.
The downside is that we may have to be more granular with the test files. For example, this file fails on x86_64 because some of the bit ops fail there, so we would have to split it into shift_run and bitops_run or something along those lines.
What do you think?
Yes, sharing run-tests makes a lot of sense! In general I actually want to try to shift tests from golden-code (compile tests) to golden-output (run tests) as this makes our suite more robust against cross-cutting backend changes, such as regalloc optimizations; so there are additional benefits, aside from sharing across architectures.
Could you be more specific about the failures you're seeing on x86_64, though? That's somewhat concerning -- unimplemented opcode, incorrect result, or something else?
Great! I'll try to make a PR changing this soon.
About x86_64:
BandNot is failing on this assert.
BorNot / BxorNot / Cls (not on this PR yet) are not implemented at all.
Ah, OK, for tests like those for BandNot where we don't have an implementation on x86_64, we can just omit the target line in the test file, perhaps with a comment saying "not yet implemented on $PLATFORM".
Yeah, sure, merge that first, and then I'll update this.
Currently we basically use a two-instruction version of the same i64 ops, one instruction per 64-bit half. IMMLogic doesn't really support multiple register inputs, so it's left as a TODO for future optimizations.
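To illustrate the two-instruction idea: a bitwise op has no dependency between bit positions, so an i128 op can be modeled as the i64 op applied independently to the low and high halves. The following is only a rough Rust model of that reasoning, not the lowering code in this PR. The names `Pair`, `per_half`, `band`, `bor`, `bxor` and `band_not` are hypothetical, and `band_not(x, y)` is assumed to mean `x & !y`.

```rust
/// A 128-bit value held as two 64-bit "registers", low half first.
/// (Hypothetical type alias, for illustration only.)
type Pair = (u64, u64);

/// Apply a 64-bit bitwise op to each half independently.
/// This models "one instruction per half": no bits cross between halves.
fn per_half(op: impl Fn(u64, u64) -> u64, x: Pair, y: Pair) -> Pair {
    (op(x.0, y.0), op(x.1, y.1))
}

fn band(x: Pair, y: Pair) -> Pair {
    per_half(|a, b| a & b, x, y)
}

fn bor(x: Pair, y: Pair) -> Pair {
    per_half(|a, b| a | b, x, y)
}

fn bxor(x: Pair, y: Pair) -> Pair {
    per_half(|a, b| a ^ b, x, y)
}

/// Assumed semantics: x AND (NOT y).
fn band_not(x: Pair, y: Pair) -> Pair {
    per_half(|a, b| a & !b, x, y)
}
```

Each helper should agree with the same operator applied to a native `u128` built from the two halves, which is the property a shared run test would check on every target.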
Hey, some more ops implemented for i128 support.
I think the shifts can be reduced by a few instructions (especially the last csel), but I'm not seeing how right now.
I also didn't implement support for immlogic; changing it to support multiple registers didn't seem like a simple change.
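For reference, the general shape of a 128-bit left shift built from 64-bit operations plus a final select (the role `csel` plays on aarch64) can be sketched in Rust as below. This is a model of the technique under stated assumptions, not the instruction sequence emitted by this PR: the name `ishl128` is hypothetical, and the shift amount is assumed to be taken modulo 128.

```rust
/// Reference model: 128-bit left shift built from 64-bit pieces.
/// `lo` and `hi` are the two halves; `amt` is masked to 0..=127 (assumption).
fn ishl128(lo: u64, hi: u64, amt: u32) -> (u64, u64) {
    let amt = amt & 127;
    let s = amt & 63; // shift distance within a single 64-bit register

    // Bits that cross from the low half into the high half.
    // `lo >> (64 - s)` would be an out-of-range shift when s == 0,
    // so that case is handled explicitly.
    let carry = if s == 0 { 0 } else { lo >> (64 - s) };

    let lo_small = lo << s;
    let hi_small = (hi << s) | carry;

    // Final select: for amounts >= 64 the (shifted) low half becomes
    // the high half and the low half is zero.
    if amt < 64 {
        (lo_small, hi_small)
    } else {
        (0, lo_small)
    }
}
```

With the inputs from the run test, `ishl128(0x0101010101010101, 0x0101010101010101, 2)` yields `0x0404040404040404` for both halves, since no set bits cross the 64-bit boundary.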
This adds support for: