Skip to content

Draft: I128 support (partial) on x64.#2504

Closed
cfallin wants to merge 7 commits into
bytecodealliance:mainfrom
cfallin:multi-reg-result-2
Closed

Draft: I128 support (partial) on x64.#2504
cfallin wants to merge 7 commits into
bytecodealliance:mainfrom
cfallin:multi-reg-result-2

Conversation

@cfallin
Copy link
Copy Markdown
Member

@cfallin cfallin commented Dec 13, 2020

This PR generalizes all of the MachInst framework to reason about SSA Values as being located in multiple registers (one, two or four, currently, in an efficient packed form). This is necessary in order to handle CLIF with values wider than the machine width (I128 on 64/32-bit machines and I64 on 32-bit machines), unless we legalize it beforehand.

It also adds support for some basic 128-bit ALU ops to the x64 backend, loewring these directly to open-coded instruction sequences (add/adc, sub/sbb, etc.).

@julian-seward1 and @bnjbvr: this is the approach we had discussed a long time ago (in January, I think!). It follows from the "every backend accepts the same IR" philosophy, with the idea that maybe we will get to the point where legalization is largely not necessary.

However: I must say that I'm not super-happy with the level of complexity this has added to the framework. The fact that the work we do in the x64 backend to support this will have to be repeated on aarch64 is kind of unfortunate; and this all feels somewhat silly given that we still have the legalization framework's narrowing support, and could use that instead. Philosophically, I think that legalization is actually the right approach here: we should be able to factor out "general machine-independent algorithm for 128-bit multiply with 64-bit pieces" from the specific machine backends.

So I'm inclined not to go in this direction, but (i) want to see what thoughts anyone might have, and (ii) save this for the record, in case we wire up the old legalizations for now but reconsider or need multi-reg values in the future.

@cfallin
Copy link
Copy Markdown
Member Author

cfallin commented Dec 13, 2020

For context, I should also note: this (I128 support) is one of the few pieces still missing from the new x64 backend. The Wasm frontend does not use it, but rustc_codegen_clif does (cc @bjorn3). I expect we would need a few more ops before we can try this with cg_clif, though.

Copy link
Copy Markdown
Contributor

@bjorn3 bjorn3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❤️ ❤️ ❤️

To avoid duplication of the lowering logic between backends maybe it would be possible to have a kind of template clif ir that can be instanced during lowering. Such a template would use specific Values for inputs and outputs. The rest are temporary variables. This could be faster than legalization as it doesn't require modifying the input clif ir. It would also allow backends to chose between multiple templates when there are different ways to lower an instruction with different tradeoffs.


if op != Opcode::Imul {
// add, sub, and, or, xor: just do ops on lower then upper half. Carry-flag
// propagation is implicit (add/adc, sub/sbb).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can regalloc insert an instruction in between?

Copy link
Copy Markdown
Contributor

@julian-seward1 julian-seward1 Dec 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regalloc is specifically disallowed from inserting any instructions that change the condition codes, for exactly this reason. This is specified in comments in the regalloc.rs interface.

@github-actions github-actions Bot added cranelift Issues related to the Cranelift code generator cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:x64 Issues related to x64 codegen labels Dec 13, 2020
@julian-seward1
Copy link
Copy Markdown
Contributor

@cfallin As a general comment, I see the dilemma. I haven't yet read the patch, but wanted to ask: part of the motivation for going this route was that we would then be able to remove the complexity, the semantic murkyness, and the potential for generating poor code, caused by having to explicitly represent the carry flag in CLIF. If we revert to expanding out doubleword arithmetic at the CLIF level, is there some other way we can avoid those disadvantages?

Also .. does it matter that, doing this per-target (per this patch) potentially makes it possible to use target-specific sequences, which wouldn't be possible if it were done at the CLIF level?

RegMemImm::reg(regs::rdx()),
dst.regs()[1],
));
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cfallin Is it really so bad that we have to duplicate these circa 50 lines per-target, to do mul.i128? It would be even shorter if rustfmt didn't insist on laying out the calls in this space-inefficient way.

There will be roughly equivalent length sequences for 128-bit left/right shifts, and for 128-bit comparisons. For 128-bit division, we'll have to call a helper on all targets. On 32-bit targets, we'll probably have to use a helper for multiplies, shifts and comparisons too.

@julian-seward1
Copy link
Copy Markdown
Contributor

doing this per-target (per this patch) potentially makes it possible to use target-specific sequences

For example, doubleword comparison on x64 is a bit elaborate, but for aarch64 it's just cmp ; ccmp.

@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Dec 13, 2020

I just tested this with cg_clif. As far as I can see this works fine, but there are still missing pieces both related and unrelated to 128bit int support which mean that I can't completely test this yet.

support necessary in any case:

support necessary to not require modifications of cg_clif:

  • load/store of i128
  • rotl.i128/rotr.i128
  • bitrev.i8
  • brz.i128/brnz.i128

@julian-seward1
Copy link
Copy Markdown
Contributor

One thing that does concern me is struct ValueRegs, in particular the potential inefficiency induced by it. It's obviously fully general, in the sense that it can hold 4 unrelated Reg values, but it does require passing around a 128-bit thing where before we were just passing around a 32-bit thing (Reg). And, given that it requires indexing, I reckon it ain't gonna wind up in a vector register, hence will always be in memory.

I had wondered if such generality is really necessary. Given that -- IIUC -- all the result virtual register fields are allocated before isel starts, would it be adequate to have an representation which specifies a base virtual register, plus zero, one or two subsequently-numbered vregs of the same class? Then that would fit in 34 bits (hence, a uint64_t) and could be passed around in a (host) register just as Reg is now.

Even if that won't work, does ValueRegs really need to be template-typed? Are there any other types of things we need to store in there?

}

/// Return the single register used for this value, if any.
pub fn only_reg(self) -> Option<R> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this be a Result with something like struct OnlyRegError; as error type? That would give a better panic message when unwrapping.

Copy link
Copy Markdown
Contributor

@bjorn3 bjorn3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling functions with 128bit arguments isn't implemented.

Comment thread cranelift/codegen/src/isa/x64/lower.rs Outdated
Comment thread cranelift/codegen/src/isa/x64/lower.rs Outdated
Comment thread cranelift/codegen/src/isa/x64/lower.rs Outdated
@cfallin
Copy link
Copy Markdown
Member Author

cfallin commented Dec 13, 2020

So I'm starting to warm up to this approach relative to falling back on existing legalizations; it's really not too much more work to get to the end-point and the legalization is pretty hairy too. I'm working on the additional ops now; hopefully ready soon!

To a few design-question points:

One thing that does concern me is struct ValueRegs, in particular the potential inefficiency induced by it. It's obviously fully general, in the sense that it can hold 4 unrelated Reg values, but it does require passing around a 128-bit thing where before we were just passing around a 32-bit thing (Reg). And, given that it requires indexing, I reckon it ain't gonna wind up in a vector register, hence will always be in memory.

I played with this a bit with Compiler Explorer (link) -- it seems it's passed by reference as an arg, but returned in two regs (rdx:rax). So, yeah, not as efficient as I had hoped, but at least it's not malloc'ing... though see below for alternative idea.

I had wondered if such generality is really necessary. Given that -- IIUC -- all the result virtual register fields are allocated before isel starts, would it be adequate to have an representation which specifies a base virtual register, plus zero, one or two subsequently-numbered vregs of the same class? Then that would fit in 34 bits (hence, a uint64_t) and could be passed around in a (host) register just as Reg is now.

Tempting idea! This doesn't handle the real-register case though (needed at least for ABI specs). I suppose we could special-case that but then we bifurcate some other APIs (gen load/spill, move, ...).

I'm thinking that we might actually be able to optimize in another (much simpler) way -- on 64-bit machines and with our largest types being 128-bit, we can get away with the 1 / 2-reg cases. Perhaps we could define a static constant for size and, with some config magic, select 2 unless arm32 (or later, x86-32, etc.) are compiled in. Then we're just passing around 64 bits, which with padding is probably free relative to our current 32-bit size in most cases.

Even if that won't work, does ValueRegs really need to be template-typed? Are there any other types of things we need to store in there?

Yes, I did this to try to keep to the spirit of Reg, WritableReg, VirtualReg, RealReg. At least for the former two, I tried at first doing Writable<ValueRegs> instead, but Writable is defined in regalloc and requires implementation of a private trait, i.e., it's closed to new uses. IMHO it's somewhat nice for regs() to return (an array of) the correctly-typed thing in many cases...

@julian-seward1
Copy link
Copy Markdown
Contributor

@cfallin Great work, btw! I should have said that earlier.

@cfallin
Copy link
Copy Markdown
Member Author

cfallin commented Dec 14, 2020

Just added implementations for (I think) all of the relevant i128 ops. Will make a cleanup pass and address some of the comments above later. @bjorn3, I didn't get to the TLS support or StructArg support yet, but this might be close enough to hack something together? If not, I'll return to this in a bit :-)

@cfallin
Copy link
Copy Markdown
Member Author

cfallin commented Dec 14, 2020

@bjorn3 I just added what I think might be a sufficient implementation of ELF TLS support (basically just copying your recipe in the old x86 backend), and also permitted StructReturn (edit: and StructArgument)-purpose args to compile. I'm not sure if any more needs to be done for the latter (does cg_clif rely on some sort of legalization that lowers into a struct-arg/return pointer?).

Let me know if this is sufficient to get cg_clif working! If so, I'll clean this up a bit and split out some pieces to actually land it.

| &ir::ArgumentPurpose::CalleeTLS
| &ir::ArgumentPurpose::StructReturn => {}
| &ir::ArgumentPurpose::StructReturn
| &ir::ArgumentPurpose::StructArgument(_) => {}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StructArgument indicates that the argument is a pointer to a piece of memory with the given size that needs to be passed as on the stack in the arguments area. On the caller side you will need to memcpy it to the right place, on the callee side you need to get the address.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, yes, this is making more sense now. Do you think you would be up for attempting an implementation? (My bandwidth is somewhat stretched thin at the moment, but I can come back to this at ... some point, eventually, if needed. Incidentally this is similar to what I think we also need for Windows fastcall, which is one of the other remaining new-backend TODOs...).

| &ir::ArgumentPurpose::CallerTLS
| &ir::ArgumentPurpose::CalleeTLS
| &ir::ArgumentPurpose::StructReturn => {}
| &ir::ArgumentPurpose::StructReturn
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The old backend didn't handle this either, but the system-v abi defines a specific register for this argument. In addition kn x86_64 at least I think you also need to regurn this value in a different register. (I guess to reduce the need for saving it on the caller side.)

use regalloc::{RealReg, Reg, VirtualReg, Writable};
use std::fmt::Debug;

#[cfg(feature = "arm32")]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be preferable here to key it on the word size of the target that the resulting CL will be compiling for? Is that even possible, given the limitations of the Rust config etc system?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such an option does exist, but the wrench in the works is cross-compilation -- e.g. clif-utils is normally built with all targets enabled (so we need to support arm32 compilation even on an x64 host, etc).

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Ah, sorry, actually I misread your comment -- pointer size of target, not host; I don't think we have a config option for that as we have custom features for each backend.)

Comment thread cranelift/codegen/src/isa/x64/inst/emit.rs
@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Dec 14, 2020

Let me know if this is sufficient to get cg_clif working!

Almost. The standard library compiles with the above TLS fix, but std_example fails at several points:

  • saturating_sub::<i128>() (select.i128?)
  • u128 -> float cast (incorrect return value)
  • __m128i -> [u8; 16] transmute (raw_bitcast between i64x2 and i128?) (edit: it fixed itself somehow)

@bjorn3

This comment has been minimized.

@cfallin
Copy link
Copy Markdown
Member Author

cfallin commented Dec 28, 2020

@bjorn3 I just pushed a number of fixes/updates; this I think addresses everything above except for the StructReturn fix and whatever is causing the u128 to/from float conversion error. (I'm testing with a local rustc_codegen_cranelift checkout; I see the same error that you're presumably seeing on std_example.rs line 79, converting u128->f32).

I'm not sure if the latter has to do with the former; it seems that the u128->f32 conversion is a call into libgcc so I suspect it's an ABI issue rather than an i128 arithmetic issue. (Thank goodness for that; I started to work out the open-coded sequence for each of the u128 to/from f32/f64 cases, signed/unsigned, and quickly ran away...)

I won't have more time to look at this for a bit (I'm mostly away until after the new year) but if you have any further debugging insights or want to try to implement the StructReturn fix (I think we just need to preprend a StructReturn return value if an arg exists, and wire the value through) please feel free!

(Also it mostly goes without saying but since this PR has grown quite a bit beyond just i128 arithmetic, I plan to split it up into proper pieces for separate review once we confirm that cg_clif works.)

@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Dec 28, 2020

I'm not sure if the latter has to do with the former; it seems that the u128->f32 conversion is a call into libgcc so I suspect it's an ABI issue rather than an i128 arithmetic issue.

On my system it calls into rust's compiler-builtins.

or want to try to implement the StructReturn fix (I think we just need to preprend a StructReturn return value if an arg exists, and wire the value through)

That and use the correct register. It isn't very important though as the old backend didn't implement it either.

@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Dec 28, 2020

Minimal repro:

fn convert(i: u128) -> f32 {
    if i == 0 {
        // this branch is taken
        return 0.0;
    }

    extract_sign(i); // except when this call is removed

    100.0
}

fn main() {
    assert_eq!(convert(100u128), 100.0);
}

fn extract_sign(i: u128) {}

Assembly of convert:

00000000000f4af5 <_ZN11std_example7convert17ha5c22c0f04088fadE>:
   f4af5:       55                      push   %rbp
   f4af6:       48 89 e5                mov    %rsp,%rbp
   f4af9:       48 83 ff 00             cmp    $0x0,%rdi
   f4afd:       40 0f 94 c0             sete   %al
   f4b01:       48 83 fe 00             cmp    $0x0,%rsi
   f4b05:       40 0f 94 c1             sete   %cl
   f4b09:       48 21 c1                and    %rax,%rcx
   f4b0c:       0f 84 08 00 00 00       je     f4b1a <_ZN11std_example7convert17ha5c22c0f04088fadE+0x25>
   f4b12:       0f 57 c0                xorps  %xmm0,%xmm0
   f4b15:       48 89 ec                mov    %rbp,%rsp
   f4b18:       5d                      pop    %rbp
   f4b19:       c3                      retq   
   f4b1a:       e8 a1 01 00 00          callq  f4cc0 <_ZN11std_example12extract_sign17h8b332672bd416c0aE>
   f4b1f:       be 00 00 c8 42          mov    $0x42c80000,%esi
   f4b24:       66 0f 6e c6             movd   %esi,%xmm0
   f4b28:       48 89 ec                mov    %rbp,%rsp
   f4b2b:       5d                      pop    %rbp
   f4b2c:       c3                      retq

@cfallin
Copy link
Copy Markdown
Member Author

cfallin commented Dec 28, 2020

@bjorn3 thanks for that repro -- the disassembly made it pretty clear what was wrong: brz/brnz on an i128 used an incorrect width of and/or instruction. Pushed a fix just now.

Locally, ./test.sh in rustc_codegen_cranelift using this branch now seems to pass all the tests; it dies later during benchmarking but I'm not sure what's wrong:

Benchmark #2: ../build/cargo.sh build
Error: Command terminated with non-zero exit code. Use the '-i'/'--ignore-failure' option if you want to ignore this. Alternatively, use the '--show-output' option to debug what went wrong.

(Also, in case it's relevant, I set JIT_SUPPORTED=0 in scripts/config.sh as that seemed to get me past an issue related to the TLS relocation not being handled in cranelift-jit. Known issue?)

In any case, I'm happy that this seems to allow cg_clif to pass the tests in examples/ :-)

@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Dec 28, 2020

Locally, ./test.sh in rustc_codegen_cranelift using this branch now seems to pass all the tests

🎉

it dies later during benchmarking but I'm not sure what's wrong:

Can you add --show-output to the hyperfine invocation in scripts/tests.sh? I can't test it myself until tomorrow. You could also cd to simple-raytracer and then run ../build/cargo.sh build directly to avoid having to build the sysroot again.

@cfallin
Copy link
Copy Markdown
Member Author

cfallin commented Dec 29, 2020

I tried building simple-raytracer, and the cg_clif invocation to compile the tiff crate is dying with a segfault:

   Compiling tiff v0.2.1
error: could not compile `tiff`

Caused by:
  process didn't exit successfully: `/home/cfallin/work/rustc_codegen_cranelift/build/bin/cg_clif --crate-name tiff [...]` (signal: 11, SIGSEGV: invalid memory reference)

I tracked this back to a nonsense pointer passed into slice::copy_from_slice and tracked it backward with rr for a while; but couldn't reach the origin (it's a somewhat manual process tracing through disassemblies and setting memory watchpoints). Valgrind shows that the pointer itself is undefined data. So we're missing some sort of dataflow linkage (a store? ABI semantics related to wide values?) but I've hit my quota for today. Curious to see if you can discover more!

(I implemented StructReturn slightly more properly but this didn't seem to be the problem; wasn't likely, as you noted, but wanted to be sure.)

@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Dec 29, 2020

This is a proc macro that has an ABI incompatibility with the cg_llvm comliled rustc due to StructArgument not yet being implemented.

@cfallin
Copy link
Copy Markdown
Member Author

cfallin commented Dec 29, 2020

@bjorn3 I went ahead and implemented StructArgument as well. With that done, simple-raytracer builds, and its binary runs to completion. (The following is the first result computed via rustc -> cg_clif -> x64-new-backend, aside from compilations themselves: result.png!)

There do seem to be a few last test failures later in the test.sh run (in the stdlib I guess?): 1139 passed; 16 failed, mostly some i8/i16 ops and some FP string formatting issues. The former doesn't surprise me too much as narrow-value support is still somewhat young relative to the 32/64-bit paths that are well-hammered by Wasm. In any case, if you (or anyone else) feel like taking a look at those last issues, please do feel free!

I'll clean up this mess of a draft branch when I'm actually "working" again on Jan 4 :-)

@cfallin
Copy link
Copy Markdown
Member Author

cfallin commented Dec 29, 2020

Last two fixes pushed: DIV on 8-bit values places remainders in AH, not DL (the weird and wonderful depths of x86 never cease to amuse!), and our select impl wasn't recognizing that booleans are integers and need integer moves.

All stdlib tests pass now! I really will close my laptop now and read some books until I clean this up in the new year :-)

@bjorn3
Copy link
Copy Markdown
Contributor

bjorn3 commented Dec 29, 2020

Yay! I hope you will have a great New Year!

@julian-seward1
Copy link
Copy Markdown
Contributor

julian-seward1 commented Dec 30, 2020

@cfallin

DIV on 8-bit values places remainders in AH, not DL

That's surely a case of the general redundant-REX (0x40) rule, or in this case the lack of 0x40? Does the remainder go in DL if there's an 0x40 prefix?

@cfallin
Copy link
Copy Markdown
Member Author

cfallin commented Dec 30, 2020

@cfallin

DIV on 8-bit values places remainders in AH, not DL

That's surely a case of the general redundant-REX (0x40) rule, or in this case the lack of 0x40? Does the remainder go in DL if there's an 0x40 prefix?

Sadly no, a REX prefix doesn't change anything -- it's just the way the 8-bit variant is defined (reference). A likely remnant of the 8086 days when AL and AH were separately-useful registers for byte-sized values...

(Edit: here's another reference showing more -- a REX byte will change whether the divisor arg refers to low byte of one of the 16 x86-64 registers, or a low/high byte of one of the 8 original registers; but remainder always goes into AH.)

@cfallin cfallin force-pushed the multi-reg-result-2 branch from 899b180 to fdb8b9f Compare January 4, 2021 05:43
… regs.

This will allow for support for `I128` values everywhere, and `I64`
values on 32-bit targets (e.g., ARM32 and x86-32). It does not alter the
machine backends to build such support; it just adds the framework for
the MachInst backends to *reason* about a `Value` residing in more than
one register.
This implements all of the ops on I128 that are implemented by the
legacy x86 backend, and includes all that are required by at least one
major use-case (cg_clif rustc backend).

The sequences are open-coded where necessary; for e.g. the bit
operations, this can be somewhat complex, but these sequences have been
tested carefully. This PR also includes a drive-by fix of clz/ctz for 8-
and 16-bit cases where they were incorrect previously.
This follows the implementation in the legacy x86 backend, including
hardcoded sequence that is compatible with what the linker expects. We
could potentially do better here, but it is likely not necessary.

Thanks to @bjorn3 for a bugfix to an earlier version of this.
The StructReturn ABI is fairly simple at the codegen/isel level: we only
need to take care to return the sret pointer as one of the return values
if that wasn't specified in the initial function signature.

Struct arguments are a little more complex. A struct argument is stored
as a chunk of memory in the stack-args space. However, the CLIF
semantics are slightly special: on the caller side, the parameter passed
in is a pointer to an arbitrary memory block, and we must memcpy this
data to the on-stack struct-argument; and on the callee side, we provide
a pointer to the passed-in struct-argument as the CLIF block param
value.

This is necessary to support various ABIs other than Wasm, such as that
of Rust (with the cg_clif codegen backend).
When using an 8-bit register in 64-bit mode on x86-64, the REX prefix
semantics are somewhat subtle: without the REX prefix, register numbers
4--7 correspond to the second-to-lowest byte of the first four registers
(AH, CH, BH, DH), whereas with the REX prefix, these register numbers
correspond to the usual encoding (SPL, BPL, SIL, DIL). We could always
emit a REX byte for instructions with 8-bit cases (this is harmless even
if unneeded), but this would unnecessarily inflate code size; instead,
the usual approach is to emit it only for these registers.

This logic was present in some cases but missing for some other
instructions: divide, not, negate, shifts.

Fixes bytecodealliance#2508.
The implementations of several FP ops, such as fabs/fneg, used SSE
instructions. This is not a problem per-se, except that load-op merging
did not take *alignment* into account. Specifically, if an op on an f64
loaded from memory happened to merge that load, and the instruction into
which it was merged was an SSE instruction, then the SSE instruction
imposes stricter (128-bit) alignment requirements than the load.f64 did.

This PR simply forces any instruction lowerings that could use SSE
instructions to implement non-SIMD operations to take inputs in
registers only, and avoid load-op merging.

Fixes bytecodealliance#2507.
@cfallin cfallin force-pushed the multi-reg-result-2 branch from fdb8b9f to 21b16cf Compare January 4, 2021 05:58
@cfallin
Copy link
Copy Markdown
Member Author

cfallin commented Jan 4, 2021

Did some rebase/squash surgery on this branch -- I will split it into I think 8 PRs:

Several of the above are dependent on the multi-reg refactor but the rest should be reviewable in parallel.

- urem/srem.i8: the 8-bit form of the DIV instruction on x86-64 places
  the remainder in AH, not RDX, different from all the other width-forms
  of this instruction.

- select.b1: we were not recognizing selects of boolean values as
  integer-typed operations, so we were generating XMM moves instead (!).

With these two changes on the i128 ops / etc branch, rustc_codegen_clif
passes all tests!
@cfallin
Copy link
Copy Markdown
Member Author

cfallin commented Jan 4, 2021

The pieces were a bit more interrelated than I had hoped so this became four PRs: #2538 (framework), #2539 (i128 ops), #2540 (TLS), #2541 (struct args). 2539 depends on 2538; and 2540/2541 depend on 2538 and 2539.

@cfallin
Copy link
Copy Markdown
Member Author

cfallin commented Jan 5, 2021

Closing this draft PR now that the real PRs resulting from it are under review.

@cfallin cfallin closed this Jan 5, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cranelift:area:aarch64 Issues related to AArch64 backend. cranelift:area:machinst Issues related to instruction selection and the new MachInst backend. cranelift:area:x64 Issues related to x64 codegen cranelift Issues related to the Cranelift code generator

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants