Skip to content

8386891: AArch64: Implement getAndAddB*, getAndAddS*, getAndSetB*, getAndSetS*#31640

Open
mikabl-arm wants to merge 1 commit into
openjdk:masterfrom
mikabl-arm:8386891
Open

8386891: AArch64: Implement getAndAddB*, getAndAddS*, getAndSetB*, getAndSetS*#31640
mikabl-arm wants to merge 1 commit into
openjdk:masterfrom
mikabl-arm:8386891

Conversation

@mikabl-arm

@mikabl-arm mikabl-arm commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

AArch64 C2 is missing match rules for byte- and half-word sized variants of GetAndAdd and GetAndSet atomic operations.

Add AArch64 match rules for:

getAndAddB[Acq][NoRes][Const]
getAndAddS[Acq][NoRes][Const]
getAndSetB[Acq]
getAndSetS[Acq]

These should lower to the corresponding AArch64 LSE byte/half-word atomic instructions:

GetAndAddB* -> LDADDB / LDADDALB
GetAndAddS* -> LDADDH / LDADDALH
GetAndSetB* -> SWPB / SWPALB
GetAndSetS* -> SWPH / SWPALH

The patch has passed the following test groups on AArch64:

  • hotspot:hotspot_all(no vmTestbase stress)
  • jdk:tier1,:tier2,:tier3
  • langtools:tier1
  • jcstress with -tb 1d


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8386891: AArch64: Implement getAndAddB*, getAndAddS*, getAndSetB*, getAndSetS* (Enhancement - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/31640/head:pull/31640
$ git checkout pull/31640

Update a local copy of the PR:
$ git checkout pull/31640
$ git pull https://git.openjdk.org/jdk.git pull/31640/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 31640

View PR using the GUI difftool:
$ git pr show -t 31640

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/31640.diff

Using Webrev

Link to Webrev Comment

…tAndSetS*

AArch64 C2 is missing match rules for byte- and half-word sized variants of `GetAndAdd` and `GetAndSet` atomic operations.

Add AArch64 match rules for:

```
getAndAddB[Acq][NoRes][Const]
getAndAddS[Acq][NoRes][Const]
getAndSetB[Acq]
getAndSetS[Acq]
```

These should lower to the corresponding AArch64 LSE byte/half-word atomic instructions:

```
GetAndAddB* -> LDADDB / LDADDALB
GetAndAddS* -> LDADDH / LDADDALH
GetAndSetB* -> SWPB / SWPALB
GetAndSetS* -> SWPH / SWPALH
```

The patch has passed the following test groups on AArch64:

- hotspot:hotspot_all(no vmTestbase stress)
- jdk:tier1,:tier2,:tier3
- langtools:tier1
- jcstress with -tb 1d
@bridgekeeper

bridgekeeper Bot commented Jun 23, 2026

Copy link
Copy Markdown

👋 Welcome back mablakatov! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk

openjdk Bot commented Jun 23, 2026

Copy link
Copy Markdown

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk Bot added the hotspot hotspot-dev@openjdk.org label Jun 23, 2026
@openjdk

openjdk Bot commented Jun 23, 2026

Copy link
Copy Markdown

@mikabl-arm The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk

openjdk Bot commented Jun 23, 2026

Copy link
Copy Markdown

The total number of required reviews for this PR has been set to 2 based on the presence of this label: hotspot. This can be overridden with the /reviewers command.

@openjdk openjdk Bot added the rfr Pull request is ready for review label Jun 23, 2026
@mlbridge

mlbridge Bot commented Jun 23, 2026

Copy link
Copy Markdown

Webrevs

@theRealAph theRealAph left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good and straightforward. I do wonder, though if the non-LSE variants are worth having at all. Also, are the sign-extending variants worth anything? Won't C2 do the sign extension if the signed variants are absent?

@mhaessig

Copy link
Copy Markdown
Contributor

Won't C2 do the sign extension if the signed variants are absent?

No it won't. See 4f56990 for a handful of instances that did not properly sign- or zero-extend results. C2 expects all results of subword types to be properly truncated. See also John's excellent treatise on subword types.

@shipilev shipilev left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable.

I wish we implement https://bugs.openjdk.org/browse/JDK-8372800 at some point, so that we do not emit extra stores when we do not need them. This is outside the scope of this particular PR, it is just it adds more match rules that fall into the similar trap.

// 16 bit integer valid for add sub immediate
operand immSAddSub()
%{
predicate(n->get_int() <= 4095 && n->get_int() >= -4095);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the same as operand_valid_for_add_sub_immediate? The AddSub suffix for the operand suggests so. I understand we want to cap the range for S, like we do just above for immBAddSubV, but does 4095 come close to failing operand_valid_for_add_sub_immediate? Not sure what is the clean/robust solution here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This predicate is stricter than operand_valid_for_add_sub_immediate. The latter also accepts immediates encodable with the optional 12-bit shift, so it cal allow values outside of the 16-bit range.

Looking at this again, this predicate might actually be too strict: it rejects 16-bit immediates with 12 trailing zero bits, even those would be valid Add/Sub immediate operands.

@mikabl-arm

Copy link
Copy Markdown
Contributor Author

I do wonder, though if the non-LSE variants are worth having at all.

I believe they are, at least as long as the JVM is expected to support Armv8.0 (or earlier). FEAT_LSE was introduced in Armv8.1-A, so any system predating that doesn't support those instructions.

@theRealAph

Copy link
Copy Markdown
Contributor

The problem is that all this macro expansion scales very badly. If you change the ATOMIC_OP macro to a function, then it can be called by all of the operations. And similarly with ATOMIC_XCHG.

typedef void (MacroAssembler::*reg_or_const_op)(Register, Register, RegisterOrConstant);

template<typename AtomicOp, typename ExclusiveLd, typename ExclusiveSt>
void MacroAssembler::atomic_group_op(reg_or_const_op op, reg_or_const_op iop,
                                     AtomicOp aop, enum operand_size sz,
                                     ExclusiveLd loadOp, ExclusiveSt storeOp,
                                     Register prev, RegisterOrConstant incr, Register addr) {
  if (UseLSE) {
    prev = prev->is_valid() ? prev : zr;
    if (incr.is_register()) {
      (this->*aop)(sz, incr.as_register(), prev, addr);
    } else {
      mov(rscratch2, incr.as_constant());
      (this->*aop)(sz, rscratch2, prev, addr);
    }
    return;
  }
  Register result = rscratch2;
  if (prev->is_valid())
    result = different(prev, incr, addr) ? prev : rscratch2;

  Label retry_load;
  prfm(Address(addr), PSTL1STRM);
  bind(retry_load);
  (this->*loadOp)(result, addr);
  (this->*op)(rscratch1, result, incr);
  (this->*storeOp)(rscratch2, rscratch1, addr);
  cbnzw(rscratch2, retry_load);
  if (prev->is_valid() && prev != result) {
    (this->*iop)(rscratch1, result, incr);
  }
}

#define ATOMIC_OP(NAME, LDXR, OP, IOP, AOP, STXR, sz)             \
void MacroAssembler::atomic_##NAME(Register prev, RegisterOrConstant incr, Register addr) { \
  atomic_group_op(&MacroAssembler::OP, &MacroAssembler::IOP, \
                  &MacroAssembler::AOP,                                 \
                  sz, &MacroAssembler::LDXR, &MacroAssembler::STXR,     \
                  prev, incr, addr);                                    \
}

That way, all the callers are just stubs rather than being mega-expanded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot hotspot-dev@openjdk.org rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

4 participants