Skip to content

Use Java 17 intrinstics for long multiplication #1576

@xtonik

Description

@xtonik

Hi folks,

I have noticed, that new version of Java version 17 was adopted by Jackson, which offers several performance improvements:

  1. Long.multiplyHigh(long, long) was intrinsified by JDK-8187964 and should be preferred over custom implementation, it is present in MathUtils now.

  2. Having intrinsified method Long.multiplyHigh(long, long), it could be used for more performant division by multiply and shift (yes, my evergreen as Faster division by 1000 #1203), according to my benchmarks results, long number divided by 10^9 should be 30% faster via multiply and shift, the code:

private static final int MAGIC_SHIFT = 28;
private static final long MAGIC_MULT = BigInteger.ONE.shiftLeft(Long.SIZE + MAGIC_SHIFT).divide(BigInteger.TEN.pow(9)).longValue() + 1; // ~ 2^(64+28) / 10^9
public static long divBy10to9(long n) {
    return Math.multiplyHigh(n, MAGIC_MULT) >>> 28;
}

A example of correctness verification:

long max = Long.MAX_VALUE / 1_000_000_000L + 1;
for (long i = 1; i < max; i++) {
    long n = 1_000_000_000L * i;
    long div1 = divBy10to9(n - 1);
    long div2 = divBy10to9(n);
    if (div1 != i - 1 || div2 != i) {
       throw new IllegalStateException(i + " " + div1 + " " + div2);
    }
}
if (divBy10to9(Long.MAX_VALUE) != Long.MAX_VALUE / 1_000_000_000L) {
      throw new IllegalStateException(Long.MAX_VALUE);
}
  1. Base64 performance could be improved by standard JDK implementation, because:
    a) It is faster than yours implementation according to Inconsistent TextBuffer#getTextBuffer behavior #182.
    b) This job could be done faster by custom Base64, I have achieved much better results with my solution (not published anywhere), even without use Unsafe or VarHandle.
    c) The most performant is JDK implementation now as critical sections (decoding and encoding whole blocks) were intrinsified by JDK-8248188, which is around 4 times faster than my best for longer strings, much more for the others. Yes, I know, Jackson provides multiple variants, which JDK does not, but variants implemented by JDK are the most used ones, so it could be somehow combined together.

See benchmark results for decoding String consisting of 10000 characters - I was a little bit cheating as Java 17, which I have installed, has not evidently intrinsified Base64 critical section, I used Java 21 instead of it hoping it is equivalent for Java 17 having intrinsified Base64 (may be it is really cheating because of JDK-8269404):

  Mode Cnt Score (ops/ms) Error Units
Base64DecodeBenchmark.jackson thrpt 10 41.909 ± 4.624 ops/ms
Base64DecodeBenchmark.jdk thrpt 10 314.970 ± 43.286 ops/ms

I can help with PRs.

Metadata

Metadata

Assignees

No one assigned

    Labels

    3.2performanceIssue related to performance problems or enhancements

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions