
Performance regression in terms aggregation with match_all queries #20626

@rishabhmaurya

Description


Describe the bug

Terms aggregations with match_all queries on high-cardinality fields have suffered a significant performance regression since PR #11643 introduced the tryCollectFromTermFrequencies optimization in GlobalOrdinalsStringTermsAggregator. The regression went unnoticed until recent ClickBench query performance testing revealed severe slowdowns.

Related component

Search:Performance

To Reproduce

Affected Query Pattern

{
  "size": 0,
  "timeout": "1m",
  "aggregations": {
    "URL": {
      "terms": {
        "field": "URL",
        "size": 10
      }
    }
  }
}

Where the field has high cardinality (e.g., 10 million unique values).

Root Cause

The tryCollectFromTermFrequencies method introduced in #11643 uses a leapfrogging algorithm that iterates through all global ordinals to match them against segment terms:

Location: GlobalOrdinalsStringTermsAggregator.java

// Advance the segment's TermsEnum and the global ordinals' TermsEnum in
// lockstep (leapfrog), emitting a count whenever the two agree on a term.
TermsEnum indexTermsEnum = segmentTerms.iterator();
BytesRef indexTerm = indexTermsEnum.next();
final SortedSetDocValues globalOrds = this.getGlobalOrds(ctx);
TermsEnum globalOrdinalTermsEnum = globalOrds.termsEnum();
BytesRef ordinalTerm = globalOrdinalTermsEnum.next();

while (indexTerm != null && ordinalTerm != null) {
    int compare = indexTerm.compareTo(ordinalTerm);
    if (compare == 0) {
        // Term present in both: credit its doc frequency to the global ordinal.
        if (acceptedGlobalOrdinals.test(globalOrdinalTermsEnum.ord())) {
            ordCountConsumer.accept(globalOrdinalTermsEnum.ord(), indexTermsEnum.docFreq());
        }
        indexTerm = indexTermsEnum.next();
        ordinalTerm = globalOrdinalTermsEnum.next();
    } else if (compare < 0) {
        indexTerm = indexTermsEnum.next();
    } else {
        // Segment term is ahead: step the global enum forward one ordinal at a
        // time, iterating through millions of ordinals absent from this segment.
        ordinalTerm = globalOrdinalTermsEnum.next();
    }
}

Performance Impact

For an index with:

  • 10 million unique terms globally
  • 10 segments
  • Each segment containing ~2 million unique terms

Current behavior

  • Iterates through 10M global ordinals per segment
  • Total: 10 segments × 10M iterations = 100M+ global ordinal iterations
  • Each iteration involves BytesRef fetching and comparison

The compare > 0 branch advances one ordinal at a time through millions of global ordinals that don't exist in the current segment.

Expected behavior

  • Should only iterate through segment terms (~2M per segment)
  • Total: 10 segments × 2M lookups = 20M operations (a back-of-the-envelope comparison follows below)
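
As a rough sketch of the gap, using the illustrative figures above (these constants come from this example, not from measurements):

long globalTerms  = 10_000_000L; // unique terms across the index
int  segments     = 10;
long segmentTerms = 2_000_000L;  // unique terms per segment

long leapfrogAdvances = (long) segments * globalTerms;  // ~100M ordinal advances
long directLookups    = (long) segments * segmentTerms; // ~20M lookupTerm calls

System.out.printf("leapfrog: %,d advances vs. direct lookup: %,d lookups%n",
        leapfrogAdvances, directLookups);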

Why This Happens

The leapfrogging approach assumes both iterators need to advance together, but this is inefficient when the global ordinals contain terms from all segments, i.e., a superset of any one segment's terms (is this always the case, even with deleted docs?). A minimal demonstration follows below.
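
To make the superset point concrete, here is a minimal, self-contained sketch (assuming plain Lucene 9.x APIs; OpenSearch's actual global ordinals plumbing differs, and the index contents are illustrative) showing that the ordinal map built over all segments covers the union of every segment's terms:

import org.apache.lucene.document.Document;
import org.apache.lucene.document.SortedSetDocValuesField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.index.OrdinalMap;
import org.apache.lucene.index.SortedSetDocValues;
import org.apache.lucene.store.ByteBuffersDirectory;
import org.apache.lucene.util.BytesRef;

public class GlobalOrdinalSuperset {
    public static void main(String[] args) throws Exception {
        ByteBuffersDirectory dir = new ByteBuffersDirectory();
        try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig())) {
            for (int seg = 0; seg < 2; seg++) {
                for (int i = 0; i < 1_000; i++) {
                    Document doc = new Document();
                    // Disjoint term ranges per batch, so each segment sees
                    // only a fraction of the global terms.
                    doc.add(new SortedSetDocValuesField("URL", new BytesRef("url-" + seg + "-" + i)));
                    writer.addDocument(doc);
                }
                writer.commit(); // commit per batch to get separate segments
            }
        }
        try (DirectoryReader reader = DirectoryReader.open(dir)) {
            SortedSetDocValues[] perSegment = new SortedSetDocValues[reader.leaves().size()];
            for (LeafReaderContext ctx : reader.leaves()) {
                perSegment[ctx.ord] = DocValues.getSortedSet(ctx.reader(), "URL");
                System.out.println("segment " + ctx.ord + " terms: " + perSegment[ctx.ord].getValueCount());
            }
            OrdinalMap map = OrdinalMap.build(null, perSegment, 0f);
            // Prints 2000: the union of both segments' 1000-term dictionaries.
            System.out.println("global terms: " + map.getValueCount());
        }
    }
}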

Solution

Replace the leapfrogging with a direct term lookup per segment term?

TermsEnum indexTermsEnum = segmentTerms.iterator();
final SortedSetDocValues globalOrds = this.getGlobalOrds(ctx);
BytesRef indexTerm;

// Iterate only over this segment's terms and resolve each one directly
// against the global ordinal dictionary.
while ((indexTerm = indexTermsEnum.next()) != null) {
    // lookupTerm returns the term's global ordinal, or a negative value
    // when the term is absent, hence the >= 0 check.
    long globalOrd = globalOrds.lookupTerm(indexTerm);
    if (globalOrd >= 0 && acceptedGlobalOrdinals.test(globalOrd)) {
        ordCountConsumer.accept(globalOrd, indexTermsEnum.docFreq());
    }
}
  • Only iterates through segment terms (not all global ordinals)
  • Eliminates unnecessary BytesRef comparisons
  • Scales better with high cardinality fields
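
One caveat worth noting: lookupTerm is effectively a binary search over the global term dictionary, so the per-segment cost is on the order of segmentTerms × log2(globalTerms) term comparisons rather than a strictly linear 2M. That is still a large win whenever segment cardinality is much smaller than global cardinality, which is exactly the regressing case here.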

Additionally, add a cardinality threshold, which @bowenlan-amzn is addressing as part of #20623.
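
A rough sketch of what such a guard could look like (the method name, call site, and threshold are hypothetical fragments for GlobalOrdinalsStringTermsAggregator; see #20623 for the actual change):

// Hypothetical guard: skip the term-frequency optimization when the global
// ordinal space dwarfs this segment's term dictionary. The ratio limit is
// an assumed value, not the one chosen in #20623.
private static final long GLOBAL_TO_SEGMENT_RATIO_LIMIT = 100;

private boolean worthCollectingFromTermFrequencies(Terms segmentTerms, SortedSetDocValues globalOrds) throws IOException {
    long segmentTermCount = segmentTerms.size(); // -1 if the codec cannot report it
    if (segmentTermCount <= 0) {
        return false; // no basis for the ratio check; take the default path
    }
    return globalOrds.getValueCount() <= GLOBAL_TO_SEGMENT_RATIO_LIMIT * segmentTermCount;
}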

Thanks @rishabh6788 for sharing the CPU profile and bringing this issue to attention.

Additional Details


Screenshots
streaming-profile-q34 (1).html (attached CPU profile)

Host/Environment:

  • Version: OpenSearch 2.13+
