Adding _slice metadata to ES|QL#152212
Conversation
|
Pinging @elastic/es-search-relevance (Team:Search Relevance) |
| if (Build.current().isSnapshot()) { | ||
| entries.addAll(snapshotOnlyAttributes); | ||
| } | ||
| if (SliceIndexing.SLICE_FEATURE_FLAG.isEnabled()) { |
There was a problem hiding this comment.
this check is a bit fragile, in case we ever want to add more metadata attributes that depend of feature flags in the near feature - which might not be the case and we'll enable _slice soon enough.
If it was me - I would actually refactor this whole bit.
I wonder if we could either define ATTRIBUTES_MAP in a static block, or remove the arguments from
createMetadataAttributes so we don't pass these buckets of snapshot-only and feature flag enabled metadata attributes.
And then in that static block or createMetadataAttributes
// all the metadata attributes that are always enabled
List<Map.Entry<String, MetadataAttributeConfiguration>> entries = List.of(
Map.entry("_version", new MetadataAttributeConfiguration(DataType.LONG, false)),
...);
// add every metadata attribute depending on whether its Cap is enabled
if (EsqlCapabilities.Cap.METADATA_TIER_FIELD.isEnabled()) {
entries.add(Map.entry(DataTierFieldMapper.NAME, new MetadataAttributeConfiguration(DataType.KEYWORD, true)));
}
if (EsqlCapabilities.Cap.METADATA_SLICE.enabled()) {
entries.add(Map.entry(SliceIndexing.PARAM_NAME, new MetadataAttributeConfiguration(DataType.KEYWORD, true)));
}
The other aspect is that the pattern we usually have when we deal with feature flag enabled features, is to define a capability and use that instead of the feature flag in the esql implementation.
| * (with Lucene pushdown) against indices with {@code index.slice.enabled: true}. | ||
| */ | ||
| @ThreadLeakFilters(filters = TestClustersThreadFilter.class) | ||
| public class MetadataSliceIT extends ESRestTestCase { |
There was a problem hiding this comment.
I think this is good for now - and we can also use this test in the future to get the profile of the query to check how _slice is being pushed down (similar to PushQueriesIT).
But as a follow up we need to add CSV tests with a new dataset that actually uses _slice.
CSV tests allow us to test in multiple scenarios - multi-node, multi-cluster, bwc tests when the versions of nodes/cluster differ etc.
To add a dataset, I'd start in:
x-pack/plugin/esql/qa/testFixtures/src/main/java/org/elasticsearch/xpack/esql/CsvTestsDataLoader.java
there's a withRequiredCapabilities that's going to prove useful, because we don't want to create an index using _slice when the capability (feature flag) for _slice is not enabled.
But that's something we can definitely do in a separate PR.
|
We have some failing tests: I think we just need to add |
ioanatia
left a comment
There was a problem hiding this comment.
going to approve this one.
we don't need the CSV dataset + tests - let's add that separately.
The small refactor in MetadataAttribute and fixing the tests in ExternalMetadataColumnsTests should be easy and don't require another review IMO.
but let me know if there's something I should still be looking into after you made those changes.
Slice-enabled index shards require DiskBBQPlugin.IndexSettingProvider to inject index.slice.validated=true at creation time. Without it the IndexSettings constructor throws during shard init, leaving shards permanently unassigned. Some CI distributions (e.g. the serverless stateful check) do not bundle diskbbq, causing all 10 tests to fail with no_shard_available_action_exception. Add a @before guard that inspects /_nodes/plugins for the diskbbq module and calls assumeTrue to skip the whole class gracefully when it is absent.
This allows for
METADATA _slicefor ESQL. This doesn't provide optimized routing or vector search for _slice yet.Only basic filtering and access to the values.
i admittedly don't know all the tests I should add, but the field aliasing done in search seems to work just fine.