PS-10481: Fix range optimizer full table scan for IN() with oversized values#5871
PS-10481: Fix range optimizer full table scan for IN() with oversized values#5871percona-mhansson wants to merge 1 commit intopercona:8.4from
Conversation
… values The range optimizer did not handle the case of search keys being longer than the index's key length. In this case, a search key could be created, and in the case of WHERE clauses with multiple sufficient conditions - either OR or IN conditions - it had to fall back to table scan. For binary collations, this is fine: an equality condition involving a value that wouldn't have fit the column can never match, and so we can safely just ignore it in a disjuntion expression. For other collations, however, a longer sequence in code points can match a shorter one, and there is no upper limit. Fixed by allocating a larger buffer in case the common key creationg fails due to truncation. We allocate as much space as the worst-case scenario and then some, heeding the advice in comment above `strnxfrmlen()`. For full indexes we just render the entire string, and for prefixes a string we the same amount of characters (really code points) as the index's declared key length. Putting the test case in type_varchar since this is ultimately a trait of the VARCHAR type.
263dd39 to
599df50
Compare
kamil-holubicki
left a comment
There was a problem hiding this comment.
I don't feel competent to approve or reject. Just comments.
| &tree, value, type, field, &impossible_cond_cause, alloc, | ||
| param->query_block, inexact); | ||
|
|
||
| if (always_true_or_false && *inexact && is_string_type(field->type()) && |
There was a problem hiding this comment.
The name of the variable always_true_or_false for the boolean vairable is just bizare :). I understand it comes from the upstream, and if you look at the description of when save_value_and_handle_conversion returns true, it makes sense. But for the analysis on the caller level it looks kind of funny :)
There was a problem hiding this comment.
Yep! Everything has completely nonsensical and misleading names here.
|
|
||
| if (always_true_or_false && *inexact && is_string_type(field->type()) && | ||
| (!is_prefix_index(field->table, key_part->key) || | ||
| type == Item_func::EQ_FUNC)) { |
There was a problem hiding this comment.
Is OR condition fine here? (looking at the comment below)
There was a problem hiding this comment.
I know, it's a pain to have to right the conditional like this. So the restriction is that for prefix indexes, we can only handle equality. So we can have either that it's not a prefix index - and then we allow any conditional - or else (if it is a prefix index) the condition has to be equality. So we end up with this backward-looking "not prefix or else equality".
The range optimizer did not handle the case of search keys being longer than the index's key length. In this case, a search key could be created, and in the case of WHERE clauses with multiple sufficient conditions - either OR or IN conditions - it had to fall back to table scan.
For binary collations, this is fine: an equality condition involving a value that wouldn't have fit the column can never match, and so we can safely just ignore it in a disjuntion expression. For other collations, however, a longer sequence in code points can match a shorter one, and there is no upper limit.
Fixed by allocating a larger buffer in case the common key creationg fails due to truncation. We allocate as much space as the worst-case scenario and then some, heeding the advice in comment above
strnxfrmlen(). For full indexes we just render the entire string, and for prefixes a string we the same amount of characters (really code points) as the index's declared key length.Putting the test case in type_varchar since this is ultimately a trait of the VARCHAR type.