Skip to content

PS-10481: Fix range optimizer full table scan for IN() with oversized values#5871

Open
percona-mhansson wants to merge 1 commit intopercona:8.4from
percona-mhansson:ps-10481
Open

PS-10481: Fix range optimizer full table scan for IN() with oversized values#5871
percona-mhansson wants to merge 1 commit intopercona:8.4from
percona-mhansson:ps-10481

Conversation

@percona-mhansson
Copy link
Copy Markdown
Contributor

The range optimizer did not handle the case of search keys being longer than the index's key length. In this case, a search key could be created, and in the case of WHERE clauses with multiple sufficient conditions - either OR or IN conditions - it had to fall back to table scan.

For binary collations, this is fine: an equality condition involving a value that wouldn't have fit the column can never match, and so we can safely just ignore it in a disjuntion expression. For other collations, however, a longer sequence in code points can match a shorter one, and there is no upper limit.

Fixed by allocating a larger buffer in case the common key creationg fails due to truncation. We allocate as much space as the worst-case scenario and then some, heeding the advice in comment above strnxfrmlen(). For full indexes we just render the entire string, and for prefixes a string we the same amount of characters (really code points) as the index's declared key length.

Putting the test case in type_varchar since this is ultimately a trait of the VARCHAR type.

@percona-mhansson
Copy link
Copy Markdown
Contributor Author

… values

The range optimizer did not handle the case of search keys being longer than the
index's key length. In this case, a search key could be created, and in the case
of WHERE clauses with multiple sufficient conditions - either OR or IN
conditions - it had to fall back to table scan.

For binary collations, this is fine: an equality condition involving a value
that wouldn't have fit the column can never match, and so we can safely just
ignore it in a disjuntion expression. For other collations, however, a longer
sequence in code points can match a shorter one, and there is no upper limit.

Fixed by allocating a larger buffer in case the common key creationg fails due
to truncation. We allocate as much space as the worst-case scenario and then
some, heeding the advice in comment above `strnxfrmlen()`. For full indexes we
just render the entire string, and for prefixes a string we the same amount of
characters (really code points) as the index's declared key length.

Putting the test case in type_varchar since this is ultimately a trait of the
VARCHAR type.
Copy link
Copy Markdown
Contributor

@kamil-holubicki kamil-holubicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't feel competent to approve or reject. Just comments.

&tree, value, type, field, &impossible_cond_cause, alloc,
param->query_block, inexact);

if (always_true_or_false && *inexact && is_string_type(field->type()) &&
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of the variable always_true_or_false for the boolean vairable is just bizare :). I understand it comes from the upstream, and if you look at the description of when save_value_and_handle_conversion returns true, it makes sense. But for the analysis on the caller level it looks kind of funny :)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep! Everything has completely nonsensical and misleading names here.


if (always_true_or_false && *inexact && is_string_type(field->type()) &&
(!is_prefix_index(field->table, key_part->key) ||
type == Item_func::EQ_FUNC)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is OR condition fine here? (looking at the comment below)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know, it's a pain to have to right the conditional like this. So the restriction is that for prefix indexes, we can only handle equality. So we can have either that it's not a prefix index - and then we allow any conditional - or else (if it is a prefix index) the condition has to be equality. So we end up with this backward-looking "not prefix or else equality".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants