
About 15% regression on string sorts #2776

@stuhood

Description


Post #2726, there is about a 15% regression on string sorts.

I have not had time to triage this yet, but I strongly suspect it is caused by `impl SortKeyComputer for SortByString` performing an individual lookup in the column's dictionary for each term:

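```rust
// Excerpt: method of the `SegmentSortKeyComputer` used by `SortByString`.
fn convert_segment_sort_key(&self, term_ord_opt: Option<TermOrdinal>) -> Option<String> {
    let term_ord = term_ord_opt?;
    let str_column = self.str_column_opt.as_ref()?;
    let mut bytes = Vec::new();
    // One dictionary lookup per term: each call locates and decodes
    // the dictionary block that contains `term_ord`.
    str_column
        .dictionary()
        .ord_to_term(term_ord, &mut bytes)
        .ok()?;
    String::try_from(bytes).ok()
}
```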

When ordering by strings, the term ordinals being converted arrive in sorted order, so consecutive values sit next to each other in the column's dictionary. Because the dictionary is compressed, each lookup decompresses a block of the term dictionary, and since consecutive ordinals often fall within the same block, we can end up decompressing the same block many times.
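
To make the cost argument concrete, here is a toy model (not tantivy's actual sstable `Dictionary`) in which reading any term requires decompressing its whole block. `ToyBlockDictionary`, its simplified `ord_to_term`, and `sorted_ords_to_terms` are hypothetical names for illustration only; "decompression" is simulated by cloning the block and incrementing a counter.

```rust
use std::cell::Cell;

/// Hypothetical stand-in for a block-compressed term dictionary: terms are
/// grouped into fixed-size blocks and reading any term "decompresses"
/// (here: clones) its whole block. `decompressions` counts those events.
pub struct ToyBlockDictionary {
    blocks: Vec<Vec<String>>,
    block_size: usize,
    pub decompressions: Cell<usize>,
}

impl ToyBlockDictionary {
    pub fn new(sorted_terms: Vec<String>, block_size: usize) -> Self {
        assert!(block_size > 0, "block_size must be positive");
        let blocks = sorted_terms
            .chunks(block_size)
            .map(|chunk| chunk.to_vec())
            .collect();
        ToyBlockDictionary { blocks, block_size, decompressions: Cell::new(0) }
    }

    /// Simulates decompressing one block; returns None past the end.
    fn decompress_block(&self, block_id: usize) -> Option<Vec<String>> {
        let block = self.blocks.get(block_id)?;
        self.decompressions.set(self.decompressions.get() + 1);
        Some(block.clone())
    }

    /// Per-term lookup: decompresses the containing block on every call.
    pub fn ord_to_term(&self, ord: u64) -> Option<String> {
        let ord = ord as usize;
        let block = self.decompress_block(ord / self.block_size)?;
        block.get(ord % self.block_size).cloned()
    }

    /// Batched lookup for ascending ordinals: a block is decompressed once
    /// and reused for every following ordinal that falls inside it.
    pub fn sorted_ords_to_terms(&self, sorted_ords: &[u64], mut cb: impl FnMut(&str)) {
        let mut current: Option<(usize, Vec<String>)> = None;
        for &ord in sorted_ords {
            let ord = ord as usize;
            let block_id = ord / self.block_size;
            if current.as_ref().map(|(id, _)| *id) != Some(block_id) {
                match self.decompress_block(block_id) {
                    Some(block) => current = Some((block_id, block)),
                    // Ordinals are sorted, so every remaining one is past the end too.
                    None => return,
                }
            }
            if let Some((_, block)) = &current {
                if let Some(term) = block.get(ord % self.block_size) {
                    cb(term.as_str());
                }
            }
        }
    }
}
```

With 8 sorted ordinals and 4 terms per block, the per-term path decompresses 8 times while the batched path decompresses only twice, and both yield the same terms.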


Previously on `main`, this used `sorted_ords_to_term_cb` to batch-convert the `TermOrdinal`s into terms. One way to recover this performance would be to turn `SegmentSortKeyComputer::convert_segment_sort_key` into a batch method.
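
A minimal sketch of the batch-method idea, assuming a dictionary that can resolve ascending ordinals in a single forward pass (the role `sorted_ords_to_term_cb` played before; its real signature is not reproduced here). `TermOrdinal` is aliased to `u64` as an assumption, and `SortedTermLookup`, `VecDictionary`, and `convert_segment_sort_keys` are hypothetical names, not tantivy APIs.

```rust
/// Assumption: mirrors the issue's `TermOrdinal` as a plain u64.
pub type TermOrdinal = u64;

/// Hypothetical dictionary abstraction: resolves ascending ordinals in one
/// pass, calling `cb(ord, term_bytes)` in ascending order for each ordinal found.
pub trait SortedTermLookup {
    fn sorted_ords_to_terms(&self, sorted_ords: &[TermOrdinal], cb: &mut dyn FnMut(TermOrdinal, &[u8]));
}

/// Trivial in-memory dictionary, used only for illustration.
pub struct VecDictionary(pub Vec<Vec<u8>>);

impl SortedTermLookup for VecDictionary {
    fn sorted_ords_to_terms(&self, sorted_ords: &[TermOrdinal], cb: &mut dyn FnMut(TermOrdinal, &[u8])) {
        for &ord in sorted_ords {
            if let Some(term) = self.0.get(ord as usize) {
                cb(ord, term.as_slice());
            }
        }
    }
}

/// Hypothetical batch variant of `convert_segment_sort_key`: converts many
/// optional ordinals at once so the dictionary is walked in a single pass.
pub fn convert_segment_sort_keys<D: SortedTermLookup>(
    dict: &D,
    term_ords: &[Option<TermOrdinal>],
) -> Vec<Option<String>> {
    // Pair each present ordinal with its output slot, then sort by ordinal.
    let mut pending: Vec<(TermOrdinal, usize)> = term_ords
        .iter()
        .enumerate()
        .filter_map(|(slot, ord)| ord.map(|o| (o, slot)))
        .collect();
    pending.sort_unstable();

    // Deduplicate so each distinct term is decoded only once.
    let mut unique_ords: Vec<TermOrdinal> = pending.iter().map(|&(o, _)| o).collect();
    unique_ords.dedup();

    // Resolve all unique ordinals in one ascending pass; invalid UTF-8 is
    // skipped, matching the original `String::try_from(bytes).ok()`.
    let mut resolved: Vec<(TermOrdinal, String)> = Vec::with_capacity(unique_ords.len());
    dict.sorted_ords_to_terms(&unique_ords, &mut |ord, bytes| {
        if let Ok(s) = std::str::from_utf8(bytes) {
            resolved.push((ord, s.to_owned()));
        }
    });

    // Scatter terms back to their original slots; unresolved ones stay None.
    let mut out = vec![None; term_ords.len()];
    for (ord, slot) in pending {
        if let Ok(i) = resolved.binary_search_by_key(&ord, |(o, _)| *o) {
            out[slot] = Some(resolved[i].1.clone());
        }
    }
    out
}
```

Sorting and deduplicating inside the batch method keeps it correct even if callers pass ordinals out of order; when they already arrive sorted, as in a string sort, that step is cheap.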
