Skip to content
This repository was archived by the owner on Sep 12, 2018. It is now read-only.
This repository was archived by the owner on Sep 12, 2018. It is now read-only.

Implement a hash-based indexing option for attributes #69

@rnewman

Description

@rnewman

Our schema language supports :index true, which we interpret as a SQL index on vaet.

For strings, that's inefficient: firstly we get duplication of string content, which is space-expensive, but we're also comparing strings rather than primitive values.

Bug 889561 removed the URL unique index in Places, replacing it with a url_hash column that's queried first. The hash is computed by mfbt's AddToHash, which goes a lil' something like this:

    if (mode.IsEmpty()) {
      // URI-like strings (having a prefix before a colon), are handled specially,
      // as a 48 bit hash, where first 16 bits are the prefix hash, while the
      // other 32 are the string hash.
      // The 16 bits have been decided based on the fact hashing all of the IANA
      // known schemes, plus "places", does not generate collisions.
      nsAString::const_iterator start, tip, end;
      str.BeginReading(tip);
      start = tip;
      str.EndReading(end);
      if (FindInReadable(NS_LITERAL_STRING(":"), tip, end)) {
        const nsDependentSubstring& prefix = Substring(start, tip);
        uint64_t prefixHash = static_cast<uint64_t>(HashString(prefix) & 0x0000FFFF);
        // The second half of the url is more likely to be unique, so we add it.
        uint32_t srcHash = HashString(str);
        uint64_t hash = (prefixHash << 32) + srcHash;
        result->SetAsInt64(hash);
      } else {
        uint32_t hash = HashString(str);
        result->SetAsInt64(hash);
      }
    } else if (mode.Equals(NS_LITERAL_CSTRING("prefix_lo"))) {
      // Keep only 16 bits.
      uint64_t hash = static_cast<uint64_t>(HashString(str) & 0x0000FFFF) << 32;
      result->SetAsInt64(hash);
    } else if (mode.Equals(NS_LITERAL_CSTRING("prefix_hi"))) {
      // Keep only 16 bits.
      uint64_t hash = static_cast<uint64_t>(HashString(str) & 0x0000FFFF) << 32;
      // Make this a prefix upper bound by filling the lowest 32 bits.
      hash +=  0xFFFFFFFF;
      result->SetAsInt64(hash);
    } else {
      return NS_ERROR_FAILURE;
    }

Because we can find ea->v from datoms, we can support indexed lookup from v->eat by defining a separate materialized view, heat (direct index, but collision avoidance needed) or hi (relying on an easy indexed join against datoms), for values that are indexed only for direct lookup and not for range queries.

This is also particularly useful for fulltext_values: the FTS virtual table requires a table scan to look up exact values, which makes insertion slow.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-corediscussionThis issue captures useful discussion or knowledge.sizespeed

    Type

    No type

    Projects

    No projects

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions