-
Notifications
You must be signed in to change notification settings - Fork 115
Consider an alternative disk representation for datom flags #29
Description
We have four fields in datoms: index_vaet, index_avet, index_fulltext and unique_value.
These fields appear in datoms and in the indices idx_datoms_eavt and idx_datoms_aevt.
Some rows also appear in idx_datoms_avet, idx_datoms_vaet, idx_datoms_fulltext, and idx_datoms_unique_value.
That is: each datom is responsible for anywhere from 12 to 28 TINYINTs in the DB. Sometimes these won't contribute to space (if they're zero, I've read that sqlite should pack them down to nothing), but assuming one byte per field, half a million datoms will add up to 14MB to the DB just for these flags.
These flags are entirely derived from the schema: a datom's attribute is the sole determinant (AFAICS) of whether these flags are 1 or 0 for a row.
Now, partial indexes cannot refer to other tables or call functions, so we can't simply join against the schema in the CREATE INDEX … WHERE clause. But we could use a more complex operator-driven expression, including bitmasks, to compress these flags.
We could also consider approaches to simply removing columns:
index_vaetandindex_fulltextare mutually exclusive, so they could be two integer values in a single field.index_vaetcorresponds to:db/valueType :db.type/ref, which is already implicitly represented by avalue_type_tagof 0, so we can filter on that instead.
Finally, we could consider direct schema creation as an approach: rather than having idx_datoms_fulltext, for example, we could create per-attribute indices when we register a schema fragment that includes :fulltext true:
CREATE INDEX idx_datoms_fulltext_page_title ON datoms (value_type_tag, v, a, e) WHERE a = 65538 -- The interned attribute.
I don't know if sqlite will happily use a collection of such indices (perhaps it'd be quicker for real-world queries!), but it's worth exploring.