-
Notifications
You must be signed in to change notification settings - Fork 115
Future: support for hardening parts of a schema into a tabular representation #33
Description
We present a tradeoff:
- Put all of your data in Datomish, getting expressivity but losing out on compactness and query performance for some kinds of regular data.
- Put all of your data in a SQLite database, getting some significant performance improvements at a cost of expressivity.
It's not all that feasible to have a consumer manually do both. Information is messy and interconnected, which is one of the motivations for doing this work in the first place!
We ourselves split the difference in a couple of areas: for example, we store value tags and values next to each other, rather than being fully normalized.
Some consumers can guarantee that some of their data will be of a particular non-sparse shape. We should consider supporting specialized storage for subparts of a schema which puts values into columns, generating query clauses and processing transactions accordingly.
Restrictions might be:
- Attributes must all be cardinality: one.
- All attribute values must be present in the same transaction? Not strictly necessary…
- Tables cannot be modified after creation.
For example, we might denote a page visit like this:
{:db/ident :page/visitTime
:db/valueType :db.type/instant
:db/cardinality :db.cardinality/one
:db/index true}
{:db/ident :page/visitDevice
:db/valueType :db.type/string
:db/cardinality :db.cardinality/one}
{:db/ident :page/visitType
:db/valueType :db.type/keyword
:db/cardinality :db.cardinality/one}
{:table "visits"
:columns [:page/visitTime :page/visitDevice :page/visitType]}
and we expect that to produce a single table and index pair:
CREATE TABLE datom_table_visits
(e INTEGER NOT NULL,
pageVisitTime INTEGER, -- No tag needed!
pageVisitDevice TEXT,
pageVisitType TEXT);
CREATE INDEX datom_table_visits_pageVisitTime ON datom_table_visits(pageVisitTime);
CREATE INDEX datom_table_visits_e ON datom_table_visits(e);
and then queries like
[:find ?page ?type :in $ :where
[?page :page/visitTime ?time]
[(> ?time 1234567890)]
[?page :page/visitDevice "abcdefg"]
[?page :page/visitType ?type]]
would turn into
SELECT d123.e AS page, d123.pageVisitType AS type
FROM datom_table_visits AS d123
WHERE d123.pageVisitTime > 1234567890 AND
d123.pageVisitDevice = "abcdefg" AND
d123.pageVisitType IS NOT NULL; -- NOT NULL no longer implied for these columns.
and churn through Datomish's existing projection pipeline.
Consumers can still make additional non-tabular references to visits, both in transacts and queries, but get the performance benefits of a SQL-like table structure where it makes sense.