Skip to content
This repository was archived by the owner on Sep 12, 2018. It is now read-only.
This repository was archived by the owner on Sep 12, 2018. It is now read-only.

Future: support for hardening parts of a schema into a tabular representation #33

@rnewman

Description

@rnewman

We present a tradeoff:

  • Put all of your data in Datomish, getting expressivity but losing out on compactness and query performance for some kinds of regular data.
  • Put all of your data in a SQLite database, getting some significant performance improvements at a cost of expressivity.

It's not all that feasible to have a consumer manually do both. Information is messy and interconnected, which is one of the motivations for doing this work in the first place!

We ourselves split the difference in a couple of areas: for example, we store value tags and values next to each other, rather than being fully normalized.

Some consumers can guarantee that some of their data will be of a particular non-sparse shape. We should consider supporting specialized storage for subparts of a schema which puts values into columns, generating query clauses and processing transactions accordingly.

Restrictions might be:

  • Attributes must all be cardinality: one.
  • All attribute values must be present in the same transaction? Not strictly necessary…
  • Tables cannot be modified after creation.

For example, we might denote a page visit like this:

{:db/ident :page/visitTime
 :db/valueType :db.type/instant
 :db/cardinality :db.cardinality/one
 :db/index true}
{:db/ident :page/visitDevice
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one}
{:db/ident :page/visitType
 :db/valueType :db.type/keyword
 :db/cardinality :db.cardinality/one}

{:table "visits"
 :columns [:page/visitTime :page/visitDevice :page/visitType]}

and we expect that to produce a single table and index pair:

CREATE TABLE datom_table_visits 
(e INTEGER NOT NULL,
 pageVisitTime INTEGER,    -- No tag needed!
 pageVisitDevice TEXT,
 pageVisitType TEXT);

CREATE INDEX datom_table_visits_pageVisitTime ON datom_table_visits(pageVisitTime);
CREATE INDEX datom_table_visits_e ON datom_table_visits(e);

and then queries like

[:find ?page ?type :in $ :where
 [?page :page/visitTime ?time]
 [(> ?time 1234567890)]
 [?page :page/visitDevice "abcdefg"]
 [?page :page/visitType ?type]]

would turn into

SELECT d123.e AS page, d123.pageVisitType AS type
FROM datom_table_visits AS d123
WHERE d123.pageVisitTime > 1234567890 AND
      d123.pageVisitDevice = "abcdefg" AND
      d123.pageVisitType IS NOT NULL;        -- NOT NULL no longer implied for these columns.

and churn through Datomish's existing projection pipeline.

Consumers can still make additional non-tabular references to visits, both in transacts and queries, but get the performance benefits of a SQL-like table structure where it makes sense.

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-coreA-designPlanning and overall architecture.A-viewsMaterialized views and caches.discussionThis issue captures useful discussion or knowledge.enhancementsizespeed

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions