Background
The TreeReader trait defines the method get_value_opt(max_version: Version, key_hash: KeyHash) -> Result<Option<OwnedValue>>. This method supports the get and get_with_proof methods of the JMT.
To satisfy this interface, an underlying RocksDB implementation has several options:
Map (Version, KeyHash) to Value
The simplest option is to store a mapping from (Version, KeyHash) to Value. This approach allows efficient writes with low compaction overhead, since items are naturally sorted by version. However, it has very poor read efficiency. To satisfy a get_value_opt query, the DB has to iterate backwards over all versions until the key_hash is found. To avoid iteration on get queries, the DB implementation needs to store a second table mapping KeyHashes to Versions. Since this table will be written in random order, it will have high compaction.
In addition, a DB with this setup can't support queries on (unhashed) keys without an adding an extra table to track which keys are present. And even with this additional table, range queries on raw keys are extremely inefficient - since the hash for each key is random.
Map (Key, Version) to Value
A second potential database layout is to store a mapping from (Key, Version) to Value. This schema has the drawback that writes happen in un-sorted order, which adds significant compaction overhead. But, it supports very efficient range queries on keys, and it allows easy retrieval of the history for a given key. So, for read-heavy workloads, this layout is much more efficient.
However, the current TreeReader interface requires apps which pick this layout to add an additional table mapping KeyHashes back to raw Keys.
Potential Change
We should consider changing the TreeReader (and the get, get_with_proof methods) to work on Keys rather than KeyHashes. This would enable implementers to pick either database layout with no extra overhead.
Background
The
TreeReadertrait defines the methodget_value_opt(max_version: Version, key_hash: KeyHash) -> Result<Option<OwnedValue>>. This method supports thegetandget_with_proofmethods of the JMT.To satisfy this interface, an underlying RocksDB implementation has several options:
Map
(Version, KeyHash)toValueThe simplest option is to store a mapping from
(Version, KeyHash)toValue. This approach allows efficient writes with low compaction overhead, since items are naturally sorted by version. However, it has very poor read efficiency. To satisfy aget_value_optquery, the DB has to iterate backwards over all versions until thekey_hashis found. To avoid iteration ongetqueries, the DB implementation needs to store a second table mappingKeyHashes toVersions. Since this table will be written in random order, it will have high compaction.In addition, a DB with this setup can't support queries on (unhashed) keys without an adding an extra table to track which keys are present. And even with this additional table, range queries on raw keys are extremely inefficient - since the hash for each key is random.
Map
(Key, Version)toValueA second potential database layout is to store a mapping from
(Key, Version)toValue. This schema has the drawback that writes happen in un-sorted order, which adds significant compaction overhead. But, it supports very efficient range queries on keys, and it allows easy retrieval of the history for a given key. So, for read-heavy workloads, this layout is much more efficient.However, the current
TreeReaderinterface requires apps which pick this layout to add an additional table mappingKeyHashes back to rawKeys.Potential Change
We should consider changing the
TreeReader(and theget,get_with_proofmethods) to work onKeys rather thanKeyHashes. This would enable implementers to pick either database layout with no extra overhead.