A char-wise Double-Array Trie library for lexime. Zero dependencies.
Replaces trie-rs + bincode with a compact, cache-friendly trie that supports both dictionary (DoubleArray<char>) and romaji (DoubleArray<u8>) use cases.
- 8 bytes/node (`#[repr(C)]`) — fits 8 nodes per cache line
- Char-wise indexing — 1.5-2x faster than byte-wise for Japanese text (based on crawdad benchmarks)
- Frequency-ordered CodeMapper — remaps labels to dense codes for compact arrays
- Terminal symbol approach — cleanly handles keys that are both exact matches and prefixes of other keys
- Sibling chain (SoA) — enables predictive search via DFS without increasing node size for other operations
- Fast serialization — binary format with `LXTR` magic header, ~5ms copy-load
- Zero-copy deserialization — `DoubleArrayRef` borrows nodes/siblings directly from mmap or byte buffer
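For intuition, here is a minimal base/check sketch of how a double-array trie answers a lookup in O(m) array probes. The layout, field names, and code assignment below are purely illustrative — they do not reflect lexime_trie's actual 8-byte node format or its CodeMapper:

```rust
// Illustrative base/check double-array (NOT lexime_trie's real layout).
// Transition rule: child = base[state] + code(label), valid iff check[child] == state.
struct MiniDa {
    base: Vec<i32>,
    check: Vec<i32>, // parent state of each slot, or -1 if the slot is empty
    terminal: Vec<bool>,
}

impl MiniDa {
    fn step(&self, state: usize, code: i32) -> Option<usize> {
        let child = (self.base[state] + code) as usize;
        (child < self.check.len() && self.check[child] == state as i32).then_some(child)
    }

    // O(m): exactly one probe of the arrays per input symbol.
    fn exact_match(&self, codes: &[i32]) -> bool {
        let mut state = 0;
        for &c in codes {
            match self.step(state, c) {
                Some(next) => state = next,
                None => return false,
            }
        }
        self.terminal[state]
    }
}

fn main() {
    // Hand-built trie for the keys "a" (code 1) and "ab" (codes 1, 2).
    let da = MiniDa {
        base: vec![0, 1, 0, 0],
        check: vec![-1, 0, -1, 1],
        terminal: vec![false, true, false, true],
    };
    assert!(da.exact_match(&[1]));        // "a" is a key
    assert!(da.exact_match(&[1, 2]));     // "ab" is a key
    assert!(!da.exact_match(&[2]));       // "b" is absent
    assert!(!da.exact_match(&[1, 2, 2])); // "abb" is absent
}
```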
| Operation | Description | Use Case |
|---|---|---|
| `exact_match` | O(m) key lookup | Dictionary lookup |
| `common_prefix_search` | All prefixes of a query | Lattice construction (Viterbi) |
| `predictive_search` | All keys starting with a prefix | Autocomplete / predict |
| `probe` | Key existence + children check (4-state) | Romaji input (None/Prefix/Exact/ExactAndPrefix) |
```rust
use lexime_trie::DoubleArray;

// Build from sorted keys (value_id = index)
let da = DoubleArray::<u8>::build(&[b"abc", b"abd", b"xyz"]);

// Exact match
assert_eq!(da.exact_match(b"abc"), Some(0));
assert_eq!(da.exact_match(b"zzz"), None);

// Common prefix search
for m in da.common_prefix_search(b"abcd") {
    println!("prefix len={}, value_id={}", m.len, m.value_id);
}

// Predictive search
for m in da.predictive_search(b"ab") {
    println!("key={:?}, value_id={}", m.key, m.value_id);
}

// Probe (romaji trie scenario)
let result = da.probe(b"ab");
// result.value: Option<u32>, result.has_children: bool

// Serialization
let bytes = da.as_bytes();
let da2 = DoubleArray::<u8>::from_bytes(&bytes).unwrap();
```

The input buffer must be aligned to at least 4 bytes. Buffers from mmap or page-aligned allocations satisfy this requirement.
```rust
use lexime_trie::{DoubleArray, DoubleArrayRef};

let da = DoubleArray::<u8>::build(&[b"abc", b"abd", b"xyz"]);
let raw = da.as_bytes();

// In production, use mmap (page-aligned). For a quick demo, use Vec<u32>
// as backing storage to guarantee 4-byte alignment.
let mut backing = vec![0u32; (raw.len() + 3) / 4];
let bytes: &mut [u8] = unsafe {
    std::slice::from_raw_parts_mut(backing.as_mut_ptr() as *mut u8, raw.len())
};
bytes.copy_from_slice(&raw);

// Zero-copy: borrows nodes & siblings directly from the byte buffer
let da_ref = DoubleArrayRef::<u8>::from_bytes_ref(bytes).unwrap();
assert_eq!(da_ref.exact_match(b"abc"), Some(0));

// Convert to owned if needed
let da_owned = da_ref.to_owned();
```

```rust
use lexime_trie::DoubleArray;

let keys: Vec<Vec<char>> = vec![
    "あい".chars().collect(),
    "あう".chars().collect(),
    "かき".chars().collect(),
];
let da = DoubleArray::<char>::build(&keys);
assert!(da.exact_match(&"あい".chars().collect::<Vec<_>>()).is_some());
```

- Little-endian only (x86_64, aarch64, etc.). The crate emits a `compile_error!` on big-endian targets.
- Serialization uses native LE layout for zero-copy performance; the binary format is not portable to big-endian platforms.
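A big-endian guard of this kind is a standard Rust idiom — a `cfg`-gated `compile_error!` at crate root (the exact message used by the crate is an assumption here):

```rust
// Compile-time guard: refuse to build on big-endian targets.
// (Standard Rust idiom; the error message below is an assumption, not the
// crate's actual wording.)
#[cfg(target_endian = "big")]
compile_error!("this crate's zero-copy binary format requires a little-endian target");

fn main() {
    // On little-endian targets the in-memory layout matches the serialized
    // LE layout, so deserialization is a plain memory read.
    assert_eq!(u32::from_le_bytes([0x34, 0x12, 0, 0]), 0x1234);
}
```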
See SPEC.md for the full design document.
MIT