An embedded Lumina server for IDA Pro with context-aware retrieval, search,
binary intelligence, and a built-in web workbench
Lumina v0-6 • sled • Tantivy • HTTP/1.1 + h2c • Binary graphs • Metadata parsing

dazhbog is a self-contained Lumina-compatible server that stores, retrieves, indexes, and analyzes function metadata from IDA Pro. It combines embedded deployment, context-aware version selection, binary analytics, full-text search, native metadata decoding, and a browser workbench.
It answers more than "do I have this function?" It also answers "which binary families contain it?", "which version best fits this caller?", and "what metadata does Lumina have for this symbol?"
No special configuration required
| Area | What it does |
|---|---|
| Lumina RPC | Supports protocol versions 0 through 6, including push, pull, delete, and history flows |
| Storage | Uses sled-backed append-only segment trees plus a persistent latest-record index |
| Context | Tracks binary MD5s, basenames, observations, per-version stats, overlap caches, and binary facets in context_db |
| Search | Indexes raw names, demangled names, language tags, and binary names with Tantivy |
| Web UI | Serves a dashboard plus APIs for function detail, binary browsing, overlap, timelines, graph views, and binary comparison |
| Metadata | Parses Lumina metadata natively in Rust, including types, frame data, comments, and switch/jumptable hints |
| Recovery | Can migrate context data, rebuild indexes, rebuild search, rebuild basenames, and run full recovery flows |
| Upstream | Optionally forwards cache misses to one or more upstream Lumina servers by priority |
- Embedded, not operationally heavy - no external database, search service, or queueing tier required
- More than a cache - keeps history, context, binary observations, and per-version statistics
- Browsable corpus - ships with an HTTP workbench instead of leaving the data behind the Lumina protocol
- Built for reverse-engineering workflows - binary overlap, function history, demangling, comment/type extraction, and metadata-rich comparison are first-class features
- Recoverable by design - segment data, context data, and search state can be rebuilt with dedicated tooling
```
cargo build --release
./target/release/dazhbog config.toml
```

For a development run:

```
cargo run -- config.toml
```

If IDA is talking to a non-TLS dazhbog instance:

```
export LUMINA_TLS=false
```

dazhbog gives teams local Lumina compatibility with search, context, and visibility into the dataset.
- A context database in `context_db/` for binary metadata, per-key basenames, binary/version stats, overlap caches, and facet summaries
- A search layer in `search_index/` using Tantivy for symbols, demangled names, languages, and binary names
- Web APIs and a browser workbench in `src/api/http/` for function details, binary explorer views, graph exploration, overlap analysis, and compare workflows
- Universal symbol demangling for Itanium C++, MSVC, Rust, Swift, Go, and D
- A native Lumina metadata parser in `src/protocol/lumina/metadata.rs`
- Recovery tooling that can migrate context data, rebuild the latest index, rebuild basenames, rebuild search, and run full recovery passes
- Binary analysis features including family timelines, overlap percentages, related binaries, and compare buckets such as shared, left-only, right-only, metadata-rich, rare-symbol, and freshest-drift
- Embedded storage - no external database required
- Protocol support - compatible with IDA Pro Lumina protocol versions `0`-`6`
- Append-only records - immutable history via `prev_addr` chains
- Context-aware version selection - chooses the best candidate using binary MD5, basename similarity, co-occurrence, stability, recency, and binary popularity
- Optional upstream forwarding - one or more upstream Lumina servers with priority ordering
- TLS support - PKCS#12 via `native-tls` or PEM via `rustls`
- Function search by stored symbol, demangled symbol, or associated binary name
- Binary search by basename and observed metadata
- Function detail API at `/api/function/:key`
- Binary detail API at `/api/binary/:md5`
- Binary graph API at `/api/binary/:md5/graph`
- Binary overlap API at `/api/binary/:md5/overlap`
- Binary comparison API at `/api/binary-compare/:left/:right`
- Prometheus metrics at `/metrics`
- Metrics JSON at `/api/metrics`
The dashboard shows demangled names, parsed metadata, language badges, binary relationships, timeline views, coverage/facet summaries, and compare panels.
- Per-binary summaries with observation counts, function counts, first/last seen timestamps, and host tracking
- Binary overlap discovery based on shared functions
- Binary family timelines for related samples
- Neighborhood graphs for exploring binary clusters
- Comparison buckets for shared, unique, metadata-rich, rare-symbol, and freshest-drift function sets
- Facet summaries showing typed/commented/switch-heavy coverage across a binary
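The comparison buckets above can be sketched as plain set operations. This is an illustrative simplification, not dazhbog's implementation; the record fields (`has_metadata`, `global_count`, `ts`) and the rare-symbol cutoff are hypothetical:

```python
def compare_buckets(left, right):
    """Sketch of binary-to-binary comparison bucketing.

    left/right: dicts mapping function key -> record dict with
    hypothetical 'has_metadata', 'global_count', and 'ts' fields.
    """
    shared = left.keys() & right.keys()
    return {
        "shared": sorted(shared),
        "left_only": sorted(left.keys() - right.keys()),
        "right_only": sorted(right.keys() - left.keys()),
        # metadata-rich: shared functions where either side carries parsed metadata
        "metadata_rich": sorted(
            k for k in shared
            if left[k]["has_metadata"] or right[k]["has_metadata"]
        ),
        # rare-symbol: shared functions seen in very few binaries corpus-wide
        "rare_symbol": sorted(k for k in shared if left[k]["global_count"] <= 2),
        # freshest-drift: shared functions whose two sides disagree on recency
        "freshest_drift": sorted(
            k for k in shared if left[k]["ts"] != right[k]["ts"]
        ),
    }
```

The same function sets drive both the compare panels and the overlap views, so the buckets are cheap to recompute from cached per-binary function lists.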
The Rust parser in `src/protocol/lumina/metadata.rs` can decode and expose:
- function type information
- frame descriptions and frame members
- decompiler elapsed values
- function comments and repeatable comments
- instruction comments and repeatable instruction comments
- derived switch and jumptable hints from parsed comments
The `analysis/` directory contains reverse-engineering notes and Python parsers used to validate the format against real dumped payloads.
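As a rough illustration of the derived-hint idea only (the actual decoder lives in `src/protocol/lumina/metadata.rs`), IDA-style autogenerated instruction comments such as `switch 5 cases` or `jumptable 0040153C` can be pattern-matched to recover switch and jumptable hints:

```python
import re

# Hypothetical patterns for IDA-style autogenerated comment text;
# the real metadata encoding is binary, not free text.
SWITCH_RE = re.compile(r"switch\s+(\d+)\s+cases?")
JUMPTABLE_RE = re.compile(r"jumptable\s+([0-9A-Fa-f]+)")

def derive_hints(insn_comments):
    """insn_comments: dict of instruction address -> comment string.

    Returns (kind, address, value) tuples: case counts for switches,
    resolved hex targets for jumptables.
    """
    hints = []
    for addr, text in insn_comments.items():
        m = SWITCH_RE.search(text)
        if m:
            hints.append(("switch", addr, int(m.group(1))))
        m = JUMPTABLE_RE.search(text)
        if m:
            hints.append(("jumptable", addr, int(m.group(1), 16)))
    return hints
```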
dazhbog is built around four main on-disk stores and two serving layers.
```
IDA client / browser
        |
        v
Lumina RPC server / HTTP server
        |
        +--> latest key index ---------------> fetch current record
        |
        +--> context_db ---------------------> score versions, attach binaries, compute overlap/facets
        |
        +--> search_index -------------------> search functions and binaries
        |
        +--> segments_db --------------------> walk history, read raw records, parse metadata
        |
        +--> upstream servers (optional) ----> fill local cache on misses
```
| Path | Purpose |
|---|---|
| `segments_db/` | Append-only sled trees named `seg.00001`, `seg.00002`, ... containing serialized records |
| `index/` | Persistent key -> latest address lookup |
| `context_db/` | Binary metadata, basename associations, version stats, overlap caches, facet caches, popularity data |
| `search_index/` | Tantivy full-text index for functions and binaries |
The append-only segment record keeps:
- a 128-bit function key
- timestamps and popularity
- a pointer to the previous version via `prev_addr`
- the function name
- the raw Lumina metadata payload
- tombstone state for deletes
That layout lets dazhbog answer three different kinds of query from the same corpus:
- latest-value lookup through the persistent key index
- history traversal by following `prev_addr`
- binary/context-driven retrieval by joining against `context_db`
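A minimal sketch of that layout, assuming simplified records that carry only a key, a name, and a `prev_addr` link (real records also hold timestamps, popularity, the raw metadata payload, and tombstone state):

```python
class Store:
    """Toy model of append-only segments plus a latest-record index."""

    def __init__(self):
        self.segments = []   # append-only record storage
        self.latest = {}     # key -> address of newest record

    def push(self, key, name):
        prev = self.latest.get(key)  # link new record to the prior version
        addr = len(self.segments)
        self.segments.append({"key": key, "name": name, "prev_addr": prev})
        self.latest[key] = addr      # persistent latest-record index
        return addr

    def latest_record(self, key):
        """Latest-value lookup through the key index."""
        return self.segments[self.latest[key]]

    def history(self, key):
        """Walk the prev_addr chain from newest to oldest."""
        addr, names = self.latest.get(key), []
        while addr is not None:
            rec = self.segments[addr]
            names.append(rec["name"])
            addr = rec["prev_addr"]
        return names
```

Because records are never overwritten, the latest index can always be rebuilt by replaying the segments, which is what the recovery tooling relies on.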
- Lumina RPC server - handles Lumina clients, protocol negotiation, pull/push/delete/history flows, TLS, and upstream forwarding
- HTTP server - serves the dashboard, JSON APIs, metrics, and cleartext HTTP/2 (`h2c`)
When TLS is enabled, the server can also expose HTTP over the Lumina side with ALPN-aware handling.
When multiple versions exist for a key, dazhbog scores candidates using weighted signals:
- exact binary MD5 match
- basename suffix similarity
- binary co-occurrence probability
- observation stability
- recency
- binary popularity
If context data is not available yet, it falls back to the latest stored version.
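The selection step can be sketched as a weighted sum over those signals. The weight names and values below are illustrative only and do not correspond to dazhbog's `scoring.*` defaults:

```python
# Hypothetical weights; dazhbog's actual scoring.* configuration differs.
WEIGHTS = {
    "md5_match": 5.0,      # exact binary MD5 match (0 or 1)
    "basename_sim": 2.0,   # basename suffix similarity (0..1)
    "cooccurrence": 1.5,   # binary co-occurrence probability (0..1)
    "stability": 1.0,      # observation stability (0..1)
    "recency": 0.5,        # recency (0..1)
    "popularity": 0.5,     # binary popularity (0..1)
}

def score(signals):
    return sum(WEIGHTS[k] * signals.get(k, 0.0) for k in WEIGHTS)

def pick_version(candidates):
    """candidates: list of (version_id, signals), oldest first.

    Falls back to the last (latest) candidate when no context
    signals are available for any version.
    """
    if all(not sig for _, sig in candidates):
        return candidates[-1][0]
    return max(candidates, key=lambda c: score(c[1]))[0]
```

The large MD5-match weight encodes the intuition that an exact binary match should dominate softer signals like recency or popularity.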
- Lumina protocol versions `0` through `6`
- plaintext or TLS on the Lumina side
- HTTP/1.1 and cleartext HTTP/2 (`h2c`) on the HTTP side
- optional HTTP handling on the TLS/Lumina side when enabled
- optional upstream miss forwarding with priority ordering
- `src/main.rs` - server entrypoint
- `src/db/` - high-level database API, search enrichment, binary compare logic, version scoring
- `src/engine/` - segments, indexes, context index, search index, runtime wiring
- `src/protocol/lumina/` - Lumina wire handling and metadata parsing
- `src/api/http/` - dashboard templates, HTTP handlers, router, metrics APIs
- `src/bin/recover.rs` - rebuild, migration, and recovery utility
- `src/bin/dump_functions.rs` - dump stored raw metadata payloads
- `src/bin/dump_function_names.rs` - export function names from the corpus
- `tests/` - protocol, metadata, fuzz, boundary, stress, and TLS-oriented coverage
- `analysis/` - parser notes and Python validation tooling
```
# Build
cargo build --release

# Run
./target/release/dazhbog config.toml

# If IDA should use plaintext instead of TLS
export LUMINA_TLS=false
```

In IDA, point Lumina at your configured host and port and use `guest` / `guest`.
For a live instance, use the public server block at the top of this README.
The config file looks like TOML, but the parser is intentionally lightweight rather than a full TOML implementation. Dotted keys and `#` comments work; advanced TOML features such as tables and arrays do not.
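A minimal sketch of that TOML-like subset (dotted keys, `#` comments, quoted strings, integers, booleans). This mirrors the documented behavior only; it is not dazhbog's actual parser:

```python
def parse_config(text):
    """Parse a flat, dotted-key, TOML-like config into a dict.

    Assumes '#' never appears inside quoted values, which keeps
    comment stripping trivial for this sketch.
    """
    cfg = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if value.startswith('"') and value.endswith('"'):
            cfg[key] = value[1:-1]            # quoted string
        elif value in ("true", "false"):
            cfg[key] = value == "true"        # boolean
        else:
            cfg[key] = int(value)             # integer
    return cfg
```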
Main config groups:
- `limits.*` - protocol and memory limits
- `engine.*` - storage paths, segment size, mmap reads, deduplication, index tuning
- `lumina.*` - bind address, deletes, history, TLS enablement
- `tls.*` - PKCS#12 or PEM certificate settings
- `http.*` - HTTP bind address
- `upstream.<n>.*` - ordered upstream Lumina servers
- `scoring.*` - version-selection weights and caps
- `debug.*` - protocol hello dumping
TLS modes:
- PKCS#12 / native-tls - fits IDA-style certificate setups
- PEM / rustls - preferred for modern browser behavior and HTTP/2 ALPN
If both are configured, the code prefers the PEM/rustls path.
- `engine.deduplicate_on_startup` - rewrites away redundant records at startup; effective, but slow on large corpora
- `lumina.get_history_limit` - caps history traversal returned to clients
- `limits.max_pull_items` / `limits.max_push_items` - control large batch behavior from clients
- `scoring.*` - controls how aggressively context influences version selection
- `upstream.<n>.priority` - lower number means higher precedence for miss forwarding
```
# Connection and resource limits
limits.hello_timeout_ms = 3000
limits.command_timeout_ms = 15000
limits.max_active_conns = 2048
limits.max_pull_items = 524288
limits.max_push_items = 524288

# Storage engine
engine.data_dir = "data"
engine.segment_bytes = 1073741824
engine.shard_count = 64
engine.index_capacity = 1073741824
engine.deduplicate_on_startup = false

# Lumina server
lumina.bind_addr = "0.0.0.0:1234"
lumina.server_name = "dazhbog"
lumina.allow_deletes = false
lumina.get_history_limit = 32
lumina.use_tls = false

# HTTP server
http.bind_addr = "0.0.0.0:8080"

# Optional upstream
upstream.0.enabled = true
upstream.0.priority = 0
upstream.0.host = "lumina.hex-rays.com"
upstream.0.port = 443
upstream.0.use_tls = true
upstream.0.insecure_no_verify = true
upstream.0.hello_protocol_version = 6
upstream.0.license_path = "license.hexlic"
upstream.0.timeout_ms = 8000
upstream.0.batch_max = 131072
```

```
# List sled trees
./target/release/recover --list-trees data

# Migrate old context trees into context_db
./target/release/recover --migrate-context data

# Rebuild the latest key index
./target/release/recover --rebuild-index data

# Rebuild per-key basenames from binary metadata
./target/release/recover --rebuild-basenames data

# Rebuild the search index
./target/release/recover --rebuild-search data

# Run the combined rebuild flow
./target/release/recover --rebuild-all data
```

Other helpers:

```
# Dump metadata payloads for parser work
./target/release/dump_functions --dump 2000 analysis/data

# Dump function names from the corpus
./target/release/dump_function_names --output function_names.txt --unique
```

The recovery tool covers context migration, search refreshes, basename reconstruction, and full rebuild flows.
| Endpoint | Purpose |
|---|---|
| `/` | Interactive dashboard |
| `/api/search?q=...&mode=functions\|binaries` | Full-text search over functions or binaries |
| `/api/function/:key` | Full function detail, parsed metadata, binaries |
| `/api/binary/:md5` | Binary summary plus facets, related views, and overview data |
| `/api/binary/:md5/functions` | Paginated function list for a binary |
| `/api/binary/:md5/overlap` | Related binaries by shared functions |
| `/api/binary/:md5/graph` | Graph neighborhood data |
| `/api/binary-compare/:left/:right` | Binary-to-binary comparison |
| `/metrics` | Prometheus scrape endpoint |
| `/api/metrics` | Metrics JSON snapshot |
dazhbog models how functions appear across binaries, not just how they map to keys.
Each function can be linked to:
- binaries that contained it
- last-seen versions for specific binaries
- observation counts
- basename aliases
- host information
- overlap caches
- facet summaries like typed/commented/switch-heavy coverage
That enables:
- binary search by basename
- paging through functions for a binary
- binary-family timelines
- related-binary discovery by overlap
- graph exploration around a binary neighborhood
- direct binary-to-binary comparison across shared and unique sets
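Overlap discovery reduces to set intersection over per-binary function sets. In this sketch the percentage is taken relative to the smaller binary and the relatedness threshold is 25%; both choices are assumptions, not dazhbog's actual formula:

```python
def overlap_pct(funcs_a, funcs_b):
    """Percentage of shared functions, relative to the smaller set
    (an assumed normalization, not dazhbog's)."""
    a, b = set(funcs_a), set(funcs_b)
    smaller = min(len(a), len(b))
    return 100.0 * len(a & b) / smaller if smaller else 0.0

def related_binaries(target_funcs, corpus, threshold=25.0):
    """corpus: dict of md5 -> function-key set.

    Returns (md5, pct) pairs above the threshold, most related first.
    """
    hits = [
        (md5, overlap_pct(target_funcs, funcs))
        for md5, funcs in corpus.items()
    ]
    return sorted(
        (h for h in hits if h[1] >= threshold),
        key=lambda h: -h[1],
    )
```

Caching these per-pair percentages (the overlap caches in `context_db`) avoids recomputing intersections on every related-binary or graph request.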
Use the parser workflow in `analysis/` for validation and exploration:

```
./target/release/dump_functions --dump 2000 analysis/data
python3 analysis/lumina_metadata.py
python3 analysis/fast_parser.py
```

Run the full suite:

```
cargo test --all -- --nocapture
```

Focused runs:

```
cargo test metadata_parser -- --nocapture
cargo test protocol_test -- --nocapture
```

Current coverage includes:
- metadata parser tests against real dumped payloads
- live protocol handshake and pull behavior tests
- boundary and fuzz-style network tests
- TLS/security placeholders and stress-oriented suites
Some integration tests expect a live local server and will skip if it is not running.
- Auth model - username must be `guest`; password validation is intentionally minimal
- Network posture - best used on trusted networks unless you place it behind your own access controls
- Migration - run `recover --migrate-context` if `context_db` is missing
- Search quality - best after rebuilding basenames and search data from a populated context database
- Runtime split - RPC and HTTP work run on dedicated runtimes to keep the UI responsive under protocol load
- Demangling - search and detail views can expose precomputed demangled names and language hints
- Parser provenance - the Rust metadata parser was validated against dumped real-world payloads in `analysis/`
- Name - Dazhbog is a Slavic sun deity
MIT License
Copyright (c) 2025 Kenan Sulayman