dazhbog

An embedded Lumina server for IDA Pro with context-aware retrieval, search,
binary intelligence, and a built-in web workbench
Lumina v0-6 · sled · Tantivy · HTTP/1.1 + h2c · Binary graphs · Metadata parsing

dazhbog is a self-contained Lumina-compatible server that stores, retrieves, indexes, and analyzes function metadata from IDA Pro. It combines embedded deployment, context-aware version selection, binary analytics, full-text search, native metadata decoding, and a browser workbench.

It answers more than "do I have this function?" It also answers "which binary families contain it?", "which version best fits this caller?", and "what metadata does Lumina have for this symbol?"


Public dazhbog server

TLS and plaintext supported
No special configuration required

host: ida.int.mov
port: 1234

user: guest
pass: guest


Live public-server snapshot

[Chart: live dazhbog public server statistics, refreshed every 15 minutes from the public deployment]


At a glance

Area What it does
Lumina RPC Supports protocol versions 0 through 6, including push, pull, delete, and history flows
Storage Uses sled-backed append-only segment trees plus a persistent latest-record index
Context Tracks binary MD5s, basenames, observations, per-version stats, overlap caches, and binary facets in context_db
Search Indexes raw names, demangled names, language tags, and binary names with Tantivy
Web UI Serves a dashboard plus APIs for function detail, binary browsing, overlap, timelines, graph views, and binary comparison
Metadata Parses Lumina metadata natively in Rust, including types, frame data, comments, and switch/jumptable hints
Recovery Can migrate context data, rebuild indexes, rebuild search, rebuild basenames, and run full recovery flows
Upstream Optionally forwards cache misses to one or more upstream Lumina servers by priority

Why dazhbog

  • Embedded, not operationally heavy - no external database, search service, or queueing tier required
  • More than a cache - keeps history, context, binary observations, and per-version statistics
  • Browsable corpus - ships with an HTTP workbench instead of leaving the data behind the Lumina protocol
  • Built for reverse-engineering workflows - binary overlap, function history, demangling, comment/type extraction, and metadata-rich comparison are first-class features
  • Recoverable by design - segment data, context data, and search state can be rebuilt with dedicated tooling

Installation

cargo build --release
./target/release/dazhbog config.toml

For a development run:

cargo run -- config.toml

If IDA is talking to a non-TLS dazhbog instance:

export LUMINA_TLS=false

What it does

dazhbog gives teams local Lumina compatibility with search, context, and visibility into the dataset. It provides:

  • A context database in context_db/ for binary metadata, per-key basenames, binary/version stats, overlap caches, and facet summaries
  • A search layer in search_index/ using Tantivy for symbols, demangled names, languages, and binary names
  • Web APIs and a browser workbench in src/api/http/ for function details, binary explorer views, graph exploration, overlap analysis, and compare workflows
  • Universal symbol demangling for Itanium C++, MSVC, Rust, Swift, Go, and D
  • A native Lumina metadata parser in src/protocol/lumina/metadata.rs
  • Recovery tooling that can migrate context data, rebuild the latest index, rebuild basenames, rebuild search, and run full recovery passes
  • Binary analysis features including family timelines, overlap percentages, related binaries, and compare buckets such as shared, left-only, right-only, metadata-rich, rare-symbol, and freshest-drift

Features

Core server

  • Embedded storage - no external database required
  • Protocol support - compatible with IDA Pro Lumina protocol versions 0-6
  • Append-only records - immutable history via prev_addr chains
  • Context-aware version selection - chooses the best candidate using binary MD5, basename similarity, co-occurrence, stability, recency, and binary popularity
  • Optional upstream forwarding - one or more upstream Lumina servers with priority ordering
  • TLS support - PKCS#12 via native-tls or PEM via rustls

Search and web workbench

  • Function search by stored symbol, demangled symbol, or associated binary name
  • Binary search by basename and observed metadata
  • Function detail API at /api/function/:key
  • Binary detail API at /api/binary/:md5
  • Binary graph API at /api/binary/:md5/graph
  • Binary overlap API at /api/binary/:md5/overlap
  • Binary comparison API at /api/binary-compare/:left/:right
  • Prometheus metrics at /metrics
  • Metrics JSON at /api/metrics

The dashboard shows demangled names, parsed metadata, language badges, binary relationships, timeline views, coverage/facet summaries, and compare panels.

Binary intelligence

  • Per-binary summaries with observation counts, function counts, first/last seen timestamps, and host tracking
  • Binary overlap discovery based on shared functions
  • Binary family timelines for related samples
  • Neighborhood graphs for exploring binary clusters
  • Comparison buckets for shared, unique, metadata-rich, rare-symbol, and freshest-drift function sets
  • Facet summaries showing typed/commented/switch-heavy coverage across a binary
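
Overlap discovery can be sketched as plain set arithmetic over each binary's function-key set. The normalization by the smaller binary below is an assumption for illustration, not necessarily the exact formula dazhbog uses:

```python
def overlap_percent(a_funcs: set, b_funcs: set) -> float:
    """Share of functions two binaries have in common, normalized by the
    smaller binary (assumed formula; dazhbog's exact math may differ)."""
    if not a_funcs or not b_funcs:
        return 0.0
    shared = a_funcs & b_funcs
    return 100.0 * len(shared) / min(len(a_funcs), len(b_funcs))
```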

Metadata parsing

The Rust parser in src/protocol/lumina/metadata.rs can decode and expose:

  • function type information
  • frame descriptions and frame members
  • decompiler elapsed values
  • function comments and repeatable comments
  • instruction comments and repeatable instruction comments
  • derived switch and jumptable hints from parsed comments

The analysis/ directory contains reverse-engineering notes and Python parsers used to validate the format against real dumped payloads.


Architecture

dazhbog is built around four main on-disk stores and two serving layers.

Request flow

IDA client / browser
        |
        v
  Lumina RPC server / HTTP server
        |
        +--> latest key index ---------------> fetch current record
        |
        +--> context_db ---------------------> score versions, attach binaries, compute overlap/facets
        |
        +--> search_index -------------------> search functions and binaries
        |
        +--> segments_db --------------------> walk history, read raw records, parse metadata
        |
        +--> upstream servers (optional) ----> fill local cache on misses

Storage layout

Path Purpose
segments_db/ Append-only sled trees named seg.00001, seg.00002, ... containing serialized records
index/ Persistent key -> latest address lookup
context_db/ Binary metadata, basename associations, version stats, overlap caches, facet caches, popularity data
search_index/ Tantivy full-text index for functions and binaries

Record model

The append-only segment record keeps:

  • a 128-bit function key
  • timestamps and popularity
  • a pointer to the previous version via prev_addr
  • the function name
  • the raw Lumina metadata payload
  • tombstone state for deletes

That layout lets dazhbog answer three different kinds of query from the same corpus:

  • latest-value lookup through the persistent key index
  • history traversal by following prev_addr
  • binary/context-driven retrieval by joining against context_db
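
The record model above can be sketched in miniature. Names like `Record`, `Store`, and `push` are illustrative, not the actual Rust types, and the real store persists records to sled segments rather than an in-memory list:

```python
# Sketch of an append-only record store: each record points at the previous
# version of the same key via prev_addr, while a separate index tracks the
# latest record per key.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Record:
    key: int                   # stand-in for the 128-bit function key
    name: str                  # function name
    payload: bytes             # raw Lumina metadata payload
    prev_addr: Optional[int]   # address of the previous version, None if first
    tombstone: bool = False    # delete marker

class Store:
    def __init__(self):
        self.records = []      # append-only "segment" of Record
        self.latest = {}       # key -> address of the latest record

    def push(self, key, name, payload):
        # Link the new record to the current latest, then advance the index.
        rec = Record(key, name, payload, self.latest.get(key))
        self.records.append(rec)
        self.latest[key] = len(self.records) - 1

    def latest_record(self, key):
        addr = self.latest.get(key)
        return None if addr is None else self.records[addr]

    def history(self, key):
        # Walk the prev_addr chain from newest to oldest.
        addr = self.latest.get(key)
        while addr is not None:
            rec = self.records[addr]
            yield rec
            addr = rec.prev_addr
```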

Serving layers

  1. Lumina RPC server - handles Lumina clients, protocol negotiation, pull/push/delete/history flows, TLS, and upstream forwarding
  2. HTTP server - serves the dashboard, JSON APIs, metrics, and cleartext HTTP/2 (h2c)

When TLS is enabled, the server can also expose HTTP on the Lumina listener, using ALPN-aware handling to tell the two protocols apart.

Version selection

When multiple versions exist for a key, dazhbog scores candidates using weighted signals:

  • exact binary MD5 match
  • basename suffix similarity
  • binary co-occurrence probability
  • observation stability
  • recency
  • binary popularity

If context data is not available yet, it falls back to the latest stored version.
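
That fallback-aware scoring can be sketched roughly as follows. The weights, candidate fields, and feature extraction here are invented for illustration; the real weights live under scoring.*:

```python
def choose_version(candidates, ctx, weights=None):
    """Pick the best candidate for a key given caller context, or fall back
    to the newest version when no context is available. Candidate dicts and
    weight values are illustrative, not dazhbog's actual schema."""
    w = weights or {"md5": 5.0, "basename": 2.0, "cooccur": 1.5,
                    "stability": 1.0, "recency": 1.0, "popularity": 0.5}
    if not ctx:
        # No context yet: the latest stored version wins.
        return max(candidates, key=lambda c: c["timestamp"])

    def score(c):
        s = 0.0
        if c.get("binary_md5") == ctx.get("binary_md5"):
            s += w["md5"]                       # exact binary MD5 match
        if c.get("basename", "").endswith(ctx.get("basename", "\0")):
            s += w["basename"]                  # basename suffix similarity
        s += w["cooccur"] * c.get("cooccur", 0.0)
        s += w["stability"] * c.get("stability", 0.0)
        s += w["recency"] * c.get("recency", 0.0)
        s += w["popularity"] * c.get("popularity", 0.0)
        return s

    return max(candidates, key=score)
```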

Protocol and transport

  • Lumina protocol versions 0 through 6
  • plaintext or TLS on the Lumina side
  • HTTP/1.1 and cleartext HTTP/2 (h2c) on the HTTP side
  • optional HTTP handling on the TLS/Lumina side when enabled
  • optional upstream miss forwarding with priority ordering

Repository map

  • src/main.rs - server entrypoint
  • src/db/ - high-level database API, search enrichment, binary compare logic, version scoring
  • src/engine/ - segments, indexes, context index, search index, runtime wiring
  • src/protocol/lumina/ - Lumina wire handling and metadata parsing
  • src/api/http/ - dashboard templates, HTTP handlers, router, metrics APIs
  • src/bin/recover.rs - rebuild, migration, and recovery utility
  • src/bin/dump_functions.rs - dump stored raw metadata payloads
  • src/bin/dump_function_names.rs - export function names from the corpus
  • tests/ - protocol, metadata, fuzz, boundary, stress, and TLS-oriented coverage
  • analysis/ - parser notes and Python validation tooling

Quick start

# Build
cargo build --release

# Run
./target/release/dazhbog config.toml

# If IDA should use plaintext instead of TLS
export LUMINA_TLS=false

In IDA, point Lumina at your configured host and port and use guest / guest.

For a live instance, use the public server block at the top of this README.

Configuration

The config file looks like TOML, but the parser is intentionally lightweight rather than a full TOML implementation. Dotted keys and # comments work; advanced TOML features do not.
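
A toy reading of that accepted subset (dotted keys, # comments, quoted strings, integers, booleans) looks like this. It is purely illustrative and not the actual parser:

```python
def parse_config(text):
    """Sketch of a lightweight dotted-key config reader: `a.b.c = value`
    lines plus # comments. Not real TOML; no sections, arrays, or escapes."""
    cfg = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        key, _, raw = line.partition("=")
        raw = raw.strip()
        if raw in ("true", "false"):
            val = raw == "true"
        elif raw.lstrip("-").isdigit():
            val = int(raw)
        else:
            val = raw.strip('"')
        cfg[key.strip()] = val
    return cfg
```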

Main config groups:

  • limits.* - protocol and memory limits
  • engine.* - storage paths, segment size, mmap reads, deduplication, index tuning
  • lumina.* - bind address, deletes, history, TLS enablement
  • tls.* - PKCS#12 or PEM certificate settings
  • http.* - HTTP bind address
  • upstream.<n>.* - ordered upstream Lumina servers
  • scoring.* - version-selection weights and caps
  • debug.* - protocol hello dumping

TLS modes:

  • PKCS#12 / native-tls - fits IDA-style certificate setups
  • PEM / rustls - preferred for modern browser behavior and HTTP/2 ALPN

If both are configured, the code prefers the PEM/rustls path.

Important operational settings

  • engine.deduplicate_on_startup - rewrites away redundant records at startup; effective, but slow on large corpora
  • lumina.get_history_limit - caps history traversal returned to clients
  • limits.max_pull_items / limits.max_push_items - control how large client pull and push batches may be
  • scoring.* - controls how aggressively context influences version selection
  • upstream.<n>.priority - lower number means higher precedence for miss forwarding
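
The forwarding order implied by those priority values can be sketched as a hypothetical helper (the entries mirror the upstream.<n>.* config keys; this is not dazhbog's actual code):

```python
def order_upstreams(upstreams):
    """Return enabled upstreams in miss-forwarding order: a lower priority
    number means higher precedence. Illustrative only."""
    return sorted((u for u in upstreams if u.get("enabled", False)),
                  key=lambda u: u["priority"])
```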

Example config

# Connection and resource limits
limits.hello_timeout_ms = 3000
limits.command_timeout_ms = 15000
limits.max_active_conns = 2048
limits.max_pull_items = 524288
limits.max_push_items = 524288

# Storage engine
engine.data_dir = "data"
engine.segment_bytes = 1073741824
engine.shard_count = 64
engine.index_capacity = 1073741824
engine.deduplicate_on_startup = false

# Lumina server
lumina.bind_addr = "0.0.0.0:1234"
lumina.server_name = "dazhbog"
lumina.allow_deletes = false
lumina.get_history_limit = 32
lumina.use_tls = false

# HTTP server
http.bind_addr = "0.0.0.0:8080"

# Optional upstream
upstream.0.enabled = true
upstream.0.priority = 0
upstream.0.host = "lumina.hex-rays.com"
upstream.0.port = 443
upstream.0.use_tls = true
upstream.0.insecure_no_verify = true
upstream.0.hello_protocol_version = 6
upstream.0.license_path = "license.hexlic"
upstream.0.timeout_ms = 8000
upstream.0.batch_max = 131072

Admin and recovery commands

# List sled trees
./target/release/recover --list-trees data

# Migrate old context trees into context_db
./target/release/recover --migrate-context data

# Rebuild the latest key index
./target/release/recover --rebuild-index data

# Rebuild per-key basenames from binary metadata
./target/release/recover --rebuild-basenames data

# Rebuild the search index
./target/release/recover --rebuild-search data

# Run the combined rebuild flow
./target/release/recover --rebuild-all data

Other helpers:

# Dump metadata payloads for parser work
./target/release/dump_functions --dump 2000 analysis/data

# Dump function names from the corpus
./target/release/dump_function_names --output function_names.txt --unique

The recovery tool covers context migration, search refreshes, basename reconstruction, and full rebuild flows.

API surface

Endpoint Purpose
/ Interactive dashboard
/api/search?q=...&mode=functions|binaries Full-text search over functions or binaries
/api/function/:key Full function detail, parsed metadata, binaries
/api/binary/:md5 Binary summary plus facets, related views, and overview data
/api/binary/:md5/functions Paginated function list for a binary
/api/binary/:md5/overlap Related binaries by shared functions
/api/binary/:md5/graph Graph neighborhood data
/api/binary-compare/:left/:right Binary-to-binary comparison
/metrics Prometheus scrape endpoint
/api/metrics Metrics JSON snapshot

Binary intelligence

dazhbog models how functions appear across binaries, not just how they map to keys.

Each function can be linked to:

  • binaries that contained it
  • last-seen versions for specific binaries
  • observation counts
  • basename aliases
  • host information
  • overlap caches
  • facet summaries like typed/commented/switch-heavy coverage

That enables:

  • binary search by basename
  • paging through functions for a binary
  • binary-family timelines
  • related-binary discovery by overlap
  • graph exploration around a binary neighborhood
  • direct binary-to-binary comparison across shared and unique sets
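
The comparison split behind /api/binary-compare can be sketched as set differences over the two binaries' function keys. The bucket names follow the README; the richer buckets (metadata-rich, rare-symbol, freshest-drift) would need per-record metadata not modeled here:

```python
def compare_buckets(left_funcs: set, right_funcs: set) -> dict:
    """Basic shared / left-only / right-only split for two binaries.
    Illustrative; dazhbog's real compare also derives metadata-rich,
    rare-symbol, and freshest-drift buckets from stored records."""
    return {
        "shared": sorted(left_funcs & right_funcs),
        "left_only": sorted(left_funcs - right_funcs),
        "right_only": sorted(right_funcs - left_funcs),
    }
```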

Metadata analysis workflow

Use the parser workflow in analysis/ for validation and exploration:

./target/release/dump_functions --dump 2000 analysis/data
python3 analysis/lumina_metadata.py
python3 analysis/fast_parser.py

Testing

Run the full suite:

cargo test --all -- --nocapture

Focused runs:

cargo test metadata_parser -- --nocapture
cargo test protocol_test -- --nocapture

Current coverage includes:

  • metadata parser tests against real dumped payloads
  • live protocol handshake and pull behavior tests
  • boundary and fuzz-style network tests
  • TLS/security placeholders and stress-oriented suites

Some integration tests expect a live local server and will skip if it is not running.

Notes

  • Auth model - username must be guest; password validation is intentionally minimal
  • Network posture - best used on trusted networks unless you place it behind your own access controls
  • Migration - run recover --migrate-context if context_db is missing
  • Search quality - best after rebuilding basenames and search data from a populated context database
  • Runtime split - RPC and HTTP work run on dedicated runtimes to keep the UI responsive under protocol load
  • Demangling - search and detail views can expose precomputed demangled names and language hints
  • Parser provenance - the Rust metadata parser was validated against dumped real-world payloads in analysis/
  • Name - Dazhbog is a Slavic sun deity

License

MIT License

Copyright (c) 2025 Kenan Sulayman
