Feature: HGraphDiskANNLoader - DiskANN Index Migration to HGraph #1689
Description
Summary
Add HGraphDiskANNLoader, a specialized HGraph index that enables loading existing DiskANN format index files and converting them to HGraph's in-memory format, allowing seamless migration from DiskANN to HGraph without rebuilding indexes.
Motivation
Users who have already built DiskANN indexes may want to migrate to HGraph to benefit from:
- Better search performance with HGraph's hierarchical graph structure
- More flexible quantization options
- In-memory index with lower latency
However, rebuilding indexes from scratch is time-consuming and resource-intensive for large datasets. This feature provides a direct conversion path from DiskANN format to HGraph format.
Key Features
1. DiskANN PQ Compressed Vector Loading
- Load DiskANN's Product Quantization (PQ) pivots and compressed vectors
- Convert to HGraph's basic_flatten_codes_ using the PQ quantizer
- Maintain compression ratio while enabling in-memory search
2. Vamana Graph Structure Loading
- Load DiskANN's Vamana graph structure
- Convert to HGraph's bottom_graph_ (single-level index)
- Preserve graph connectivity and search quality
3. ID Mapping Support
- Load DiskANN tags for external ID mapping
- Map to HGraph's label_table_ for consistent ID handling
4. Precise Vector Reordering Support
- Load DiskANN precise vectors for reordering
- Store in HGraph's high_precise_codes_
- Enable high-precision reordering during search
5. Dual Deserialization Support
- Deserialize(BinarySet): for in-memory binary data
- Deserialize(ReaderSet): for file-backed data
Technical Implementation
Data Mapping
| DiskANN Component | HGraph Component | Description |
|---|---|---|
| PQ pivots + compressed vectors | basic_flatten_codes_ | PQ quantizer for compressed vector storage |
| Vamana graph | bottom_graph_ | Single-level graph structure |
| Tags | label_table_ | External ID mapping |
| Precise vectors | high_precise_codes_ | High-precision vectors for reordering |
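The mapping in the table can be read as a simple lookup from DiskANN blob name to HGraph member. The sketch below illustrates that reading only: `TargetComponent` is a hypothetical helper, not part of the proposed implementation, and the key names are taken from the usage example in this issue.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical helper: given the name of a DiskANN blob, return the HGraph
// member that would receive its contents according to the data mapping table.
// Key names ("pq", "graph", ...) follow the usage example in this issue.
std::optional<std::string> TargetComponent(std::string_view key) {
    // PQ pivots and compressed vectors together feed the PQ-backed code store.
    if (key == "pq" || key == "compressed_vectors") return "basic_flatten_codes_";
    if (key == "graph") return "bottom_graph_";
    if (key == "tags") return "label_table_";
    if (key == "precise_vectors") return "high_precise_codes_";
    return std::nullopt;  // unknown key: not covered by the mapping
}
```

Returning `std::nullopt` for unknown keys keeps the sketch explicit about blobs the table does not cover; how the real loader treats them is not specified here.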
Class Hierarchy
HGraph (base class)
└── HGraphDiskANNLoader (derived class)
- Override Deserialize(BinarySet)
- Override Deserialize(ReaderSet)
- Specialized DiskANN format parsing
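A minimal sketch of this hierarchy, assuming both Deserialize overloads are virtual in the base class. The `BinarySet` and `ReaderSet` structs are empty stand-ins for the vsag types, and the string return values exist only to make the dispatch visible; the real signatures may differ.

```cpp
#include <string>

// Empty stand-ins for illustration only; the real vsag::BinarySet and
// vsag::ReaderSet hold serialized blobs and file-backed readers.
struct BinarySet {};
struct ReaderSet {};

class HGraph {
public:
    virtual ~HGraph() = default;
    // Base behaviour: parse HGraph's own serialized format.
    virtual std::string Deserialize(const BinarySet&) { return "hgraph:binary"; }
    virtual std::string Deserialize(const ReaderSet&) { return "hgraph:reader"; }
};

class HGraphDiskANNLoader : public HGraph {
public:
    // Both overloads are overridden so neither is hidden by the other
    // and both entry points parse the DiskANN layout instead.
    std::string Deserialize(const BinarySet&) override { return "diskann:binary"; }
    std::string Deserialize(const ReaderSet&) override { return "diskann:reader"; }
};
```

Because only deserialization is overridden, code that holds the loader through a base reference or pointer still reaches the DiskANN parsing path, while search goes through the inherited HGraph logic.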
Key Files
| File | Description |
|---|---|
| src/algorithm/hgraph_diskann_loader.h | Header file with class definition |
| src/algorithm/hgraph_diskann_loader.cpp | Implementation of loading logic |
| src/algorithm/hgraph_diskann_loader_test.cpp | Unit tests |
| tests/test_hgraph_diskann_loader.cpp | Functional tests |
| include/vsag/constants.h | Index type constant |
| src/index/diskann.h | DiskANN internal structures |
Usage Example
#include <vsag/vsag.h>
// Create HGraphDiskANNLoader index
vsag::IndexPtr index;
auto param = R"(
{
"dtype": "float32",
"metric_type": "l2",
"dim": 128,
"index_param": {
"base_quantization_type": "pq",
"base_pq_dim": 32
}
}
)";
auto result = vsag::Factory::CreateIndex("hgraph_diskann_loader", param);
if (result.has_value()) {
index = result.value();
}
// Load from DiskANN serialized files
vsag::BinarySet binary_set;
// ... populate binary_set with DiskANN data files ...
// Required keys: "pq", "compressed_vectors", "graph", "tags"
// Optional key: "precise_vectors" for reordering support
index->Deserialize(binary_set);
// Now use as a regular HGraph index
auto search_param = R"({"hgraph": {"ef_search": 100}})";
auto search_result = index->KnnSearch(query, 10, search_param);
Index Type Constant
constexpr static const char* INDEX_TYPE_HGRAPH_DISKANN_LOADER = "hgraph_diskann_loader";
Performance Considerations
- Memory Usage: The converted HGraph index resides entirely in memory, unlike DiskANN which uses disk-based storage. Ensure sufficient memory for the dataset size.
- Loading Time: Initial loading requires parsing DiskANN format and constructing HGraph structures, but is significantly faster than rebuilding from raw vectors.
- Search Performance: After conversion, HGraph typically provides lower latency due to in-memory graph traversal compared to DiskANN's disk-based search.
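To size the memory requirement before migrating, a back-of-the-envelope estimate can help. The sketch below is not taken from the implementation. It assumes one byte per PQ subspace, 32-bit neighbor IDs in fixed-width adjacency rows, 64-bit labels, and float32 precise vectors, and it ignores allocator and bookkeeping overhead. `IndexShape` and `EstimateBytes` are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical description of the index being migrated.
struct IndexShape {
    std::uint64_t num_vectors;
    std::uint64_t dim;         // original dimensionality (float32 assumed)
    std::uint64_t pq_dim;      // PQ subspaces, one byte of code each (assumed)
    std::uint64_t max_degree;  // out-degree cap of the graph (assumed fixed-width rows)
    bool keep_precise;         // true if precise vectors are loaded for reordering
};

// Rough lower bound on resident bytes after conversion, under the
// assumptions stated above. Real usage will be higher.
std::uint64_t EstimateBytes(const IndexShape& s) {
    const std::uint64_t codes = s.num_vectors * s.pq_dim;
    const std::uint64_t graph = s.num_vectors * s.max_degree * sizeof(std::uint32_t);
    const std::uint64_t labels = s.num_vectors * sizeof(std::int64_t);
    const std::uint64_t precise =
        s.keep_precise ? s.num_vectors * s.dim * sizeof(float) : 0;
    return codes + graph + labels + precise;
}
```

With one million vectors, dim 128 and pq_dim 32 as in the usage example, and an assumed degree cap of 64, the arithmetic gives 808,000,000 bytes with precise vectors and 296,000,000 without. These figures follow from the stated assumptions and are not measurements.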
Limitations
- Currently only supports loading existing DiskANN indexes; cannot be used for building new indexes from scratch.
- The converted HGraph uses a single-level graph structure (bottom_graph_) instead of the typical hierarchical structure.
- Requires DiskANN index files to be properly formatted with expected binary layout.
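Since loading depends on the expected blobs being present, a caller may want to check the key set before calling Deserialize. The following is a sketch under the assumption that the required keys are exactly those named in the usage example ("pq", "compressed_vectors", "graph", "tags"); `MissingRequiredKeys` is a hypothetical helper and works on plain key names rather than a vsag::BinarySet.

```cpp
#include <array>
#include <set>
#include <string>
#include <vector>

// Hypothetical pre-flight check: returns the required DiskANN keys that are
// absent from `present`, in a fixed order. "precise_vectors" is optional and
// therefore never reported as missing.
std::vector<std::string> MissingRequiredKeys(const std::set<std::string>& present) {
    static const std::array<const char*, 4> kRequired = {
        "pq", "compressed_vectors", "graph", "tags"};
    std::vector<std::string> missing;
    for (const char* key : kRequired) {
        if (present.count(key) == 0) {
            missing.emplace_back(key);
        }
    }
    return missing;
}
```

An empty result means all required blobs are named. It says nothing about whether their binary layout is valid, which only the loader itself can verify.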
Testing
- Unit tests in src/algorithm/hgraph_diskann_loader_test.cpp
- Functional tests in tests/test_hgraph_diskann_loader.cpp
- Tests cover:
  - PQ data loading and quantization
  - Graph structure loading
  - Tag mapping
  - Precise vector loading
  - Search functionality after conversion
Related Commits
- e5ec37e: Initial implementation of HGraphDiskANNLoader
Future Work
- Support for incremental updates after migration
- Optimization for loading large-scale indexes
- Support for additional DiskANN index variants