Feature: HGraphDiskANNLoader - DiskANN Index Migration to HGraph #1689

@inabao

Summary

Add HGraphDiskANNLoader, a specialized HGraph index that enables loading existing DiskANN format index files and converting them to HGraph's in-memory format, allowing seamless migration from DiskANN to HGraph without rebuilding indexes.

Motivation

Users who have already built DiskANN indexes may want to migrate to HGraph to benefit from:

  • Better search performance with HGraph's hierarchical graph structure
  • More flexible quantization options
  • In-memory search, which avoids disk reads on the query path and lowers latency

However, rebuilding indexes from scratch is time-consuming and resource-intensive for large datasets. This feature provides a direct conversion path from DiskANN format to HGraph format.

Key Features

1. DiskANN PQ Compressed Vector Loading

  • Load DiskANN's Product Quantization (PQ) pivots and compressed vectors
  • Convert to HGraph's basic_flatten_codes_ using PQ quantizer
  • Maintain compression ratio while enabling in-memory search
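To illustrate what keeping PQ codes in memory implies, here is a minimal sketch of decoding one compressed vector from a codebook. The `PQCodebook` struct, the contiguous pivot layout, and `DecodePQ` are hypothetical and simplified for illustration; they do not describe DiskANN's actual pivot file format or HGraph's internal quantizer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified codebook: `num_chunks` subspaces, each holding
// `num_centroids` centroids of `chunk_dim` floats, stored contiguously.
struct PQCodebook {
    std::size_t num_chunks;
    std::size_t num_centroids;
    std::size_t chunk_dim;
    std::vector<float> pivots;  // size = num_chunks * num_centroids * chunk_dim

    const float* Centroid(std::size_t chunk, std::size_t id) const {
        return pivots.data() + (chunk * num_centroids + id) * chunk_dim;
    }
};

// Reconstruct an approximate vector from one PQ code (one byte per chunk)
// by concatenating the centroid selected in each subspace.
std::vector<float> DecodePQ(const PQCodebook& cb, const std::vector<std::uint8_t>& code) {
    assert(code.size() == cb.num_chunks);
    std::vector<float> out;
    out.reserve(cb.num_chunks * cb.chunk_dim);
    for (std::size_t c = 0; c < cb.num_chunks; ++c) {
        assert(code[c] < cb.num_centroids);
        const float* centroid = cb.Centroid(c, code[c]);
        out.insert(out.end(), centroid, centroid + cb.chunk_dim);
    }
    return out;
}
```

Because only the one-byte-per-chunk codes and the shared codebook are stored, the per-vector footprint is unchanged by the conversion under this assumption.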

2. Vamana Graph Structure Loading

  • Load DiskANN's Vamana graph structure
  • Convert to HGraph's bottom_graph_ (single-level index)
  • Preserve graph connectivity and search quality
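As a rough sketch of the graph-loading step, the code below parses a count-prefixed neighbor list into an adjacency structure and rejects truncated or out-of-range data. The flat `uint32_t` layout (degree followed by that many neighbor ids, per node) and the name `ParseAdjacency` are assumptions made for illustration; the real DiskANN files carry additional header and sector layout that this ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

using AdjacencyList = std::vector<std::vector<std::uint32_t>>;

// Parse `num_nodes` records of the assumed form [degree, n_0, ..., n_{degree-1}].
// Returns std::nullopt if the buffer is truncated, has trailing words, or a
// neighbor id does not refer to a valid node.
std::optional<AdjacencyList> ParseAdjacency(const std::vector<std::uint32_t>& words,
                                            std::size_t num_nodes) {
    AdjacencyList graph;
    graph.reserve(num_nodes);
    std::size_t pos = 0;
    for (std::size_t node = 0; node < num_nodes; ++node) {
        if (pos >= words.size()) {
            return std::nullopt;  // missing degree field
        }
        const std::size_t degree = words[pos++];
        if (degree > words.size() - pos) {
            return std::nullopt;  // neighbor list runs past the buffer
        }
        const auto first = words.begin() + static_cast<std::ptrdiff_t>(pos);
        std::vector<std::uint32_t> neighbors(first, first + static_cast<std::ptrdiff_t>(degree));
        for (std::uint32_t id : neighbors) {
            if (id >= num_nodes) {
                return std::nullopt;  // dangling edge
            }
        }
        graph.push_back(std::move(neighbors));
        pos += degree;
        assert(pos <= words.size());
    }
    if (pos != words.size()) {
        return std::nullopt;  // unexpected trailing data
    }
    return graph;
}
```

Validating every edge while loading is one way to make sure the connectivity of the source graph carries over intact into the single-level structure.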

3. ID Mapping Support

  • Load DiskANN tags for external ID mapping
  • Map to HGraph's label_table_ for consistent ID handling
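The tag-to-label mapping can be pictured as a two-way table between internal positions and external IDs. The sketch below is illustrative only: `LabelTableSketch` and `BuildLabelTable` are hypothetical names, and the `int64_t` tag type is an assumption rather than a statement about the DiskANN tag file or HGraph's `label_table_`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical two-way mapping between internal ids and external labels.
struct LabelTableSketch {
    std::vector<std::int64_t> label_of;                     // internal id -> external label
    std::unordered_map<std::int64_t, std::uint32_t> id_of;  // external label -> internal id
};

// Build the table from tags, where tags[i] is the external label of internal id i.
// Returns std::nullopt if two internal ids share the same external label.
std::optional<LabelTableSketch> BuildLabelTable(const std::vector<std::int64_t>& tags) {
    assert(tags.size() <= std::numeric_limits<std::uint32_t>::max());
    LabelTableSketch table;
    table.label_of = tags;
    table.id_of.reserve(tags.size());
    for (std::size_t i = 0; i < tags.size(); ++i) {
        const bool inserted =
            table.id_of.emplace(tags[i], static_cast<std::uint32_t>(i)).second;
        if (!inserted) {
            return std::nullopt;  // duplicate external label
        }
    }
    return table;
}
```

Rejecting duplicate tags up front keeps search results unambiguous when internal ids are translated back to external IDs.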

4. Precise Vector Reordering Support

  • Load DiskANN precise vectors for reordering
  • Store in HGraph's high_precise_codes_
  • Enable high-precision reordering during search
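Reordering with precise vectors amounts to re-scoring the candidates found through PQ distances with exact distances and keeping the best `k`. The following is a minimal sketch of that idea using squared L2 distance; `RerankByExactDistance` and the row-major `precise` buffer are assumptions for illustration, not HGraph's actual reorder path.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Squared Euclidean distance between two `dim`-dimensional vectors.
float L2Sqr(const float* a, const float* b, std::size_t dim) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < dim; ++i) {
        const float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// Re-score `candidates` against full-precision vectors (row-major, one row per
// internal id) and return the ids of the `k` closest, nearest first.
std::vector<std::uint32_t> RerankByExactDistance(const std::vector<float>& query,
                                                 const std::vector<std::uint32_t>& candidates,
                                                 const std::vector<float>& precise,
                                                 std::size_t k) {
    const std::size_t dim = query.size();
    assert(dim > 0 && precise.size() % dim == 0);
    std::vector<std::pair<float, std::uint32_t>> scored;
    scored.reserve(candidates.size());
    for (std::uint32_t id : candidates) {
        assert((static_cast<std::size_t>(id) + 1) * dim <= precise.size());
        scored.emplace_back(L2Sqr(query.data(), precise.data() + id * dim, dim), id);
    }
    k = std::min(k, scored.size());
    std::partial_sort(scored.begin(), scored.begin() + static_cast<std::ptrdiff_t>(k),
                      scored.end());
    std::vector<std::uint32_t> result;
    result.reserve(k);
    for (std::size_t i = 0; i < k; ++i) {
        result.push_back(scored[i].second);
    }
    return result;
}
```

The candidate list is typically larger than `k`, so that vectors misranked by the lossy PQ distance still have a chance to surface after exact re-scoring.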

5. Dual Deserialization Support

  • Deserialize(BinarySet) - For in-memory binary data
  • Deserialize(ReaderSet) - For file-backed data

Technical Implementation

Data Mapping

| DiskANN Component | HGraph Component | Description |
| --- | --- | --- |
| PQ pivots + compressed vectors | basic_flatten_codes_ | PQ quantizer for compressed vector storage |
| Vamana graph | bottom_graph_ | Single-level graph structure |
| Tags | label_table_ | External ID mapping |
| Precise vectors | high_precise_codes_ | High-precision vectors for reordering |

Class Hierarchy

HGraph (base class)
    └── HGraphDiskANNLoader (derived class)
            - Override Deserialize(BinarySet)
            - Override Deserialize(ReaderSet)
            - Specialized DiskANN format parsing
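One way to read this hierarchy is that both `Deserialize` overloads can funnel into a single parsing routine that only differs in how bytes are fetched. The sketch below shows that pattern with stand-in types: `InMemorySet`, `LazyReaderSet`, `GraphIndexBase`, and `DiskANNLoaderSketch` are all hypothetical and are not the vsag `BinarySet`, `ReaderSet`, `HGraph`, or `HGraphDiskANNLoader` types. The key names are taken from the usage example in this issue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Stand-ins for an in-memory blob set and a lazily read, file-backed set.
using Bytes = std::vector<std::uint8_t>;
using InMemorySet = std::map<std::string, Bytes>;
using LazyReaderSet = std::map<std::string, std::function<Bytes()>>;

// Stand-in for the base index interface.
class GraphIndexBase {
public:
    virtual ~GraphIndexBase() = default;
    virtual bool Deserialize(const InMemorySet& set) = 0;
    virtual bool Deserialize(const LazyReaderSet& set) = 0;
};

class DiskANNLoaderSketch : public GraphIndexBase {
public:
    bool Deserialize(const InMemorySet& set) override {
        return LoadAll([&set](const std::string& key, Bytes& out) {
            const auto it = set.find(key);
            if (it == set.end()) return false;
            out = it->second;
            return true;
        });
    }

    bool Deserialize(const LazyReaderSet& set) override {
        return LoadAll([&set](const std::string& key, Bytes& out) {
            const auto it = set.find(key);
            if (it == set.end()) return false;
            out = it->second();  // bytes are read only when requested
            return true;
        });
    }

    std::size_t loaded_bytes() const { return loaded_bytes_; }
    bool has_precise() const { return has_precise_; }

private:
    using Fetch = std::function<bool(const std::string&, Bytes&)>;

    // Shared path: every required component must be present; the precise
    // vectors are optional. Real parsing of each blob would happen here.
    bool LoadAll(const Fetch& fetch) {
        assert(fetch);
        static const char* const kRequired[] = {"pq", "compressed_vectors", "graph", "tags"};
        loaded_bytes_ = 0;
        for (const char* key : kRequired) {
            Bytes blob;
            if (!fetch(key, blob)) return false;
            loaded_bytes_ += blob.size();
        }
        Bytes precise;
        has_precise_ = fetch("precise_vectors", precise);
        loaded_bytes_ += precise.size();
        return true;
    }

    std::size_t loaded_bytes_ = 0;
    bool has_precise_ = false;
};
```

Keeping the format parsing in one private routine means the in-memory and file-backed paths cannot drift apart in which components they require.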

Key Files

| File | Description |
| --- | --- |
| src/algorithm/hgraph_diskann_loader.h | Header file with class definition |
| src/algorithm/hgraph_diskann_loader.cpp | Implementation of loading logic |
| src/algorithm/hgraph_diskann_loader_test.cpp | Unit tests |
| tests/test_hgraph_diskann_loader.cpp | Functional tests |
| include/vsag/constants.h | Index type constant |
| src/index/diskann.h | DiskANN internal structures |

Usage Example

#include <vsag/vsag.h>

// Create HGraphDiskANNLoader index
vsag::IndexPtr index;
auto param = R"(
{
    "dtype": "float32",
    "metric_type": "l2",
    "dim": 128,
    "index_param": {
        "base_quantization_type": "pq",
        "base_pq_dim": 32
    }
}
)";

auto result = vsag::Factory::CreateIndex("hgraph_diskann_loader", param);
if (!result.has_value()) {
    // Creation failed: stop here rather than dereferencing a null index
    return;
}
index = result.value();

// Load from DiskANN serialized files
vsag::BinarySet binary_set;
// ... populate binary_set with DiskANN data files ...
// Required keys: "pq", "compressed_vectors", "graph", "tags"
// Optional key: "precise_vectors" for reordering support

// Deserialize reports failure through its return value, so check it
if (!index->Deserialize(binary_set).has_value()) {
    return;
}

// Now use as a regular HGraph index
// `query` is a vsag::DatasetPtr holding the query vector (dim 128, float32)
auto search_param = R"({"hgraph": {"ef_search": 100}})";
auto search_result = index->KnnSearch(query, 10, search_param);

Index Type Constant

constexpr static const char* INDEX_TYPE_HGRAPH_DISKANN_LOADER = "hgraph_diskann_loader";

Performance Considerations

  1. Memory Usage: The converted HGraph index resides entirely in memory, unlike DiskANN, which keeps most of its data on disk. Ensure that available memory is sufficient for the dataset size.

  2. Loading Time: Initial loading requires parsing the DiskANN format and constructing HGraph structures. Because the existing PQ codes and graph edges are reused rather than recomputed, this is expected to be much faster than rebuilding from raw vectors.

  3. Search Performance: After conversion, HGraph typically provides lower latency due to in-memory graph traversal compared to DiskANN's disk-based search.
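For the memory point above, a back-of-the-envelope estimate can help with capacity planning. The sketch below is a rough lower bound under stated assumptions: one byte per PQ subspace, `uint32_t` neighbor ids in a fixed-degree layout, and one `int64_t` label per vector. It ignores the codebook, allocator overhead, and any HGraph-internal bookkeeping, and `FootprintParams` and `EstimateResidentBytes` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Inputs for a rough per-index memory estimate (all values are assumptions
// supplied by the caller, not read from any index file).
struct FootprintParams {
    std::uint64_t num_vectors;
    std::uint64_t pq_bytes_per_vector;  // e.g. base_pq_dim with 1 byte per subspace
    std::uint64_t max_degree;           // neighbors stored per node
    std::uint64_t dim;                  // dimension of the float32 precise vectors
    bool keep_precise;                  // whether precise vectors are loaded
};

// Lower-bound estimate: PQ code + neighbor ids + label per vector, plus the
// float32 vector when precise reordering data is kept in memory.
std::uint64_t EstimateResidentBytes(const FootprintParams& p) {
    assert(p.num_vectors > 0);
    std::uint64_t per_vector = p.pq_bytes_per_vector +
                               p.max_degree * sizeof(std::uint32_t) +
                               sizeof(std::int64_t);
    if (p.keep_precise) {
        per_vector += p.dim * sizeof(float);
    }
    return p.num_vectors * per_vector;
}
```

With one million 128-dimensional vectors, 32-byte PQ codes, and an assumed degree of 64, this gives about 296 MB without precise vectors and about 808 MB with them, which shows how strongly the optional reordering data drives the footprint.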

Limitations

  1. Currently supports only loading existing DiskANN indexes; it cannot be used to build new indexes from scratch.

  2. The converted HGraph uses a single-level graph structure (bottom_graph_) instead of the typical hierarchical structure.

  3. Requires the DiskANN index files to follow the expected binary layout; malformed or differently laid out files are not supported.

Testing

  • Unit tests in src/algorithm/hgraph_diskann_loader_test.cpp
  • Functional tests in tests/test_hgraph_diskann_loader.cpp
  • Tests cover:
    • PQ data loading and quantization
    • Graph structure loading
    • Tag mapping
    • Precise vector loading
    • Search functionality after conversion

Related Commits

  • e5ec37e: Initial implementation of HGraphDiskANNLoader

Future Work

  1. Support for incremental updates after migration
  2. Optimization for loading large-scale indexes
  3. Support for additional DiskANN index variants
