VectorDB Modular Design #316
MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅
Can we add either a requirements.txt or a dependency update in storage/pyproject.toml for the new optional dependencies (elasticsearch, psycopg2-binary, pgvector, python-dotenv)? The existing pyproject.toml currently lists only pymilvus. These can be added as optional dependencies.
Devasena,
I believe the best way to handle this is via optional requirements in pyproject.toml. I can help on this if desired.
Regards,
—Russ
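For reference, a minimal sketch of what optional dependencies in storage/pyproject.toml could look like. The package names come from the review comment above. The extra names (`elasticsearch`, `pgvector`, `env`, `all`) and the grouping are assumptions for illustration, and version pins are deliberately left out.

```toml
[project.optional-dependencies]
# Extra names and grouping below are illustrative, not taken from the PR.
elasticsearch = ["elasticsearch"]
pgvector = ["psycopg2-binary", "pgvector"]
env = ["python-dotenv"]
all = ["elasticsearch", "psycopg2-binary", "pgvector", "python-dotenv"]
```

With a layout like this, a single backend's dependencies could be installed from the storage/ directory with `pip install ".[pgvector]"`, while a plain install would still pull in only pymilvus.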
(In reply to Devasena I's comment of Apr 8, 2026, 8:59 AM, on vdb_benchmark/README.md.)
@CShorb I have now reviewed your PR in full. Thank you for the modular VDB benchmark work. I like the backend abstraction, the descriptor/registry model, the config/env precedence, and the producer/consumer load + ground-truth pipeline. I think we need the changes below before merge. I have also updated the respective files, which I will push as a separate commit to this PR next:
VectorDB Modular Design
Adds a modular, backend-agnostic vector database benchmarking framework that measures load throughput, search QPS, recall@K, and latency percentiles (P50/P90/P99) across pluggable database backends.
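As a rough illustration of the metrics named above, the following sketch computes recall@K, nearest-rank latency percentiles, and QPS from raw measurements. The function names and the nearest-rank percentile definition are assumptions for illustration; the framework in this PR may compute them differently.

```python
import math


def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k neighbours that appear in the retrieved top-k."""
    hits = len(set(retrieved[:k]) & set(ground_truth[:k]))
    return hits / k


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples <= it."""
    ordered = sorted(samples)
    # Multiply before dividing to avoid float rounding such as 0.9 * 100 > 90.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def queries_per_second(num_queries, elapsed_seconds):
    """Search throughput over a timed run."""
    return num_queries / elapsed_seconds


# Example: 100 hypothetical latency samples of 1..100 ms.
latencies_ms = list(range(1, 101))
summary = {p: percentile(latencies_ms, p) for p in (50, 90, 99)}
```

Here `summary` maps each of P50, P90, and P99 to one latency sample from the run, which is the usual shape of a latency report.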
Architecture
The framework introduces an abstract VectorDBBackend interface with a self-describing descriptor system and auto-discovery registry, enabling new backends to be added by simply dropping a sub-package into the backends/ directory with no other code changes required.
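A minimal sketch of how an abstract interface, a descriptor, and an auto-discovery registry can fit together. Only the name `VectorDBBackend` comes from the description above. `BackendDescriptor`, `REGISTRY`, `register`, `discover_backends`, the method set, and the toy `InMemoryBackend` are hypothetical and are not the PR's actual API.

```python
from __future__ import annotations

import importlib
import pkgutil
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendDescriptor:
    """What a backend declares about itself (fields are illustrative)."""
    name: str
    description: str = ""


class VectorDBBackend(ABC):
    """Abstract interface; the two methods here are an assumed minimal set."""
    descriptor: BackendDescriptor

    @abstractmethod
    def insert(self, ids: list[int], vectors: list[list[float]]) -> None: ...

    @abstractmethod
    def search(self, query: list[float], k: int) -> list[int]: ...


REGISTRY: dict[str, type[VectorDBBackend]] = {}


def register(cls):
    """Class decorator: index the backend under its descriptor name."""
    REGISTRY[cls.descriptor.name] = cls
    return cls


def discover_backends(package_name: str) -> None:
    """Import each sub-package of `package_name` so its @register calls run."""
    package = importlib.import_module(package_name)
    for info in pkgutil.iter_modules(package.__path__):
        if info.ispkg:
            importlib.import_module(f"{package_name}.{info.name}")


@register
class InMemoryBackend(VectorDBBackend):
    """Toy brute-force backend, used only to exercise the registry."""
    descriptor = BackendDescriptor(name="in_memory", description="brute-force demo")

    def __init__(self) -> None:
        self._rows: dict[int, list[float]] = {}

    def insert(self, ids, vectors):
        self._rows.update(zip(ids, vectors))

    def search(self, query, k):
        def sq_dist(v):
            return sum((a - b) ** 2 for a, b in zip(query, v))
        return sorted(self._rows, key=lambda i: sq_dist(self._rows[i]))[:k]
```

In this arrangement, calling `discover_backends("backends")` once at startup would be enough for a newly dropped-in sub-package to show up in `REGISTRY`, provided the sub-package applies `@register` at import time.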
Included Backends
Three backend implementations are included out of the box:
Benchmark Pipeline
Uses a three-way producer-consumer architecture:
Configuration & CLI
--what-if dry-run mode
--plan execution planning
Included Configs
Four ready-to-use benchmark configs (1M vectors, 1536 dimensions) are provided along with a .env.example template for connection credentials.
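The following sketch shows one way connection settings from a config file and from environment variables (for example, values loaded from a `.env` file) could be combined. The precedence order used here (CLI value, then environment, then config file, then default), the `VDB_` prefix, and the function name are assumptions for illustration. The PR's actual precedence rules are not spelled out in this text.

```python
import os

ENV_PREFIX = "VDB_"  # hypothetical prefix for environment overrides


def resolve_setting(key, cli_value=None, env=None, config=None, default=None):
    """Return the first defined value: CLI, environment, config file, default.

    The order is an assumed convention, not the framework's documented rule.
    """
    env = os.environ if env is None else env
    config = {} if config is None else config
    if cli_value is not None:
        return cli_value
    env_key = ENV_PREFIX + key.upper()
    if env_key in env:
        return env[env_key]
    if key in config:
        return config[key]
    return default


# Example with explicit mappings instead of the real process environment.
config = {"host": "localhost", "port": "19530"}
env = {"VDB_HOST": "db.internal"}
host = resolve_setting("host", env=env, config=config)
port = resolve_setting("port", env=env, config=config)
```

Passing the environment in as a mapping keeps the resolution logic easy to test without touching real credentials.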