We ran a few dry-run tests of the VDB benchmark on a single-host system with a single Gen5 NVMe drive (Solidigm D7-PS1010).
The mean recall@10 is very low, only about 0.0090. We looked into it and found that the collection mlps_1m_1shards_1536dim_uniform_flat_gt contains only 10000 vectors, which is 1% of the 1M vectors in the source collection. With a ground-truth collection that small, how can recall@10 ever be high? I would expect the FLAT collection to be the same size as the source collection. Is this by design, or is it a bug?
(.venv) root@cnit-zz-01:~/raysmond/workspace/mlperf_v3/storage2/vdb_benchmark# python vdbbench/list_collections.py
2026-05-14 00:58:06,702 - INFO - Connected to Milvus server at 127.0.0.1:19530
2026-05-14 00:58:06,704 - INFO - Found 2 collections
2026-05-14 00:58:06,704 - INFO - Getting information for collection: mlps_1m_1shards_1536dim_uniform
2026-05-14 00:58:06,714 - INFO - Getting information for collection: mlps_1m_1shards_1536dim_uniform_flat_gt
+-----------------------------------------+----------------+-------------+---------------+----------------+--------------+
| Collection Name | Vector Count | Dimension | Index Types | Metric Types | Partitions |
+=========================================+================+=============+===============+================+==============+
| mlps_1m_1shards_1536dim_uniform | 1000000 | 1536 | DISKANN | COSINE | 1 |
+-----------------------------------------+----------------+-------------+---------------+----------------+--------------+
| mlps_1m_1shards_1536dim_uniform_flat_gt | 10000 | 1536 | FLAT | COSINE | 1 |
+-----------------------------------------+----------------+-------------+---------------+----------------+--------------+
2026-05-14 00:58:06,900 - INFO - Disconnected from Milvus server
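To put a rough number on the concern above, here is a minimal toy simulation, not the recall code in enhanced_bench.py. It assumes the ground truth is computed over a random subset of the source vectors and that ids are unique, which may not hold in our case. The names subset_recall and mean_subset_recall are made up for this sketch. Under those assumptions the expected recall@k is roughly the subset fraction, so a 1% ground-truth collection would give a mean recall near 0.01, the same order as the 0.0090 we measured.

```python
import random


def subset_recall(n_total: int, n_subset: int, k: int, rng: random.Random) -> float:
    """Recall@k of a perfect search over the full set, scored against
    ground truth computed on a random subset (toy model)."""
    # Rank r means "the r-th nearest neighbour of the query" in the full set.
    subset_ranks = rng.sample(range(n_total), n_subset)
    # A subset member with rank < k is in the true top-k, and it is also
    # guaranteed to be in the subset's own top-k, so it counts as a hit.
    hits = sum(1 for r in subset_ranks if r < k)
    return hits / k


def mean_subset_recall(n_total: int, n_subset: int, k: int,
                       trials: int, seed: int) -> float:
    """Average the toy recall over many simulated queries."""
    rng = random.Random(seed)
    total = sum(subset_recall(n_total, n_subset, k, rng) for _ in range(trials))
    return total / trials


# Scaled-down stand-in for 10000 out of 1000000 (same 1% ratio).
print(mean_subset_recall(n_total=100_000, n_subset=1_000, k=10,
                         trials=2000, seed=42))
```

This only shows that a 1% ground-truth set is enough to explain a recall of about 1%. It does not say why the FLAT collection ended up that small.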
Here are the commands we executed and the dryrun logs:
# load the vector database
(.venv) root@cnit-zz-01:~/raysmond/workspace/mlperf_v3/storage2# ./mlpstorage vectordb datagen --host 127.0.0.1 --port 19530 --config default --force --results-dir ./vdb_results --file
# run the query cmd (PATH A)
(.venv) root@cnit-zz-01:~/raysmond/workspace/mlperf_v3/storage2/vdb_benchmark# python vdbbench/enhanced_bench.py \
--host 127.0.0.1 \
--collection mlps_1m_1shards_1536dim_uniform \
--auto-create-flat \
--runtime 120 \
--batch-size 10 \
--processes 8 \
--search-limit 10 \
--search-ef 200 \
--queries 100000 \
--recall-k 10 \
--cache-state cold \
--drop-caches-cmd "sh -c 'echo 3 > /proc/sys/vm/drop_caches'"
Logs:
============================================================
ENHANCED VDB BENCH — runtime/query-count mode
============================================================
Results will be saved to: vdbbench_results/20260514_005609
============================================================
Database Verification and Collection Loading
============================================================
Connecting to Milvus server at 127.0.0.1:19530...
Collection mlps_1m_1shards_1536dim_uniform already loaded.
+---------------------------------+----------------+-------------+---------------+----------------+--------------+
| Collection Name | Vector Count | Dimension | Index Types | Metric Types | Partitions |
+=================================+================+=============+===============+================+==============+
| mlps_1m_1shards_1536dim_uniform | 1000000 | 1536 | DISKANN | COSINE | 1 |
+---------------------------------+----------------+-------------+---------------+----------------+--------------+
Detected source vector field: 'vector'
============================================================
RECALL SETUP (outside benchmark timing)
============================================================
Ground truth is pre-computed using a FLAT (brute-force) index.
Using metric type: COSINE
Generating 1000 query vectors (dim=1536, seed=42)...
Generated 1000 query vectors.
Setting up FLAT collection: mlps_1m_1shards_1536dim_uniform_flat_gt
FLAT collection exists but has 10000 vs 1000000 vectors. Dropping and recreating...
Creating FLAT collection 'mlps_1m_1shards_1536dim_uniform_flat_gt' from source 'mlps_1m_1shards_1536dim_uniform'...
Source schema: pk_field='id' (INT64), vec_field='vector', vectors=1000000
Copying 1000000 vectors to FLAT collection (batch_size=5000)...
Copied 152/1000000 vectors (0.0%)
Copied 304/1000000 vectors (0.0%)
Copied 456/1000000 vectors (0.0%)
Copied 608/1000000 vectors (0.1%)
Copied 760/1000000 vectors (0.1%)
Copied 912/1000000 vectors (0.1%)
Copied 1064/1000000 vectors (0.1%)
Copied 1216/1000000 vectors (0.1%)
Copied 1368/1000000 vectors (0.1%)
Copied 1520/1000000 vectors (0.2%)
Copied 1672/1000000 vectors (0.2%)
Copied 1824/1000000 vectors (0.2%)
Copied 1976/1000000 vectors (0.2%)
Copied 2132/1000000 vectors (0.2%)
Copied 2289/1000000 vectors (0.2%)
Copied 2446/1000000 vectors (0.2%)
Copied 2603/1000000 vectors (0.3%)
Copied 2757/1000000 vectors (0.3%)
Copied 2909/1000000 vectors (0.3%)
Copied 3063/1000000 vectors (0.3%)
Copied 3220/1000000 vectors (0.3%)
Copied 3376/1000000 vectors (0.3%)
Copied 3528/1000000 vectors (0.4%)
Copied 3680/1000000 vectors (0.4%)
Copied 3837/1000000 vectors (0.4%)
Copied 3994/1000000 vectors (0.4%)
Copied 4146/1000000 vectors (0.4%)
Copied 4298/1000000 vectors (0.4%)
Copied 4450/1000000 vectors (0.4%)
Copied 4602/1000000 vectors (0.5%)
Copied 4754/1000000 vectors (0.5%)
Copied 4906/1000000 vectors (0.5%)
Copied 10000/1000000 vectors (100.0%)
Building FLAT index...
FLAT collection 'mlps_1m_1shards_1536dim_uniform_flat_gt' ready with 10000 vectors.
Pre-computing ground truth for 1000 queries using FLAT index (top_k=10)...
Ground truth pre-computation complete: 1000 queries in 0.61s
Ground truth ready: 1000 queries pre-computed.
Collecting initial disk statistics...
============================================================
Benchmark Execution
============================================================
Starting benchmark: 8 processes × 12500 queries/process
Recall: 1000 pre-generated queries, recall@10
NOTE: batch_end timing is placed BEFORE recall capture — performance unaffected.
NOTE: recall hits written to per-worker recall_hits_p<N>.jsonl files.
Staggering process startup by 0.125s
Starting process 0...
Process 0 initialized
Process 0 - Loading collection
Process 0: Writing results to vdbbench_results/20260514_005609/milvus_benchmark_p0.csv
Process 0: Starting benchmark ...
Starting process 1...
Process 1 initialized
Calculating recall from per-worker JSONL files...
Loaded ANN hits for 1000 unique query indices from 8 worker(s).
Calculating benchmark statistics...
============================================================
BENCHMARK SUMMARY
============================================================
Total Queries: 100000
Total Batches: 10000
Total Runtime: 46.55s
QUERY STATISTICS
------------------------------------------------------------
Mean Latency: 3.64 ms
Median Latency: 3.67 ms
P95 Latency: 3.98 ms
P99 Latency: 4.16 ms
P99.9 Latency: 4.52 ms
P99.99 Latency: 5.72 ms
Throughput: 2148.38 queries/second
BATCH STATISTICS
------------------------------------------------------------
Mean Batch Time: 36.40 ms
Median Batch Time: 36.74 ms
P95 Batch Time: 39.84 ms
P99 Batch Time: 41.64 ms
P99.9 Batch Time: 45.19 ms
P99.99 Batch Time: 57.24 ms
Max Batch Time: 97.02 ms
Batch Throughput: 27.47 batches/second
RECALL STATISTICS (recall@10)
------------------------------------------------------------
Mean Recall: 0.0090
Median Recall: 0.0000
Min Recall: 0.0000
Max Recall: 0.3000
P95 Recall: 0.1000
P99 Recall: 0.1000
Queries Evaluated: 1000
DISK I/O DURING BENCHMARK
------------------------------------------------------------
Total Read: 295.53 GB (6501.54 MB/s, 832193 IOPS)
Total Write: 223.03 MB (4.79 MB/s, 82 IOPS)
Per-Device Breakdown:
nvme11n1:
Read: 600.00 KB (0.01 MB/s, 0 IOPS)
Write: 3.82 MB (0.08 MB/s, 8 IOPS)
nvme11n1p3:
Read: 600.00 KB (0.01 MB/s, 0 IOPS)
Write: 3.82 MB (0.08 MB/s, 8 IOPS)
nvme14n1:
Read: 147.77 GB (3250.75 MB/s, 416096 IOPS)
Write: 105.79 MB (2.27 MB/s, 26 IOPS)
nvme14n1p1:
Read: 147.77 GB (3250.75 MB/s, 416096 IOPS)
Write: 105.79 MB (2.27 MB/s, 26 IOPS)
dm-0:
Read: 600.00 KB (0.01 MB/s, 0 IOPS)
Write: 3.82 MB (0.08 MB/s, 14 IOPS)
Detailed results: vdbbench_results/20260514_005609
Recall details: vdbbench_results/20260514_005609/recall_stats.json
============================================================
As the logs show, the test suite copies only 1% (10000) of the vectors to the new FLAT collection, yet the progress is reported as 100%.
**Root cause (guess)**
I think there might be an issue with the insert_data(collection, vectors, batch_size=10000) method in load_vdb.py.
The caller always passes the new chunk_vectors to insert_data rather than the whole vector set, so insert_data generates ids starting from 0 (0 to 10000) for every chunk. As a result, the final collection contains many duplicated ids.
That would explain why enhanced_bench.py with --auto-create-flat counts and copies only 10000 vectors to the FLAT collection.
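To illustrate the suspected pattern, here is a small sketch. It is not the actual code from load_vdb.py, and the function names are hypothetical. It compares chunked id generation that restarts at 0 for every chunk with a version that carries a running offset, and counts how many distinct ids each one produces for 1M rows in chunks of 10000.

```python
from typing import Iterable, Iterator, List, Tuple


def chunk_ids_restarting(total: int, chunk_size: int) -> Iterator[List[int]]:
    """Suspected buggy pattern: every chunk numbers its rows from 0."""
    for start in range(0, total, chunk_size):
        n = min(chunk_size, total - start)
        yield list(range(n))


def chunk_ids_with_offset(total: int, chunk_size: int) -> Iterator[List[int]]:
    """Possible fix: ids continue from where the previous chunk ended."""
    for start in range(0, total, chunk_size):
        n = min(chunk_size, total - start)
        yield list(range(start, start + n))


def count_rows_and_distinct(chunks: Iterable[List[int]]) -> Tuple[int, int]:
    """Return (rows inserted, distinct primary keys) over all chunks."""
    seen = set()
    rows = 0
    for ids in chunks:
        rows += len(ids)
        seen.update(ids)
    return rows, len(seen)


TOTAL, CHUNK = 1_000_000, 10_000
print(count_rows_and_distinct(chunk_ids_restarting(TOTAL, CHUNK)))
print(count_rows_and_distinct(chunk_ids_with_offset(TOTAL, CHUNK)))
```

The restarting version yields 1000000 rows but only 10000 distinct ids, which matches the numbers in the logs (1000000 vectors in the source collection, 10000 copied to the FLAT collection). The offset version yields 1000000 distinct ids. Whether load_vdb.py really behaves like the first function still needs to be confirmed against the source.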