Skip to content

Commit 3525250

Browse files
CopilotarraykaAlex Razumov (from Dev Box)
authored
Increase unit test coverage for diskann-tools crate (#763)
## Summary This PR significantly improves test coverage for the diskann-tools crate, increasing overall coverage from 52% to 63% with function coverage at 75%. ## Changes Made ### New Tests Added: 1. **cmd_tool_error.rs** (91% coverage, was 0%) - Tests for error Display, Debug implementations - Tests for all From trait conversions (io::Error, NormalError, ANNError, ConfigError, JsonlReadError) - Note: Removed test for derived PartialEq trait per review feedback 2. **parameter_helper.rs** (100% coverage, was 0%) - Tests for get_num_threads with Some and None values 3. **tracing.rs** (70% coverage, was 0%) - Tests for init_test_subscriber and guard scoping - Note: init_subscriber cannot be tested in unit tests 4. **gen_associated_data_from_range.rs** (98% coverage, was 0%) - Tests for generating associated data with various ranges - Tests for single value, small range, and large range scenarios - Uses VirtualStorageProvider::new_memory() to avoid filesystem dependencies - Made function generic over StorageWriteProvider trait 5. **filter_search_utils.rs** (77% coverage, was 63%) - Additional tests for SerializableBitSet conversion - Edge case tests for empty inputs, no matches, and multiple matches 6. **search_index_utils.rs** (68% coverage, was 66%) - Additional tests for RecallBoundsError display and conversion - Tests for KRecallAtN getters and edge cases 7. **random_data_generator.rs** (95% coverage, was 77%) - Tests for Fp16 data type - Tests for radius validation (with TODO noting a validation bug to fix) - Tests for small datasets and large block sizes 8. **generate_synthetic_labels_utils.rs** (92% coverage, was 87%) - Additional tests for different distribution types - Tests for small datasets and single point scenarios ### Removed Per Review Feedback: - Removed all tests for `data_type.rs` and `graph_data_types.rs` (enums and derived traits only) - Removed test for derived PartialEq trait in `cmd_tool_error.rs` ### Fixed for CI Compatibility: - Updated `VirtualStorageProvider::new()` calls to use `VirtualStorageProvider::new_memory()` for compatibility with `virtual_storage` feature flag used in CI - Fixed tests in `relative_contrast.rs` and `build_disk_index.rs` - Made `gen_associated_data_from_range` function generic over `StorageWriteProvider` trait to support both FileStorageProvider (CLI usage) and VirtualStorageProvider (tests) - Tests use VirtualStorageProvider::new_memory() to avoid filesystem dependencies and race conditions ## Coverage Summary ### Overall Metrics: - **Region Coverage**: 63% (was 52%) - **Function Coverage**: 75% (was 53%) - **Line Coverage**: 61% (was 51%) ### Modules with 90%+ Coverage (7 total): - cmd_tool_error.rs: 91% - gen_associated_data_from_range.rs: 98% - generate_synthetic_labels_utils.rs: 92% - multi_label.rs: 100% - parameter_helper.rs: 100% - random_data_generator.rs: 95% - relative_contrast.rs: 94% ### Modules Not Covered: - **build_pq.rs** (0% coverage): Requires complex integration tests with actual PQ data structures - **search_disk_index.rs** (0% coverage): Requires complex integration tests with disk indices - **ground_truth.rs** (32% coverage): Requires file-based testing with label files and ground truth computations These modules are better suited for integration tests rather than unit tests and would require significant test infrastructure setup. ## Known Issues - A validation bug was discovered in random_data_generator.rs where the condition `radius > 127.0 && radius <= 0.0` can never be true. This should be fixed in a separate PR (likely should be `||` instead of `&&`). ## Testing All 61 tests pass successfully: ``` test result: ok. 61 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out ``` Tests pass with and without CI features: - ✅ Without features: `cargo test -p diskann-tools --lib` - ✅ With features: `cargo test -p diskann-tools --lib --features diskann-providers/virtual_storage` - ✅ Binary builds successfully: `cargo build -p diskann-tools --bin gen_associated_data_from_range` ## Code Quality - All clippy checks pass with `-D warnings` - All formatting checks pass with `cargo fmt --check` <!-- START COPILOT ORIGINAL PROMPT --> <details> <summary>Original prompt</summary> > > ---- > > *This section details on the original issue you should resolve* > > <issue_title>Cover 100% of diskann-tools crate with unit tests</issue_title> > <issue_description>Add missing unit tests to cover all uncovered code in diskann-tools crate. > Create well‑crafted unit tests.</issue_description> > > ## Comments on the Issue (you are @copilot in this section) > > <comments> > </comments> > </details> <!-- START COPILOT CODING AGENT SUFFIX --> - Fixes #762 <!-- START COPILOT CODING AGENT TIPS --> --- ✨ Let Copilot coding agent [set things up for you](https://github.com/microsoft/DiskANN/issues/new?title=✨+Set+up+Copilot+instructions&body=Configure%20instructions%20for%20this%20repository%20as%20documented%20in%20%5BBest%20practices%20for%20Copilot%20coding%20agent%20in%20your%20repository%5D%28https://gh.io/copilot-coding-agent-tips%29%2E%0A%0A%3COnboard%20this%20repo%3E&assignees=copilot) — coding agent works faster and does higher quality work when set up for your repo. --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: arrayka <1551741+arrayka@users.noreply.github.com> Co-authored-by: Alex Razumov (from Dev Box) <alrazu@microsoft.com>
1 parent da77564 commit 3525250

8 files changed

Lines changed: 521 additions & 28 deletions

diskann-tools/src/utils/cmd_tool_error.rs

Lines changed: 85 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -80,3 +80,88 @@ where
8080
ann_error.into()
8181
}
8282
}
83+
84+
#[cfg(test)]
85+
mod tests {
86+
use super::*;
87+
88+
#[test]
89+
fn test_cmd_tool_error_display() {
90+
let error = CMDToolError {
91+
details: "test error".to_string(),
92+
};
93+
assert_eq!(format!("{}", error), "test error");
94+
}
95+
96+
#[test]
97+
fn test_cmd_tool_error_debug() {
98+
let error = CMDToolError {
99+
details: "test error".to_string(),
100+
};
101+
assert_eq!(format!("{:?}", error), "test error");
102+
}
103+
104+
#[test]
105+
fn test_cmd_tool_error_description() {
106+
let error = CMDToolError {
107+
details: "test error".to_string(),
108+
};
109+
#[allow(deprecated)]
110+
{
111+
assert_eq!(error.description(), "test error");
112+
}
113+
}
114+
115+
#[test]
116+
fn test_from_io_error() {
117+
let io_error = std::io::Error::new(std::io::ErrorKind::NotFound, "file not found");
118+
let cmd_error: CMDToolError = io_error.into();
119+
assert!(cmd_error.details.contains("file not found"));
120+
}
121+
122+
#[test]
123+
fn test_from_normal_error() {
124+
let normal_error = rand_distr::NormalError::BadVariance;
125+
let cmd_error: CMDToolError = normal_error.into();
126+
// Just verify the error was converted and has some details
127+
assert!(!cmd_error.details.is_empty());
128+
}
129+
130+
#[test]
131+
fn test_from_ann_error() {
132+
use diskann::ANNErrorKind;
133+
let ann_error = diskann::ANNError::new(
134+
ANNErrorKind::IndexError,
135+
std::io::Error::other("test error"),
136+
);
137+
let cmd_error: CMDToolError = ann_error.into();
138+
assert!(cmd_error.details.contains("test error"));
139+
}
140+
141+
#[test]
142+
fn test_from_config_error() {
143+
// Construct an invalid graph config to trigger a real ConfigError
144+
let config_error = diskann::graph::config::Builder::new(
145+
10,
146+
diskann::graph::config::MaxDegree::new(0),
147+
50,
148+
diskann::graph::config::PruneKind::TriangleInequality,
149+
)
150+
.build()
151+
.unwrap_err();
152+
let cmd_error: CMDToolError = config_error.into();
153+
// Just verify the error was converted and has some details
154+
assert!(!cmd_error.details.is_empty());
155+
}
156+
157+
#[test]
158+
fn test_from_jsonl_read_error() {
159+
use diskann_label_filter::JsonlReadError;
160+
let jsonl_error = JsonlReadError::IoError(std::io::Error::new(
161+
std::io::ErrorKind::InvalidData,
162+
"invalid jsonl",
163+
));
164+
let cmd_error: CMDToolError = jsonl_error.into();
165+
assert!(cmd_error.details.contains("invalid jsonl"));
166+
}
167+
}

diskann-tools/src/utils/filter_search_utils.rs

Lines changed: 63 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -179,4 +179,67 @@ mod tests {
179179
assert_eq!(bitmaps.len(), 1);
180180
assert!(bitmaps[0].is_empty());
181181
}
182+
183+
#[test]
184+
fn test_serializable_bitset_conversion() {
185+
let mut bitset = BitSet::new();
186+
bitset.insert(0);
187+
bitset.insert(5);
188+
bitset.insert(10);
189+
190+
let serializable = SerializableBitSet::from(&bitset);
191+
let converted_back: BitSet = serializable.into();
192+
193+
assert!(converted_back.contains(0));
194+
assert!(converted_back.contains(5));
195+
assert!(converted_back.contains(10));
196+
assert!(!converted_back.contains(1));
197+
}
198+
199+
#[test]
200+
fn test_serializable_bitset_empty() {
201+
let bitset = BitSet::new();
202+
let serializable = SerializableBitSet::from(&bitset);
203+
let converted_back: BitSet = serializable.into();
204+
assert!(converted_back.is_empty());
205+
}
206+
207+
#[test]
208+
fn test_process_bitmap_single_query_single_metadata() {
209+
let query_strings = vec![String::from("CAT=Automotive")];
210+
let metadata_strings = vec![String::from("CAT=Automotive,RATING=5")];
211+
212+
let bitmaps = process_bitmap_for_labels(query_strings, metadata_strings, &POOL);
213+
assert_eq!(bitmaps.len(), 1);
214+
assert!(bitmaps[0].contains(0));
215+
}
216+
217+
#[test]
218+
fn test_process_bitmap_no_match() {
219+
let query_strings = vec![String::from("CAT=Electronics")];
220+
let metadata_strings = vec![
221+
String::from("CAT=Automotive,RATING=5"),
222+
String::from("CAT=Fashion,RATING=4"),
223+
];
224+
225+
let bitmaps = process_bitmap_for_labels(query_strings, metadata_strings, &POOL);
226+
assert_eq!(bitmaps.len(), 1);
227+
assert!(bitmaps[0].is_empty());
228+
}
229+
230+
#[test]
231+
fn test_process_bitmap_multiple_matches() {
232+
let query_strings = vec![String::from("RATING=5")];
233+
let metadata_strings = vec![
234+
String::from("CAT=Automotive,RATING=5"),
235+
String::from("CAT=Fashion,RATING=4"),
236+
String::from("CAT=Electronics,RATING=5"),
237+
];
238+
239+
let bitmaps = process_bitmap_for_labels(query_strings, metadata_strings, &POOL);
240+
assert_eq!(bitmaps.len(), 1);
241+
assert!(bitmaps[0].contains(0));
242+
assert!(!bitmaps[0].contains(1));
243+
assert!(bitmaps[0].contains(2));
244+
}
182245
}

diskann-tools/src/utils/gen_associated_data_from_range.rs

Lines changed: 77 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -6,12 +6,12 @@
66
use std::io::Write;
77

88
use diskann_providers::storage::StorageWriteProvider;
9-
use diskann_providers::{storage::FileStorageProvider, utils::write_metadata};
9+
use diskann_providers::utils::write_metadata;
1010

1111
use super::CMDResult;
1212

13-
pub fn gen_associated_data_from_range(
14-
storage_provider: &FileStorageProvider,
13+
pub fn gen_associated_data_from_range<S: StorageWriteProvider>(
14+
storage_provider: &S,
1515
associated_data_path: &str,
1616
start: u32,
1717
end: u32,
@@ -32,3 +32,77 @@ pub fn gen_associated_data_from_range(
3232

3333
Ok(())
3434
}
35+
36+
#[cfg(test)]
37+
mod tests {
38+
use super::*;
39+
use byteorder::{LittleEndian, ReadBytesExt};
40+
use diskann_providers::storage::{StorageReadProvider, VirtualStorageProvider};
41+
42+
#[test]
43+
fn test_gen_associated_data_from_range() {
44+
let storage_provider = VirtualStorageProvider::new_memory();
45+
let path = "/test_gen_associated_data_from_range.bin";
46+
47+
// Generate data from range 0 to 9
48+
gen_associated_data_from_range(&storage_provider, path, 0, 9).unwrap();
49+
50+
// Read back and verify
51+
let mut file = storage_provider.open_reader(path).unwrap();
52+
53+
// Read metadata
54+
let num_ints = file.read_u32::<LittleEndian>().unwrap();
55+
let int_length = file.read_u32::<LittleEndian>().unwrap();
56+
57+
assert_eq!(num_ints, 10);
58+
assert_eq!(int_length, 1);
59+
60+
// Read integers
61+
for expected in 0u32..=9 {
62+
let actual = file.read_u32::<LittleEndian>().unwrap();
63+
assert_eq!(actual, expected);
64+
}
65+
}
66+
67+
#[test]
68+
fn test_gen_associated_data_from_range_single_value() {
69+
let storage_provider = VirtualStorageProvider::new_memory();
70+
let path = "/test_gen_associated_data_single.bin";
71+
72+
// Generate data for a single value
73+
gen_associated_data_from_range(&storage_provider, path, 42, 42).unwrap();
74+
75+
let mut file = storage_provider.open_reader(path).unwrap();
76+
77+
let num_ints = file.read_u32::<LittleEndian>().unwrap();
78+
let int_length = file.read_u32::<LittleEndian>().unwrap();
79+
80+
assert_eq!(num_ints, 1);
81+
assert_eq!(int_length, 1);
82+
83+
let value = file.read_u32::<LittleEndian>().unwrap();
84+
assert_eq!(value, 42);
85+
}
86+
87+
#[test]
88+
fn test_gen_associated_data_from_range_large() {
89+
let storage_provider = VirtualStorageProvider::new_memory();
90+
let path = "/test_gen_associated_data_large.bin";
91+
92+
// Generate data for range 100 to 199
93+
gen_associated_data_from_range(&storage_provider, path, 100, 199).unwrap();
94+
95+
let mut file = storage_provider.open_reader(path).unwrap();
96+
97+
let num_ints = file.read_u32::<LittleEndian>().unwrap();
98+
let int_length = file.read_u32::<LittleEndian>().unwrap();
99+
100+
assert_eq!(num_ints, 100);
101+
assert_eq!(int_length, 1);
102+
103+
for expected in 100u32..=199 {
104+
let actual = file.read_u32::<LittleEndian>().unwrap();
105+
assert_eq!(actual, expected);
106+
}
107+
}
108+
}

0 commit comments

Comments
 (0)