feat(dataset): add FP16/BF16 input vector support#1731
LHT129 wants to merge 1 commit into antgroup:main.
Conversation
Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the VSAG library by integrating native support for FP16 and BF16 input vector formats. This allows for more memory-efficient storage and processing of vector data, particularly beneficial for large-scale applications. The changes span core components, including the Dataset API, data type definitions, and the quantization pipeline, ensuring seamless handling of half-precision vectors from input to index processing.
Code Review
The pull request introduces support for FP16 and BF16 input vector formats by adding new constants, extending the Dataset API to handle uint16_t vectors, and updating the QuantizerAdapter to process these new data types, including conversions between uint16_t (FP16/BF16) and float. The review highlights a critical bug in the QuantizerAdapter's DecodeOneImpl (and implicitly DecodeBatchImpl) where the conversion from float back to uint16_t for FP16/BF16 is incorrect, potentially causing data corruption, and suggests using dedicated conversion functions. Additionally, it recommends refactoring the duplicated uint16_t to float conversion logic across multiple QuantizerAdapter methods into a shared helper function to improve maintainability.
```cpp
} else if constexpr (std::is_same_v<DataT, uint16_t>) {
    Vector<DataType> vec(this->dim_, this->allocator_);
    if (!this->inner_quantizer_->DecodeOneImpl(codes, vec.data())) {
        return false;
    }
    for (int64_t i = 0; i < this->dim_; i++) {
        reinterpret_cast<DataT*>(data)[i] = static_cast<DataT>(std::round(vec[i]));
    }
    return true;
```
The conversion from the decoded float vector back to uint16_t (for FP16/BF16) is incorrect. static_cast<DataT>(std::round(vec[i])) does not correctly convert a float to the FP16 or BF16 bit-level representation. This will result in corrupted data upon decoding. You should use the FloatToFP16() and FloatToBF16() utility functions based on the data_type_ member to perform the correct conversion.
Suggested change:

```diff
 } else if constexpr (std::is_same_v<DataT, uint16_t>) {
     Vector<DataType> vec(this->dim_, this->allocator_);
     if (!this->inner_quantizer_->DecodeOneImpl(codes, vec.data())) {
         return false;
     }
-    for (int64_t i = 0; i < this->dim_; i++) {
-        reinterpret_cast<DataT*>(data)[i] = static_cast<DataT>(std::round(vec[i]));
-    }
-    return true;
+    auto* out_data = reinterpret_cast<uint16_t*>(data);
+    for (int64_t i = 0; i < this->dim_; i++) {
+        if (data_type_ == DataTypes::DATA_TYPE_FP16) {
+            out_data[i] = FloatToFP16(vec[i]);
+        } else {
+            out_data[i] = FloatToBF16(vec[i]);
+        }
+    }
+    return true;
```
```cpp
} else if constexpr (std::is_same_v<DataT, uint16_t>) {
    const auto* data_fp16 = reinterpret_cast<const uint16_t*>(data);
    Vector<DataType> vec(this->dim_ * count, this->allocator_);
    for (int64_t i = 0; i < this->dim_ * count; ++i) {
        if (data_type_ == DataTypes::DATA_TYPE_FP16) {
            vec[i] = FP16ToFloat(data_fp16[i]);
        } else {
            vec[i] = BF16ToFloat(data_fp16[i]);
        }
    }
    return this->inner_quantizer_->TrainImpl(vec.data(), count);
```
The logic to convert uint16_t data (representing FP16/BF16) to float is duplicated in TrainImpl, EncodeOneImpl, EncodeBatchImpl, and ProcessQueryImpl. To improve maintainability and reduce redundancy, consider extracting this conversion logic into a private helper function. For example, you could create a function void convert_from_half(const uint16_t* src, float* dst, int64_t size) that encapsulates this loop.
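The suggested helper could be sketched as follows. This is a standalone sketch: `fp16_to_float`/`bf16_to_float` here are illustrative stand-ins for the repo's `FP16ToFloat`/`BF16ToFloat` utilities, and `convert_from_half`/`HalfKind` are hypothetical names.

```cpp
#include <cstdint>
#include <cstring>

// Standalone stand-ins for the repo's BF16ToFloat/FP16ToFloat utilities.
inline float bf16_to_float(uint16_t bits) {
    uint32_t u = static_cast<uint32_t>(bits) << 16;  // BF16 = top 16 bits of FP32
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

inline float fp16_to_float(uint16_t h) {
    // Decode IEEE 754 half: 1 sign, 5 exponent, 10 mantissa bits.
    uint32_t sign = (static_cast<uint32_t>(h) >> 15) & 0x1u;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t out;
    if (exp == 0) {
        if (man == 0) {
            out = sign << 31;  // signed zero
        } else {
            int e = -1;  // normalize a subnormal half
            do {
                man <<= 1;
                ++e;
            } while ((man & 0x400u) == 0);
            out = (sign << 31) | (static_cast<uint32_t>(127 - 15 - e) << 23) |
                  ((man & 0x3FFu) << 13);
        }
    } else if (exp == 0x1F) {
        out = (sign << 31) | 0x7F800000u | (man << 13);  // inf / NaN
    } else {
        out = (sign << 31) | ((exp - 15 + 127) << 23) | (man << 13);  // normal
    }
    float f;
    std::memcpy(&f, &out, sizeof(f));
    return f;
}

enum class HalfKind { FP16, BF16 };

// The shared helper suggested by the review: one conversion loop reused by
// TrainImpl / EncodeOneImpl / EncodeBatchImpl / ProcessQueryImpl.
inline void convert_from_half(const uint16_t* src, float* dst, int64_t size, HalfKind kind) {
    for (int64_t i = 0; i < size; ++i) {
        dst[i] = (kind == HalfKind::FP16) ? fp16_to_float(src[i]) : bf16_to_float(src[i]);
    }
}
```

In-tree, the `kind` parameter would come from the adapter's `data_type_` member, so each call site shrinks to a single line.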
Pull request overview
Adds support for ingesting FP16/BF16 (stored as uint16_t) vectors into VSAG datasets and quantization flow, avoiding app-level conversion to FP32.
Changes:
- Extends dataset APIs and internal storage to carry FP16/BF16 vectors via `uint16_t*`.
- Adds BF16 datatype plumbing (`DATATYPE_BFLOAT16`, enum value, parsing and `ToString`).
- Extends `QuantizerAdapter` to accept `uint16_t` inputs and convert FP16/BF16 → FP32 for inner quantizer operations.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| src/quantization/quantizer_adapter.h | Allows uint16_t as adapter input type and stores runtime data_type_. |
| src/quantization/quantizer_adapter.cpp | Implements FP16/BF16 → FP32 conversion for train/encode/query processing and instantiates adapter for uint16_t. |
| src/index_common_param.cpp | Parses "float16" / "bfloat16" dtype strings and updates validation error list. |
| src/dataset_impl.h | Adds uint16_t* to variant and implements Float16Vectors getters/setters. |
| src/dataset_impl.cpp | Adds deep copy / append / destructor handling for float16 vectors. |
| src/data_type.h | Adds DATA_TYPE_BF16 and ToString mapping. |
| src/constants.cpp | Defines FLOAT16_VECTORS and DATATYPE_BFLOAT16. |
| include/vsag/dataset.h | Adds public Dataset API for Float16Vectors / GetFloat16Vectors. |
| include/vsag/constants.h | Declares new constants for BF16 and float16 vectors key. |
| plan.md | Implementation plan document added. |
| TASK.md | Adds a local absolute path reference (likely accidental). |
Comments suppressed due to low confidence (1)
TASK.md:1
- This commits a machine-local absolute path into the repository, which can leak internal environment details and is not usable for other developers/CI. Recommended: remove `TASK.md` from the PR, or replace it with a repo-relative link (or an issue/PR reference) that works across environments.
```cpp
} else if constexpr (std::is_same_v<DataT, uint16_t>) {
    Vector<DataType> vec(this->dim_, this->allocator_);
    if (!this->inner_quantizer_->DecodeOneImpl(codes, vec.data())) {
        return false;
    }
    for (int64_t i = 0; i < this->dim_; i++) {
        reinterpret_cast<DataT*>(data)[i] = static_cast<DataT>(std::round(vec[i]));
    }
    return true;
```
For the uint16_t path, DecodeOneImpl/DecodeBatchImpl currently rounds floats and casts to uint16_t, which produces integer magnitudes (0–65535) rather than FP16/BF16 bit patterns. If these APIs are intended to return FP16/BF16 vectors as uint16_t storage, this needs float→FP16 / float→BF16 conversion (selected by data_type_). If decode is not meant to support FP16/BF16 output, consider disallowing Decode* for uint16_t (return false / throw) to avoid silently returning invalid data.
```cpp
} else if constexpr (std::is_same_v<DataT, uint16_t>) {
    Vector<DataType> vec(this->dim_ * count, this->allocator_);
    if (!this->inner_quantizer_->DecodeBatchImpl(codes, vec.data(), count)) {
        return false;
    }
    for (int64_t i = 0; i < this->dim_ * count; i++) {
        reinterpret_cast<DataT*>(data)[i] = static_cast<DataT>(std::round(vec[i]));
    }
    return true;
```
For the uint16_t path, DecodeOneImpl/DecodeBatchImpl currently rounds floats and casts to uint16_t, which produces integer magnitudes (0–65535) rather than FP16/BF16 bit patterns. If these APIs are intended to return FP16/BF16 vectors as uint16_t storage, this needs float→FP16 / float→BF16 conversion (selected by data_type_). If decode is not meant to support FP16/BF16 output, consider disallowing Decode* for uint16_t (return false / throw) to avoid silently returning invalid data.
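To see concretely why the cast corrupts data, compare the integer cast against a real bit-pattern conversion. `float_to_bf16` below is a truncating stand-in for the repo's `FloatToBF16` (the in-tree version may round instead).

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Truncating stand-in for the repo's FloatToBF16 (the real one may round).
inline uint16_t float_to_bf16(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return static_cast<uint16_t>(u >> 16);  // BF16 = top 16 bits of FP32
}

inline float bf16_to_float(uint16_t bits) {
    uint32_t u = static_cast<uint32_t>(bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// The buggy decode path: produces an integer magnitude, not a bit pattern.
inline uint16_t buggy_decode(float v) {
    return static_cast<uint16_t>(std::round(v));
}
```

For `1.5f`, `buggy_decode` returns `2`, which reinterpreted as BF16 is a denormal near zero, while `float_to_bf16(1.5f)` is `0x3FC0` and round-trips to exactly 1.5.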
```cpp
} else if constexpr (std::is_same_v<DataT, uint16_t>) {
    const auto* query_fp16 = reinterpret_cast<const uint16_t*>(query);
    Vector<DataType> vec(this->dim_, this->allocator_);
    for (int64_t i = 0; i < this->dim_; i++) {
        if (data_type_ == DataTypes::DATA_TYPE_FP16) {
            vec[i] = FP16ToFloat(query_fp16[i]);
        } else {
            vec[i] = BF16ToFloat(query_fp16[i]);
        }
    }
    auto& inner_computer = reinterpret_cast<Computer<QuantT>&>(computer);
    this->inner_quantizer_->ProcessQueryImpl(vec.data(), inner_computer);
```
PR description states query vectors are always FP32, but the uint16_t specialization reinterprets query as uint16_t* and converts it as FP16/BF16. If queries are indeed FP32, this will misinterpret FP32 memory and produce incorrect results/UB. Suggested fix: for the uint16_t adapter, keep ProcessQueryImpl consuming FP32 (forward query directly to inner_quantizer_ without reinterpret/copy), and only convert for base vectors (train/encode). If the intent is to support FP16/BF16 queries as well, the PR description should be updated and the API should make the query dtype explicit.
```cpp
if (data_type_ == DataTypes::DATA_TYPE_FP16) {
    vec[i] = FP16ToFloat(data_fp16[i]);
} else {
    vec[i] = BF16ToFloat(data_fp16[i]);
}
```
In the uint16_t conversion path, the else branch treats any non-FP16 data_type_ as BF16. If data_type_ is accidentally DATA_TYPE_FLOAT, DATA_TYPE_INT8, etc., this will silently decode the payload as BF16. Consider making the branch explicit (if FP16 ... else if BF16 ... else error) to fail fast and avoid hard-to-debug incorrect indexing.
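A fail-fast dispatch along the suggested lines could look roughly like this. The enum names mirror the diff, the helper names are hypothetical, and the demo FP16 decoder handles only zero and normal values.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Mirror of the dtype enum; names follow the diff, values are illustrative.
enum class DataTypes { DATA_TYPE_FLOAT, DATA_TYPE_FP16, DATA_TYPE_BF16, DATA_TYPE_INT8 };

inline float bf16_bits_to_float(uint16_t bits) {
    uint32_t u = static_cast<uint32_t>(bits) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

inline float fp16_bits_to_float(uint16_t h) {
    // Demo decoder: zero and normal halfs only (no subnormal/inf/NaN handling).
    uint32_t sign = (static_cast<uint32_t>(h) & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t u = (exp == 0 && man == 0) ? sign
                                        : (sign | ((exp - 15 + 127) << 23) | (man << 13));
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Fail fast instead of silently decoding unknown dtypes as BF16.
inline float half_bits_to_float(uint16_t bits, DataTypes dtype) {
    switch (dtype) {
        case DataTypes::DATA_TYPE_FP16:
            return fp16_bits_to_float(bits);
        case DataTypes::DATA_TYPE_BF16:
            return bf16_bits_to_float(bits);
        default:
            throw std::invalid_argument("uint16_t payload requires FP16 or BF16 dtype");
    }
}
```

With this shape, a stray `DATA_TYPE_FLOAT` or `DATA_TYPE_INT8` surfaces immediately as an exception instead of as a silently mis-built index.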
```cpp
fmt::format("parameters[{}] must in [{}, {}, {}, {}, {}], now is {}",
            PARAMETER_DTYPE,
            DATATYPE_FLOAT32,
            DATATYPE_FLOAT16,
            DATATYPE_BFLOAT16,
            DATATYPE_INT8,
            DATATYPE_SPARSE,
            datatype));
```
The validation message is grammatically incorrect/unclear: "must in" → "must be in" (or "must be one of"). Updating this improves usability when users pass an unsupported dtype.
```cpp
 * @brief Sets the float16/bfloat16 vector array for the dataset.
 *
 * @param vectors Pointer to the array of float16/bfloat16 vectors (uint16_t*).
 * @return DatasetPtr A shared pointer to the dataset with updated vectors.
 */
virtual DatasetPtr
Float16Vectors(const uint16_t* vectors) = 0;

/**
 * @brief Retrieves the float16/bfloat16 vector array of the dataset.
 *
 * @return const uint16_t* Pointer to the array of float16/bfloat16 vectors.
```
The API name Float16Vectors suggests FP16 only, but the comment says it also carries BF16. Since this is new public API, consider either (a) renaming to a format-neutral name (e.g., HalfVectors/Uint16Vectors) or (b) explicitly documenting that these are raw FP16 or BF16 bit-patterns and that the interpretation is controlled by dtype/DataTypes elsewhere. Without that clarification, consumers may pass BF16 data but assume FP16 semantics (or vice versa).
Suggested change:

```diff
- * @brief Sets the float16/bfloat16 vector array for the dataset.
- *
- * @param vectors Pointer to the array of float16/bfloat16 vectors (uint16_t*).
- * @return DatasetPtr A shared pointer to the dataset with updated vectors.
- */
-virtual DatasetPtr
-Float16Vectors(const uint16_t* vectors) = 0;
-/**
- * @brief Retrieves the float16/bfloat16 vector array of the dataset.
- *
- * @return const uint16_t* Pointer to the array of float16/bfloat16 vectors.
+ * @brief Sets the 16-bit floating-point vector array for the dataset.
+ *
+ * @param vectors Pointer to the array of 16-bit floating-point encodings (uint16_t*).
+ *
+ * @details
+ * The memory pointed to by @p vectors is treated as an array of raw 16-bit
+ * floating-point bit patterns (either IEEE 754 half-precision FP16 or
+ * bfloat16 BF16). This method does not itself distinguish between FP16
+ * and BF16; the actual interpretation of these values is controlled by
+ * the dataset's data-type configuration (for example via a dtype/DataTypes
+ * setting elsewhere in the API).
+ *
+ * @return DatasetPtr A shared pointer to the dataset with updated vectors.
+ */
+virtual DatasetPtr
+Float16Vectors(const uint16_t* vectors) = 0;
+/**
+ * @brief Retrieves the 16-bit floating-point vector array of the dataset.
+ *
+ * @return const uint16_t* Pointer to the array of raw 16-bit floating-point
+ * bit patterns (FP16 or BF16), whose interpretation is determined by the
+ * dataset's data-type configuration (e.g., dtype/DataTypes).
```
Force-pushed from 1b45efd to 40b229c.
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated 7 comments.
Comments suppressed due to low confidence (1)
TASK.md:1
- This file hard-codes an absolute local filesystem path that won't exist for other developers/CI and is not useful as repository content. Suggest removing `TASK.md` from the PR (or replacing it with a repo-relative link / brief task description).
```cpp
CreateFP16Dataset(int64_t dim, int64_t count, const std::vector<float>& fp32_data) {
    std::vector<uint16_t> fp16_data(dim * count);
    for (int64_t i = 0; i < dim * count; ++i) {
        fp16_data[i] = generic::FloatToFP16(fp32_data[i]);
    }
```
CreateFP16Dataset (and the BF16 equivalent) returns a Dataset that points at fp16_data owned by a local std::vector. With Owner(false), the dataset will retain a dangling pointer after the function returns, leading to use-after-free during Build()/Add(). Fix: ensure the vector memory outlives the dataset (e.g., allocate buffers on heap and set Owner(true) appropriately, or return a struct that holds both the dataset and backing storage).
```cpp
auto dataset = Dataset::Make();
dataset->NumElements(count)
    ->Dim(dim)
    ->Ids(ids.data())
    ->Float16Vectors(fp16_data.data())
    ->Owner(false);
return dataset;
```
CreateFP16Dataset (and the BF16 equivalent) returns a Dataset that points at fp16_data owned by a local std::vector. With Owner(false), the dataset will retain a dangling pointer after the function returns, leading to use-after-free during Build()/Add(). Fix: ensure the vector memory outlives the dataset (e.g., allocate buffers on heap and set Owner(true) appropriately, or return a struct that holds both the dataset and backing storage).
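The "return a struct that holds both the dataset and backing storage" option can be expressed with generic types rather than VSAG's Dataset API; `HalfDatasetBundle`, `DatasetView`, and `make_fp16_bundle` below are illustrative names, not project code.

```cpp
#include <cstdint>
#include <vector>

// Illustrative types, not VSAG API: a non-owning view plus owned storage.
struct DatasetView {
    const int64_t* ids = nullptr;
    const uint16_t* fp16_vectors = nullptr;
    int64_t dim = 0;
    int64_t count = 0;
};

struct HalfDatasetBundle {
    std::vector<int64_t> ids;         // backing storage for the id array
    std::vector<uint16_t> fp16_data;  // backing storage for the half payload
    DatasetView view;                 // pointers into the vectors above
};

// Returning the bundle moves the vectors; a moved std::vector keeps its heap
// buffer, so the view's pointers stay valid for the bundle's whole lifetime.
inline HalfDatasetBundle make_fp16_bundle(int64_t dim, int64_t count,
                                          const std::vector<uint16_t>& encoded) {
    HalfDatasetBundle b;
    b.ids.resize(count);
    for (int64_t i = 0; i < count; ++i) {
        b.ids[i] = i;
    }
    b.fp16_data = encoded;  // copy the payload into owned storage
    b.view = DatasetView{b.ids.data(), b.fp16_data.data(), dim, count};
    return b;
}
```

The caller keeps the bundle alive for the duration of `Build()`/`Add()`, which removes the dangling-pointer hazard that `Owner(false)` plus a local `std::vector` creates.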
```cpp
vsag::Vector<uint16_t> sampled_fp16_buffer(sample_count * dim, safe_allocator);
for (int64_t i = 0; i < sample_count; ++i) {
    std::copy(original_fp16_data + sampled_indices[i] * dim,
              original_fp16_data + (sampled_indices[i] + 1) * dim,
              sampled_fp16_buffer.data() + i * dim);
}
auto* new_data_buffer =
    static_cast<uint16_t*>(safe_allocator->Allocate(sample_count * dim * sizeof(uint16_t)));
std::copy(sampled_fp16_buffer.begin(), sampled_fp16_buffer.end(), new_data_buffer);
```
This does an extra full-buffer copy: first into sampled_fp16_buffer, then into new_data_buffer. You can copy directly into new_data_buffer inside the loop, avoiding an additional allocation and memory pass (same applies to the FP32 branch below). This matters because sampling can run on large training sets.
Suggested change:

```diff
-vsag::Vector<uint16_t> sampled_fp16_buffer(sample_count * dim, safe_allocator);
-for (int64_t i = 0; i < sample_count; ++i) {
-    std::copy(original_fp16_data + sampled_indices[i] * dim,
-              original_fp16_data + (sampled_indices[i] + 1) * dim,
-              sampled_fp16_buffer.data() + i * dim);
-}
-auto* new_data_buffer =
-    static_cast<uint16_t*>(safe_allocator->Allocate(sample_count * dim * sizeof(uint16_t)));
-std::copy(sampled_fp16_buffer.begin(), sampled_fp16_buffer.end(), new_data_buffer);
+auto* new_data_buffer =
+    static_cast<uint16_t*>(safe_allocator->Allocate(sample_count * dim * sizeof(uint16_t)));
+for (int64_t i = 0; i < sample_count; ++i) {
+    std::copy(original_fp16_data + sampled_indices[i] * dim,
+              original_fp16_data + (sampled_indices[i] + 1) * dim,
+              new_data_buffer + i * dim);
+}
```
```cpp
// limitations under the License.

#include <catch2/catch_test_macros.hpp>
#include <cstdint>
```
This uses std::abs on floating-point values but doesn’t include <cmath>. Depending on platform/standard library, this can fail to compile or pick an unintended overload. Add #include <cmath>.
Suggested change:

```diff
 #include <cstdint>
+#include <cmath>
```
```cpp
uint16_t fp16 = generic::FloatToFP16(test_val);
float back_fp16 = generic::FP16ToFloat(fp16);
REQUIRE(std::abs(back_fp16 - test_val) < 0.01f);
```
This uses std::abs on floating-point values but doesn’t include <cmath>. Depending on platform/standard library, this can fail to compile or pick an unintended overload. Add #include <cmath>.
```cpp
uint16_t bf16 = generic::FloatToBF16(test_val);
float back_bf16 = generic::BF16ToFloat(bf16);
REQUIRE(std::abs(back_bf16 - test_val) < 0.01f);
```
This uses std::abs on floating-point values but doesn’t include <cmath>. Depending on platform/standard library, this can fail to compile or pick an unintended overload. Add #include <cmath>.
```diff
 } else {
     throw VsagException(ErrorType::INVALID_ARGUMENT,
-                        fmt::format("parameters[{}] must in [{}, {}, {}], now is {}",
+                        fmt::format("parameters[{}] must in [{}, {}, {}, {}, {}], now is {}",
```
The new validation message is grammatically unclear ('must in'). Consider changing it to something like 'must be one of' to make the error easier to understand for API consumers.
The DecodeOneImpl and DecodeBatchImpl for the uint16_t path incorrectly used static_cast<uint16_t>(std::round(float)), which produces integer magnitudes (0-65535) instead of FP16/BF16 bit patterns. Fixed by using FloatToFP16/FloatToBF16 for proper conversion based on data_type_. Addresses review feedback from gemini-code-assist and Copilot on PR antgroup#1731.

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Co-authored-by: Kimi-K2.5 <assistant@example.com>
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated 9 comments.
Comments suppressed due to low confidence (1)
TASK.md:1
- This file (and similar content in `plan.md`) embeds an absolute local filesystem path that likely won't exist for other developers and may leak internal environment details. Suggested fix: remove `TASK.md` from the repo (or replace it with a relative link / issue reference), and avoid committing machine-specific absolute paths in documentation.
```cpp
FP16Quantizer<metric>::EncodeOneImpl(const DataType* data, uint8_t* codes) const {
    auto* codes_fp16 = reinterpret_cast<uint16_t*>(codes);

    if (input_data_type_ == DataTypes::DATA_TYPE_FP16) {
        const auto* input_fp16 = reinterpret_cast<const uint16_t*>(data);
        std::memcpy(codes_fp16, input_fp16, this->dim_ * sizeof(uint16_t));
        return true;
    }
```
The FP16 fast-path copies raw FP16 inputs directly into codes, bypassing cosine normalization entirely (and any other preprocessing in the float path). This will produce incorrect results for METRIC_TYPE_COSINE, where the encoded representation must reflect a normalized vector. Suggested fix: only take the memcpy fast-path when metric != METRIC_TYPE_COSINE (and when the input is known to already be in the expected encoded space), otherwise convert FP16→FP32, normalize if needed, then encode.
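A sketch of the gated cosine path, shown for BF16: decode to FP32, L2-normalize, then re-encode. The conversion helpers are standalone stand-ins for the repo's `BF16ToFloat`/`FloatToBF16` (truncating BF16 encode is a simplification), and `encode_bf16_cosine` is a hypothetical name.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>
#include <vector>

inline uint16_t float_to_bf16(float f) {
    uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    return static_cast<uint16_t>(u >> 16);  // truncating BF16 (demo)
}

inline float bf16_to_float(uint16_t b) {
    uint32_t u = static_cast<uint32_t>(b) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Gated path: for cosine, decode to FP32, L2-normalize, then re-encode.
// The memcpy fast path is only safe for non-cosine metrics.
inline void encode_bf16_cosine(const uint16_t* in, uint16_t* out, int64_t dim) {
    std::vector<float> tmp(dim);
    float norm_sq = 0.0f;
    for (int64_t i = 0; i < dim; ++i) {
        tmp[i] = bf16_to_float(in[i]);
        norm_sq += tmp[i] * tmp[i];
    }
    const float inv = norm_sq > 0.0f ? 1.0f / std::sqrt(norm_sq) : 0.0f;
    for (int64_t i = 0; i < dim; ++i) {
        out[i] = float_to_bf16(tmp[i] * inv);
    }
}
```

Given BF16 inputs encoding (3, 4), the output decodes back to roughly (0.6, 0.8), i.e. the unit vector the cosine distance computation expects.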
```cpp
const DataType* cur = data;
Vector<float> tmp(this->allocator_);
if constexpr (metric == MetricType::METRIC_TYPE_COSINE) {
    tmp.resize(this->dim_);
    Normalize(data, tmp.data(), this->dim_);
    cur = tmp.data();
}
```
When input_data_type_ is not FP16, this code assumes data points to FP32 (DataType) and will normalize/dereference it as such. If the dataset dtype is BF16 but the quantization type is configured as FP16 (i.e., FP16Quantizer constructed with common_param.data_type_ == DATA_TYPE_BF16), data will actually point to uint16_t storage and this path will read invalid floats. Suggested fix: explicitly handle DATA_TYPE_BF16 here by reinterpreting data as uint16_t*, converting BF16→FP32 into a temp buffer (and normalizing if cosine), then encoding to FP16 codes.
```cpp
BF16Quantizer<metric>::EncodeOneImpl(const DataType* data, uint8_t* codes) const {
    auto* codes_bf16 = reinterpret_cast<uint16_t*>(codes);

    if (input_data_type_ == DataTypes::DATA_TYPE_BF16) {
        const auto* input_bf16 = reinterpret_cast<const uint16_t*>(data);
        std::memcpy(codes_bf16, input_bf16, this->dim_ * sizeof(uint16_t));
        return true;
    }
```
Same issue as FP16Quantizer: this BF16 memcpy fast-path bypasses cosine normalization, which will yield incorrect encoded vectors for METRIC_TYPE_COSINE. Suggested fix: gate the memcpy optimization to non-cosine metrics (or only when inputs are guaranteed normalized), and otherwise convert to FP32, normalize if needed, then encode to BF16.
```cpp
// Use original data format for bucket quantizer
void* data_ptr = nullptr;
uint64_t data_size = 0;
get_vectors(this->data_type_, dim, train_data, &data_ptr, &data_size);
this->bucket_->Train(reinterpret_cast<const float*>(data_ptr), sample_count);
```
For FP16/BF16 datasets get_vectors() returns a uint16_t* buffer, but this code passes it to bucket_->Train() by reinterpreting it as const float*. Unless bucket_->Train() is explicitly designed to treat the pointer as untyped storage (and never dereference as float), this will break training (wrong reads / UB). Suggested fix: either (1) pass a real FP32 buffer (e.g., the fp32_train_data already created above) to bucket_->Train(), or (2) update the bucket quantizer interface to accept (const void* data, DataTypes dtype, ...) (or (const uint8_t* data, size_t stride, ...)) so FP16/BF16 can be handled without type-punning.
Suggested change:

```diff
-// Use original data format for bucket quantizer
-void* data_ptr = nullptr;
-uint64_t data_size = 0;
-get_vectors(this->data_type_, dim, train_data, &data_ptr, &data_size);
-this->bucket_->Train(reinterpret_cast<const float*>(data_ptr), sample_count);
+// Use FP32 data for bucket quantizer training to avoid type-punning issues
+this->bucket_->Train(fp32_train_data.data(), sample_count);
```
```cpp
#include <catch2/catch_test_macros.hpp>
#include <cstdint>
#include <random>
```
This new test file uses std::vector, std::copy, and std::abs but does not include the corresponding standard headers (<vector>, <algorithm>, <cmath>). Relying on transitive includes can cause non-deterministic build failures across toolchains; please add the direct includes.
Suggested change:

```diff
 #include <random>
+#include <vector>
+#include <algorithm>
+#include <cmath>
```
```cpp
    *vectors_ptr = (void*)base->GetFloat16Vectors();
    *data_size_ptr = dim * sizeof(uint16_t);
} else {
    throw std::invalid_argument(fmt::format("no support for this metric: {}", (int)type));
```
This error is thrown from get_vectors(DataTypes type, ...), but the message says "metric" instead of "type". Updating it to "no support for this type" (and ideally printing ToString(type) if available) would make debugging dtype issues clearer.
Suggested change:

```diff
-throw std::invalid_argument(fmt::format("no support for this metric: {}", (int)type));
+throw std::invalid_argument(fmt::format("no support for this type: {}", (int)type));
```
```cpp
: FP16Quantizer<metric>(common_param.dim_, common_param.allocator_.get()) {
    this->input_data_type_ = common_param.data_type_;
};
```
The trailing semicolon after the constructor definition (};) is extraneous at namespace scope and should be removed to match typical C++ style and avoid confusion.
```cpp
: BF16Quantizer<metric>(common_param.dim_, common_param.allocator_.get()) {
    this->input_data_type_ = common_param.data_type_;
};
```
Same as FP16Quantizer: the constructor ends with an extraneous trailing semicolon (};). Please remove it for consistency and readability.
```cpp
TEST_CASE("HGraph with FP16 Test", "[ft][hgraph][fp16]") {
    constexpr int64_t TEST_DIM = 32;
    constexpr int64_t TEST_COUNT = 100;
```
The new FP16/BF16 functional tests exercise only "metric_type": "l2", but the PR description claims FP16/BF16 support for L2/IP/Cosine. Consider adding at least one FP16 or BF16 functional test that builds and searches with "metric_type": "cosine" (and/or "ip") to cover the normalization-sensitive paths and prevent regressions.
Force-pushed from 76c790e to 53814b1.
Force-pushed from 53814b1 to 3fff8b8.
Pull request overview
Copilot reviewed 25 out of 25 changed files in this pull request and generated 8 comments.
Comments suppressed due to low confidence (1)
TASK.md:1
- This file embeds an absolute local filesystem path (including a username). This is not portable and can leak developer environment details. Prefer removing `TASK.md` from source control or replacing it with a repo-relative link/issue reference (e.g., `Fixes #1730`) or a short task description.
```diff
 // Get original data pointer for bucket insertion
 void* base_vectors = nullptr;
 uint64_t base_data_size = 0;
 get_vectors(this->data_type_, dim_, base, &base_vectors, &base_data_size);
 const auto* raw_vectors = reinterpret_cast<const char*>(base_vectors);

 auto add_func = [&](int64_t i) -> void {
     for (int64_t j = 0; j < buckets_per_data_; ++j) {
-        const auto* data_ptr = vectors + i * dim_;
+        const auto* data_ptr =
+            raw_vectors + i * dim_ *
+                ((this->data_type_ == DataTypes::DATA_TYPE_FLOAT)
+                     ? sizeof(float)
+                     : sizeof(uint16_t));
         auto idx = i * buckets_per_data_ + j;
-        InnerIdType offset_id = bucket_->InsertVector(
-            data_ptr, buckets[idx], idx + current_num * buckets_per_data_);
+        InnerIdType offset_id = bucket_->InsertVector(reinterpret_cast<const float*>(data_ptr),
+                                                      buckets[idx],
+                                                      idx + current_num * buckets_per_data_);
```
Same issue as training: for FP16/BF16 data_ptr is a byte pointer into uint16_t data, but it is passed as const float*. This is prone to misalignment (2-byte aligned data cast to float*) and UB if any code uses it as float before converting. A safer approach is to avoid char* + reinterpret_cast<const float*> and instead compute offsets on the correct element type (const uint16_t* vs const float*) or change InsertVector to accept const void*/const uint8_t* with an element-size/type.
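The typed-offset alternative can be sketched without `char*` byte arithmetic; `ElemType` and `row_ptr` are illustrative names standing in for VSAG's `DataTypes`-driven dispatch.

```cpp
#include <cstdint>

// Illustrative dtype tag; VSAG's DataTypes enum would drive this in-tree.
enum class ElemType { F32, U16 };

// Compute each row's base pointer on the correct element type so alignment
// and stride are right by construction, instead of char* byte math plus a
// reinterpret_cast<const float*> at the call site.
inline const void* row_ptr(const void* base, ElemType t, int64_t dim, int64_t row) {
    if (t == ElemType::F32) {
        return static_cast<const float*>(base) + row * dim;
    }
    return static_cast<const uint16_t*>(base) + row * dim;
}
```

The consumer (`InsertVector` in the diff) would then take a `const void*` plus the element type, so no caller ever has to pretend 2-byte-aligned half data is a `float*`.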
```cpp
const auto* fp32_data = data->GetFloat32Vectors();
memcpy(result.data(), fp32_data, num_elements * dim * sizeof(float));
}
```
memcpy is used in newly added code without an explicit <cstring> include and without std::memcpy. This can become a portability/build issue depending on transitive includes. Include <cstring> in this file (or use an existing project header that guarantees it) and call std::memcpy.
```cpp
} catch (const std::bad_alloc& e) {
    throw VsagException(ErrorType::NO_ENOUGH_MEMORY, "bad alloc when init computer buf");
}
```
The caught exception variable e is unused and the rethrown VsagException drops the original error detail (e.what()). Either remove the variable name (catch (const std::bad_alloc&)) or include e.what() (or wrap it) in the thrown exception for better diagnostics.
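A minimal sketch of preserving the original message: `std::runtime_error` stands in for the project's `VsagException`, and the `simulate_oom` flag replaces the real buffer allocation so the error path is visible.

```cpp
#include <new>
#include <stdexcept>
#include <string>

// std::runtime_error stands in for VsagException; simulate_oom replaces the
// real buffer allocation so the error path can be exercised directly.
inline void init_computer_buf_or_throw(bool simulate_oom) {
    try {
        if (simulate_oom) {
            throw std::bad_alloc();
        }
        // ... allocate the computer buffer here ...
    } catch (const std::bad_alloc& e) {
        // Keep the original what() so diagnostics aren't dropped on rethrow.
        throw std::runtime_error(std::string("bad alloc when init computer buf: ") + e.what());
    }
}
```

The alternative, if the detail is intentionally discarded, is simply `catch (const std::bad_alloc&)` with no variable name, which silences the unused-variable warning.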
plan.md (Outdated)
```
- **需求文档**: `/home/tianlan.lht/code/workspace/fp16_and_bf16_support.md`
- **任务文件**: `/home/tianlan.lht/code/workspace/agent-hive/tasks/2026-03-20-支持-fp16-bf16-输入向量格式.md`
```
The plan document includes absolute local paths. This is not usable for other contributors/CI and may disclose environment details. Replace these with repo-relative paths (if the docs exist in-repo), links to issues/PRs, or remove the local-path section.
Suggested change:

```diff
-- **需求文档**: `/home/tianlan.lht/code/workspace/fp16_and_bf16_support.md`
-- **任务文件**: `/home/tianlan.lht/code/workspace/agent-hive/tasks/2026-03-20-支持-fp16-bf16-输入向量格式.md`
+- **需求文档**: 详见项目文档《FP16/BF16 支持需求说明》
+- **任务文件**: `tasks/2026-03-20-支持-fp16-bf16-输入向量格式.md`
```
```cpp
auto param = R"(
{
    "dtype": "float16",
    "metric_type": "l2",
```
The PR description claims FP16/BF16 support for L2, IP, and Cosine, but the newly added functional tests only exercise metric_type: "l2". Add at least one FP16 and one BF16 functional test covering ip and/or cosine (especially cosine, since it exercises normalization paths in quantizers) to prevent regressions.
```cpp
auto param = R"(
{
    "dtype": "bfloat16",
    "metric_type": "l2",
```
There was a problem hiding this comment.
The PR description claims FP16/BF16 support for L2, IP, and Cosine, but the newly added functional tests only exercise metric_type: \"l2\". Add at least one FP16 and one BF16 functional test covering ip and/or cosine (especially cosine, since it exercises normalization paths in quantizers) to prevent regressions.
| auto param = R"( | ||
| { | ||
| "dtype": "float16", | ||
| "metric_type": "l2", |
There was a problem hiding this comment.
The PR description claims FP16/BF16 support for L2, IP, and Cosine, but the newly added functional tests only exercise metric_type: \"l2\". Add at least one FP16 and one BF16 functional test covering ip and/or cosine (especially cosine, since it exercises normalization paths in quantizers) to prevent regressions.
| auto param = R"( | ||
| { | ||
| "dtype": "bfloat16", | ||
| "metric_type": "l2", |
There was a problem hiding this comment.
The PR description claims FP16/BF16 support for L2, IP, and Cosine, but the newly added functional tests only exercise metric_type: \"l2\". Add at least one FP16 and one BF16 functional test covering ip and/or cosine (especially cosine, since it exercises normalization paths in quantizers) to prevent regressions.
- Fix get_query_data() to support INT8 query vectors (was returning nullptr)
- Fix FP16/BF16 quantizers to normalize for COSINE metric even with FP16/BF16 input
- Add FP16/BF16 data type handling in flatten_interface.cpp to create QuantizerAdapter<ProductQuantizer, uint16_t>

Fixes:
- Test Example Aarch64 SEGV in FP32ComputeL2Sqr (null query pointer)
- Test Aarch64 functests failures
- Review comment about COSINE normalization bypass in FP16/BF16 fast-path

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
The DecodeOneImpl and DecodeBatchImpl for the uint16_t path incorrectly used static_cast<uint16_t>(std::round(float)), which produces integer magnitudes (0-65535) instead of FP16/BF16 bit patterns. Fixed by using FloatToFP16/FloatToBF16 for proper conversion based on data_type_. Addresses review feedback from gemini-code-assist and Copilot on PR antgroup#1731.

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Co-authored-by: Kimi-K2.5 <assistant@example.com>
Force-pushed df15224 to 1f5cfee
Pull request overview
Copilot reviewed 26 out of 26 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
TASK.md:1
- This file contains an absolute path to a local machine, which won't exist for other developers/CI and may leak internal environment details. Consider removing TASK.md from the repo or replacing it with a repository-relative link (or an issue/PR reference).
```cpp
        "QuantizerAdapter::ProcessQueryImpl only supports int8_t data type");
}
auto& inner_computer = reinterpret_cast<Computer<QuantT>&>(computer);
this->inner_quantizer_->ProcessQueryImpl(query, inner_computer);
```
This change removes the int8→FP32 query conversion that existed previously. For QuantizerAdapter<..., int8_t>, callers may still supply int8 query buffers (e.g., HGraph’s get_query_data() returns GetInt8Vectors() for DATA_TYPE_INT8), which would cause inner_quantizer_->ProcessQueryImpl() to interpret int8 bytes as floats. Fix by keeping passthrough only for FP32 queries, but retaining the conversion path when DataT is int8_t (or otherwise ensuring queries are always FP32 for int8 indexes).
Suggested change (replacing the unconditional passthrough `this->inner_quantizer_->ProcessQueryImpl(query, inner_computer);`):

```cpp
if constexpr (std::is_same_v<DataT, int8_t>) {
    // For int8 indexes, callers may pass int8 query buffers. Convert to FP32
    // before forwarding to the inner quantizer, which expects float queries.
    auto* buf = static_cast<float*>(this->allocator_->Allocate(sizeof(float) * this->dim_));
    const auto* query_i8 = reinterpret_cast<const int8_t*>(query);
    for (size_t i = 0; i < this->dim_; ++i) {
        buf[i] = static_cast<float>(query_i8[i]);
    }
    this->inner_quantizer_->ProcessQueryImpl(buf, inner_computer);
    this->allocator_->Deallocate(buf);
} else {
    // For FP32 (and other supported DataT where query is already correctly typed),
    // keep the direct passthrough behavior.
    this->inner_quantizer_->ProcessQueryImpl(query, inner_computer);
}
```
```cpp
memcpy(result.data(), fp32_data, num_elements * dim * sizeof(float));
}
```
memcpy is used here but this file’s includes (in the provided diff) don’t show <cstring>. To avoid relying on transitive includes and namespace quirks, explicitly include <cstring> and prefer std::memcpy.
src/algorithm/hgraph.cpp (outdated)
```diff
 if (search_param.ep == INVALID_ENTRY_POINT) {
     return make_empty_dataset_with_stats();
 }

-const auto* raw_query = get_data(query);
+const auto* raw_query = get_query_data(query);
```
The if statement indentation appears to have been lost in this hunk, which reduces readability and is inconsistent with surrounding style. Please run clang-format (or restore indentation) for this block to keep formatting consistent.
```cpp
 * @param vectors Pointer to the array of float16/bfloat16 vectors (uint16_t*).
 * @return DatasetPtr A shared pointer to the dataset with updated vectors.
 */
virtual DatasetPtr
Float16Vectors(const uint16_t* vectors) = 0;
```
The doc comment says the parameter is uint16_t*, but the API takes const uint16_t*. Update the comment to match the signature (and emphasize that the buffer is read-only unless ownership is transferred via Owner(...)).
Force-pushed 1f5cfee to f28f3f1
Force-pushed f28f3f1 to 17d55a8
```cpp
if (common_param.data_type_ == DataTypes::DATA_TYPE_FP16 ||
    common_param.data_type_ == DataTypes::DATA_TYPE_BF16) {
    if (quantization_string == QUANTIZATION_TYPE_VALUE_FP16) {
        return make_instance_flatten<FP16Quantizer<metric>, IOTemp>(param, common_param);
    }
    if (quantization_string == QUANTIZATION_TYPE_VALUE_BF16) {
        return make_instance_flatten<BF16Quantizer<metric>, IOTemp>(param, common_param);
    }
    if (quantization_string == QUANTIZATION_TYPE_VALUE_PQ) {
        return make_instance_flatten<QuantizerAdapter<ProductQuantizer<metric>, uint16_t>,
                                     IOTemp>(param, common_param);
    }
    throw VsagException(ErrorType::INVALID_ARGUMENT,
                        fmt::format("FP16/BF16 data type does not support {} quantization",
                                    quantization_string));
```
This logic allows mismatched configurations like dtype=bfloat16 with base_quantization_type=fp16 (and vice versa). In that case FP16Quantizer is constructed with input_data_type_ = BF16, but FP16Quantizer::EncodeOneImpl only treats the input as FP16 when input_data_type_ == DATA_TYPE_FP16; otherwise it assumes the input pointer is FP32, leading to incorrect reads of the BF16 payload. Fix by rejecting mismatched dtype vs. quantizer type for FP16/BF16 quantizers (the minimal fix), or by enhancing the FP16/BF16 quantizers to correctly decode both DATA_TYPE_FP16 and DATA_TYPE_BF16 inputs before encoding.
Suggested change (splitting the combined branch so mismatched dtype/quantizer pairs are rejected):

```cpp
if (common_param.data_type_ == DataTypes::DATA_TYPE_FP16) {
    if (quantization_string == QUANTIZATION_TYPE_VALUE_FP16) {
        return make_instance_flatten<FP16Quantizer<metric>, IOTemp>(param, common_param);
    }
    if (quantization_string == QUANTIZATION_TYPE_VALUE_PQ) {
        return make_instance_flatten<QuantizerAdapter<ProductQuantizer<metric>, uint16_t>,
                                     IOTemp>(param, common_param);
    }
    // Rejects bf16 (and anything else) for an fp16 dtype.
    throw VsagException(
        ErrorType::INVALID_ARGUMENT,
        fmt::format("FP16 data type does not support {} quantization", quantization_string));
}
if (common_param.data_type_ == DataTypes::DATA_TYPE_BF16) {
    if (quantization_string == QUANTIZATION_TYPE_VALUE_BF16) {
        return make_instance_flatten<BF16Quantizer<metric>, IOTemp>(param, common_param);
    }
    if (quantization_string == QUANTIZATION_TYPE_VALUE_PQ) {
        return make_instance_flatten<QuantizerAdapter<ProductQuantizer<metric>, uint16_t>,
                                     IOTemp>(param, common_param);
    }
    // Rejects fp16 (and anything else) for a bf16 dtype.
    throw VsagException(
        ErrorType::INVALID_ARGUMENT,
        fmt::format("BF16 data type does not support {} quantization", quantization_string));
}
```
```cpp
const auto* data_ptr =
    raw_vectors + i * dim_ *
                      ((this->data_type_ == DataTypes::DATA_TYPE_FLOAT)
                           ? sizeof(float)
                           : sizeof(uint16_t));
```
Pointer arithmetic assumes any non-FP32 type uses 16-bit elements, which is incorrect for DATA_TYPE_INT8 (and would compute wrong offsets). Fix by using a switch/lookup for element size based on data_type_ (float=4, int8=1, fp16/bf16=2), or reuse the data_size_ptr/element-size information already computed by get_vectors() to avoid duplicating/incorrectly encoding this mapping.
Force-pushed 755e822 to 185762a
Force-pushed 185762a to 3e517c4
- Add FP16/BF16 data types (DATA_TYPE_FP16, DATA_TYPE_BF16)
- Extend Dataset API with Float16Vectors/GetFloat16Vectors
- Add FP16Quantizer and BF16Quantizer for half-precision vectors
- Support FP16/BF16 in HGraph and IVF indexes
- Handle SPARSE data type in get_query_data for proper query processing
- Add QuantizerAdapter support for uint16_t data type
- Add FP16/BF16 HGraph examples

Fixes antgroup#1730

Signed-off-by: LHT129 <tianlan.lht@antgroup.com>
Co-authored-by: Kimi-K2.5 <assistant@example.com>
Force-pushed 3e517c4 to 1430589
```cpp
inline const void*
get_data(const DatasetPtr& data, int64_t index) const;
```
The new private method IVF::get_data(...) is declared/defined but not used anywhere in IVF (only the definition exists). This adds dead code and can confuse future maintenance; either use it where you currently compute raw_vectors + ... or remove it until needed.
Suggested change (remove the unused declaration):

```diff
-inline const void*
-get_data(const DatasetPtr& data, int64_t index) const;
```
```cpp
size_t element_size = sizeof(float);
if (data_type_ == DataTypes::DATA_TYPE_FP16 || data_type_ == DataTypes::DATA_TYPE_BF16) {
    element_size = sizeof(uint16_t);
} else if (data_type_ == DataTypes::DATA_TYPE_INT8) {
    element_size = sizeof(int8_t);
}
```
This introduces size_t element_size in new code. Repo guidance asks to prefer uint64_t over size_t in code changes to avoid potential macOS compile issues (CLAUDE.md:103). Consider switching element_size to uint64_t (or reusing an existing uint64_t size variable) here.
```cpp
} else if (type == DataTypes::DATA_TYPE_INT8) {
    base->Int8Vectors((int8_t*)vectors_ptr)->Dim(dim)->Owner(false)->NumElements(num_element);
} else if (type == DataTypes::DATA_TYPE_FP16 || type == DataTypes::DATA_TYPE_BF16) {
    base->Float16Vectors((uint16_t*)vectors_ptr)
```
set_dataset casts vectors_ptr to uint16_t* when calling Float16Vectors(...), which discards const from the input pointer. Since the Dataset API takes const uint16_t*, prefer a static_cast<const uint16_t*> (or equivalent) to keep const-correctness and avoid accidental writes through this pointer later.
Suggested change:

```diff
-    base->Float16Vectors((uint16_t*)vectors_ptr)
+    base->Float16Vectors(static_cast<const uint16_t*>(vectors_ptr))
```
Summary
Add support for FP16 and BF16 input vector formats in VSAG library, enabling direct indexing of half-precision vectors without application-level conversion. This reduces memory usage by 50% compared to FP32.
Changes
Infrastructure
- `DATATYPE_BFLOAT16` and `FLOAT16_VECTORS` constants in `constants.h/cpp`
- `DATA_TYPE_BF16` enum value and `ToString` support in `data_type.h`
- `Float16Vectors`/`GetFloat16Vectors` Dataset API (shared for FP16/BF16)
- `DatasetImpl` variant to support `uint16_t*` data type
- `float16`/`bfloat16` dtype parameters in `index_common_param.cpp`
- `QuantizerAdapter` to support `uint16_t` data type with FP16/BF16 conversion

HGraph Index Support
- `get_query_data()` function to handle query vectors (always FP32)
- `get_data()` support for FP16/BF16 base vectors
- `build_by_odescent` to use correct pointer arithmetic for 16-bit data

IVF Index Support
- `get_data()` function to handle FP16/BF16 data retrieval
- `convert_to_fp32()` helper function for partition strategy
- `Train()` and `Add()` to convert FP16/BF16 data for partition classification
- `factory_router_index` to use FP32 data type for router index creation

Technical Details
- Input vectors are represented as `uint16_t*` (shared for FP16 and BF16)

Testing
Related Issues
Checklist