Commit 81fab9c
[quantization] 8bit distance kernels and ZipUnzip (#798)
This PR introduces heterogeneous inner-product kernels that pair 8-bit
bitslices with 4-bit, 2-bit and 1-bit bitslices.
The goal is to enable fast kernels for full-precision-like queries against
quantized vectors (spherical, minmax, etc.). In the benchmark, the
`u8xu4` kernel is ~2x faster than its `f32xu4` counterpart.
For AVX2-capable architectures, the 4-bit and 2-bit kernels are
implemented using the
[`_mm256_maddubs_epi16`](https://doc.rust-lang.org/beta/core/arch/x86_64/fn._mm256_maddubs_epi16.html)
intrinsic acting on blocks of 32 byte-sized dimensions for the `u8xu4`
kernel and 64 dimensions for the `u8xu2` kernel. Care was taken to
ensure that, for these specific kernels, the intrinsic does not
saturate during the multiply-adds. For the 1-bit kernel, we
implement a simple masked horizontal-add strategy on blocks of size 32.
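
The saturation concern can be illustrated with a scalar model of a single 16-bit output lane. This is only a sketch: `maddubs_lane` and `worst_case_pair_sum` are hypothetical helper names, not functions from this PR, and how the kernels accumulate lanes across blocks is not modelled here.

```rust
/// Scalar model of one 16-bit output lane of `_mm256_maddubs_epi16`:
/// unsigned bytes from `a` are multiplied by signed bytes from `b`, and the
/// adjacent pair of products is added with signed 16-bit saturation.
fn maddubs_lane(a: [u8; 2], b: [i8; 2]) -> i16 {
    let sum = a[0] as i32 * b[0] as i32 + a[1] as i32 * b[1] as i32;
    sum.clamp(i16::MIN as i32, i16::MAX as i32) as i16
}

/// Largest possible pair sum when an 8-bit value (max 255) is multiplied
/// by an unsigned code that is `bits` wide.
fn worst_case_pair_sum(bits: u32) -> i32 {
    let max_code = (1i32 << bits) - 1;
    2 * 255 * max_code
}

fn main() {
    // 4-bit codes: 2 * 255 * 15 = 7650, well below i16::MAX (32767).
    assert_eq!(worst_case_pair_sum(4), 7650);
    // 2-bit codes: 2 * 255 * 3 = 1530.
    assert_eq!(worst_case_pair_sum(2), 1530);
    // The lane model agrees with the bound for the 4-bit worst case.
    assert_eq!(maddubs_lane([255, 255], [15, 15]), 7650);
    // For contrast, full-range signed bytes do hit the saturation limit.
    assert_eq!(maddubs_lane([255, 255], [127, 127]), i16::MAX);
    println!("u4 and u2 pair sums fit in i16");
}
```

As long as the second operand holds only 4-bit or 2-bit codes, a single pair sum stays far from the `i16` limit in this model.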
A `Scalar` fallback is implemented for `Neon`, and for now the `V4`
architecture is retargeted to `V3` for these kernels.
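
For reference, a scalar version of the 1-bit kernel can be written as a masked sum. The sketch below is a hypothetical illustration, not the fallback from this PR: the function name `dot_u8_u1` and the LSB-first bit packing are assumptions.

```rust
/// Hypothetical scalar reference for a `u8xu1` inner product.
/// `bits` holds one bit per dimension, packed LSB-first (an assumption).
fn dot_u8_u1(query: &[u8], bits: &[u8]) -> u32 {
    assert!(bits.len() * 8 >= query.len(), "not enough packed bits");
    let mut acc = 0u32;
    for (i, &q) in query.iter().enumerate() {
        let bit = (bits[i / 8] >> (i % 8)) & 1;
        // Branch-free mask: 0x00 when the bit is 0, 0xFF when it is 1.
        let mask = 0u8.wrapping_sub(bit);
        acc += (q & mask) as u32;
    }
    acc
}

fn main() {
    let query = [10u8, 20, 30, 40, 50, 60, 70, 80];
    // Bits 0, 2, 5 and 7 are set, selecting 10 + 30 + 60 + 80.
    let bits = [0b1010_0101u8];
    assert_eq!(dot_u8_u1(&query, &bits), 180);
    println!("dot = {}", dot_u8_u1(&query, &bits));
}
```

A vectorized kernel would apply the same mask-then-add idea to a whole block of dimensions at a time.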
Support for computing `u8xu4`, `u8xu2` and `u8xu1` distances with minmax
quantized vectors is available mostly out of the box.
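
One way to see why an integer inner-product kernel is enough here: assume (this is an illustration, not a description of the crate's actual encoding) that both vectors are stored as integer codes with a per-vector scale and offset, $x_i \approx a_x c_i + b_x$ and $y_i \approx a_y d_i + b_y$ over $n$ dimensions. The inner product then expands as:

```latex
\langle x, y \rangle \approx \sum_{i=1}^{n} (a_x c_i + b_x)(a_y d_i + b_y)
  = a_x a_y \sum_{i=1}^{n} c_i d_i
  + a_x b_y \sum_{i=1}^{n} c_i
  + a_y b_x \sum_{i=1}^{n} d_i
  + n\, b_x b_y
```

Under that assumption, only the first sum depends on both vectors, and it is exactly the integer inner product that a `u8xu4`-style kernel computes. The remaining terms depend on one vector at a time and could be precomputed.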
## ZipUnzip
A new trait `ZipUnzip` has been added to diskann-wide to implement
vectorized zipping and unzipping logic: zipping merges two half-width
vectors into a full vector by interleaving elements from each half,
and unzipping performs the inverse transformation on the full vector.
- It's currently implemented for `i8x32`, `i16x16`, `i32x8`, `u8x32`,
`u32x8` and `f16x16`.
- It's implemented for `Scalar`, `V3`, `V4` and `Neon` architectures.
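
The semantics described above can be sketched with plain arrays. This is a scalar illustration only: the free functions `zip` and `unzip`, the constants, and the choice that the first half supplies the even positions are all assumptions, and none of it reflects the actual `ZipUnzip` trait signature.

```rust
const HALF: usize = 16;
const FULL: usize = 32;

/// Interleave two half-width arrays: `first` fills the even positions and
/// `second` fills the odd positions (the ordering is an assumption).
fn zip(first: [u8; HALF], second: [u8; HALF]) -> [u8; FULL] {
    std::array::from_fn(|i| if i % 2 == 0 { first[i / 2] } else { second[i / 2] })
}

/// Inverse of `zip`: split a full array back into its two halves.
fn unzip(full: [u8; FULL]) -> ([u8; HALF], [u8; HALF]) {
    (
        std::array::from_fn(|i| full[2 * i]),
        std::array::from_fn(|i| full[2 * i + 1]),
    )
}

fn main() {
    let a: [u8; HALF] = std::array::from_fn(|i| i as u8);
    let b: [u8; HALF] = std::array::from_fn(|i| 100 + i as u8);
    let z = zip(a, b);
    // Elements alternate between the two halves.
    assert_eq!(&z[..4], &[0, 100, 1, 101]);
    // Unzipping recovers the original halves.
    assert_eq!(unzip(z), (a, b));
    println!("{:?}", &z[..8]);
}
```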
## Benchmark
We ran the benchmark as a flat scan over the vectors, clearing the
cache before every run and using a vector count whose footprint exceeds
the machine's L3 cache size.
```
Total latency in ms, COUNT=150K, AMD EPYC 7763
Kernel         dim=256   dim=384   dim=896
───────────────────────────────────────────
u8×u4 (new)       8.92      9.93     17.71
u8×u2 (new)       9.87     13.10     21.53
u8×u1 (new)       6.09      7.52     14.62
f32×u4           15.96     20.99     40.67
f32×u2           13.88     18.46     35.64
f32×u1           13.53     17.93     34.65
u8×u8             8.40     10.00     19.72
u4×u4             7.33     11.58     15.49
f32×f32          16.56     23.16     47.76
```
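
The "~2x" figure for `u8xu4` against `f32xu4` can be checked directly from the table. The snippet below just divides the reported latencies; `speedup` is a hypothetical helper, and the numbers are copied from the rows above.

```rust
/// Ratio of baseline latency to new latency (higher means faster).
fn speedup(baseline_ms: f64, new_ms: f64) -> f64 {
    baseline_ms / new_ms
}

fn main() {
    let dims = [256, 384, 896];
    let f32xu4 = [15.96, 20.99, 40.67]; // f32×u4 row
    let u8xu4 = [8.92, 9.93, 17.71]; // u8×u4 (new) row
    for i in 0..dims.len() {
        println!("dim={}: {:.2}x", dims[i], speedup(f32xu4[i], u8xu4[i]));
    }
}
```

The ratios come out at roughly 1.8x to 2.3x, growing with the dimension.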
---------
Co-authored-by: Mark Hildebrand <hildebrandmw@gmail.com>
Co-authored-by: Mark Hildebrand <mhildebrand@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent 3cd8ac2 commit 81fab9c
26 files changed
Lines changed: 1514 additions & 107 deletions
File tree
- diskann-quantization/src
- bits
- minmax
- diskann-wide/src
- arch
- aarch64
- x86_64
- v3
- v4
- test_utils