Skip to content

Chebychev distance/Maximum Metric doesn't work correctly with KdTree queries #285

@cbueth

Description

@cbueth

Hey @sdd, thanks for the active development of this great package. I've implemented Chebyshev distance (max(|a_i - b_i|)), but it produces incorrect results with KdTree queries. The within() function misses points at the exact radius boundary. I suppose that is because the distance metric is ill-defined as per the current query logic. It would be great to get some input from you and possibly bring this metric to the package. On my fork, I have added some tests for the existing metrics that could be of benefit for the maintanability of the package, increasing the coverage. After the Euclidean and Manhattan metric, Chebychev is probably the one most used, so I'd be happy to contribute with a PR later.

Chebychev distance metric

use crate::float::kdtree::Axis;
use crate::traits::DistanceMetric;

pub struct Chebyshev {}

impl<A: Axis, const K: usize> DistanceMetric<A, K> for Chebyshev {
    #[inline]
    fn dist(a: &[A; K], b: &[A; K]) -> A {
        a.iter()
            .zip(b.iter())
            .map(|(&a_val, &b_val)| (a_val - b_val).abs())
            .fold(A::zero(), |acc, val| acc.max(val))
    }

    #[inline]
    fn dist1(a: A, b: A) -> A {
        (a - b).abs()
    }
}

https://github.com/cbueth/kiddo/blob/2d03b4fe304b990ec710802c66d79ba4729a24e7/src/float/distance.rs#L60-L75

Hypothesis

I have not looked too deep into the code, but my hypothesis is that the issue is that Kiddo's KdTree query logic uses the dist1 function for branch pruning, assuming that single-axis distance provides a lower bound for the full distance.

kiddo/src/traits.rs

Lines 97 to 109 in 2056051

pub trait DistanceMetric<A, const K: usize> {
/// returns the distance between two K-d points, as measured
/// by a particular distance metric
fn dist(a: &[A; K], b: &[A; K]) -> A;
/// returns the distance between two points along a single axis,
/// as measured by a particular distance metric.
///
/// (needs to be implemented as it is used by the NN query implementations
/// to extend the minimum acceptable distance for a node when recursing
/// back up the tree)
fn dist1(a: A, b: A) -> A;
}

This assumption would fail for Chebyshev because it's a max-based metric ($L_∞$ norm) where dist1 could be not a valid lower bound. The maximum difference could be on any axis.

Example

Trying to make this more graspable on an example: If point A = (0, 0) has query radius 1 and point B = (0, 5) where max(|0-0|, |0-5|) = 5 > 1 should be pruned. But dist1(A.x, B.x) = |0-0| = 0 < 1 keeps this branch incorrectly.

fn test_within_chebyshev_distance() {
    let mut kdtree: KdTree<f32, 2> = KdTree::new();
    let points = [
        ([0.0f32, 0.0f32], 0), // distance 0
        ([0.5f32, 0.5f32], 1), // Chebyshev: 0.5
        ([1.0f32, 0.0f32], 2), // Chebyshev: 1.0  <- would not be found but should be
        ([0.8f32, 0.9f32], 3), // Chebyshev: 0.9
    ];
    
    for (point, index) in points { kdtree.add(&point, index); }
    
    let query_point = [0.0f32, 0.0f32];
    let radius = 1.0;
    let results = kdtree.within::<Chebyshev>(&query_point, radius);
    
    // Point at [1.0, 0.0] (index 2) has Chebyshev distance exactly 1.0 but is NOT found
    let found_indices: Vec<u64> = results.iter().map(|r| r.item).collect();
    assert!(found_indices.contains(&2));  // FAILS - point should be included!
}

Expected: [0, 1, 2, 3] - Actual: [0, 1, 3]
(for long test implementation see https://github.com/cbueth/kiddo/blob/2d03b4fe304b990ec710802c66d79ba4729a24e7/src/float/distance.rs#L765)

Questions

So for knowing where to head to with this, hopefully bringing $L_∞$ support, I have a few questions:

  1. Is my hypothesis about dist1 being the root cause correct?
  2. For adding the Chebyshev DistanceMetric, which commit/tag/branch should I best work from before creating a PR?
  3. If more complicated to solve, do you have an idea where I need to extend the current functionality?
  4. Has there been any work done towards max-based metrics?

Thanks for any guidance!
Carlson

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions