Summary
When comparing images from two backends that should produce matching output, a score of 0.0 gives no useful signal — you can't tell if a feature is missing, mispositioned, or just rendered with a different filter. These enhancements would give actionable diagnostics.
Requested features
1. Block hash search for identical regions
Hash NxN blocks (e.g., 16x16) in both images. Report which regions are identical, which are shifted, and which are completely different. This works even when dimensions differ and quickly answers "is this content present but in the wrong position?"
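A minimal sketch of the block search, assuming grayscale images stored as lists of rows of integers in 0..255. The names `classify_blocks` and `_block_hash` are hypothetical and not part of zensim. It hashes every N x N window of the second image so that grid-aligned blocks from the first image can be found at any offset. Exact hashing only finds bit-identical blocks, so content that was resampled with a different filter will still show up as "different".

```python
import hashlib


def _block_hash(img, x, y, n):
    """Hash the n x n block whose top-left corner is (x, y)."""
    data = bytes(v for row in img[y:y + n] for v in row[x:x + n])
    return hashlib.blake2b(data, digest_size=8).digest()


def classify_blocks(a, b, n=16):
    """Classify each grid-aligned n x n block of `a` against image `b`.

    Returns {(x, y): (kind, offset)} where kind is "identical",
    "shifted" (offset is the displacement into `b`), or "different".
    """
    ha, wa = len(a), len(a[0])
    hb, wb = len(b), len(b[0])

    # Index every window position in b; keep the first position per hash.
    index = {}
    for y in range(hb - n + 1):
        for x in range(wb - n + 1):
            index.setdefault(_block_hash(b, x, y, n), (x, y))

    report = {}
    for y in range(0, ha - n + 1, n):
        for x in range(0, wa - n + 1, n):
            h = _block_hash(a, x, y, n)
            in_bounds = x + n <= wb and y + n <= hb
            # Check the same position first so flat regions that match in
            # many places are not misreported as shifted.
            if in_bounds and _block_hash(b, x, y, n) == h:
                report[(x, y)] = ("identical", (0, 0))
            elif h in index:
                bx, by = index[h]
                report[(x, y)] = ("shifted", (bx - x, by - y))
            else:
                report[(x, y)] = ("different", None)
    return report
```

Indexing every window costs roughly width x height hashes for the second image. A rolling hash or a coarser search stride would cut that cost on large images.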
2. Dimension-mismatch comparison
When images have different dimensions, resample the smaller one to the larger one's dimensions with a Mitchell filter, then run zensim anyway. Report the score with a note that dimensions differed. Currently a dimension mismatch is an error; it should produce a degraded comparison instead of no comparison.
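The degraded comparison could look roughly like the sketch below, assuming single-channel images as lists of rows of floats. The kernel is the standard Mitchell-Netravali cubic with B = C = 1/3. `compare_any_size` and the `score_fn` callback are hypothetical stand-ins for whatever zensim actually exposes, and "smaller" is taken to mean fewer total pixels, which is one possible reading.

```python
import math


def mitchell(x, b=1 / 3, c=1 / 3):
    """Mitchell-Netravali cubic kernel; support is |x| < 2."""
    x = abs(x)
    if x < 1:
        return ((12 - 9 * b - 6 * c) * x**3
                + (-18 + 12 * b + 6 * c) * x**2
                + (6 - 2 * b)) / 6
    if x < 2:
        return ((-b - 6 * c) * x**3 + (6 * b + 30 * c) * x**2
                + (-12 * b - 48 * c) * x + (8 * b + 24 * c)) / 6
    return 0.0


def resize_row(row, new_len):
    """Resample one row of samples to new_len samples."""
    old_len = len(row)
    scale = old_len / new_len
    stretch = max(scale, 1.0)  # widen the kernel when shrinking
    out = []
    for i in range(new_len):
        center = (i + 0.5) * scale - 0.5
        lo = math.floor(center - 2 * stretch)
        hi = math.ceil(center + 2 * stretch)
        total = weight_sum = 0.0
        for j in range(lo, hi + 1):
            w = mitchell((j - center) / stretch)
            total += w * row[min(max(j, 0), old_len - 1)]  # clamp at edges
            weight_sum += w
        out.append(total / weight_sum)
    return out


def resize(img, new_w, new_h):
    """Separable resize: rows first, then columns."""
    rows = [resize_row(r, new_w) for r in img]
    cols = [resize_row(list(col), new_h) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def compare_any_size(a, b, score_fn):
    """Return (score, note); note is None when dimensions already match."""
    size_a = (len(a[0]), len(a))
    size_b = (len(b[0]), len(b))
    if size_a == size_b:
        return score_fn(a, b), None
    if size_a[0] * size_a[1] < size_b[0] * size_b[1]:
        a = resize(a, *size_b)
    else:
        b = resize(b, *size_a)
    note = (f"dimensions differed: {size_a[0]}x{size_a[1]} vs "
            f"{size_b[0]}x{size_b[1]}; smaller image resampled")
    return score_fn(a, b), note
```

Returning the note alongside the score keeps the degraded case visible to callers instead of silently mixing it with exact-size comparisons.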
3. Report sub-zero scores
Don't clamp to 0.0. If the metric produces negative scores, report them — the magnitude tells you how different the images are, which is useful for distinguishing "slightly wrong" from "completely wrong output."
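One way to report this without breaking callers that expect a non-negative value is to carry the raw value and derive the clamped one from it. The sketch below is illustrative only: `ScoreReport` is a hypothetical type, and it assumes nothing about how zensim computes its score beyond the possibility that the unclamped value can go below zero.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoreReport:
    raw: float  # unclamped metric value; may be negative

    @property
    def clamped(self) -> float:
        """The value reported today, with negatives floored at 0.0."""
        return max(self.raw, 0.0)

    def describe(self) -> str:
        if self.raw >= 0:
            return f"score {self.raw:.1f}"
        return f"score {self.clamped:.1f} (raw {self.raw:.1f}, below zero)"
```

With this shape, two results that both clamp to 0.0 (say, raw values of -5 and -250) remain distinguishable in the report.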
4. Edge-aware scoring (Sobel/Scharr)
Run Sobel or Scharr on both images, then compute zensim on the edge maps. This answers "do the images have the same structural content?" independent of color/brightness differences. Could be triggered automatically when the standard score is below a threshold (e.g., < 50).
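A rough sketch of the edge-map fallback, assuming grayscale images as lists of rows of numbers. It uses the standard 3x3 Sobel kernels with clamped borders. `score_with_edge_fallback` and the `score_fn` callback are hypothetical stand-ins for the zensim call, and the threshold of 50 is the example value from the text above.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(img):
    """Return the Sobel gradient magnitude of a grayscale image."""
    h, w = len(img), len(img[0])

    def px(x, y):
        # Clamp coordinates so border pixels reuse their nearest neighbor.
        return img[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            gx = gy = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    v = px(x + dx, y + dy)
                    gx += SOBEL_X[dy + 1][dx + 1] * v
                    gy += SOBEL_Y[dy + 1][dx + 1] * v
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out


def score_with_edge_fallback(a, b, score_fn, threshold=50.0):
    """Score normally; add an edge-map score when the result is low."""
    standard = score_fn(a, b)
    result = {"standard": standard, "edges": None}
    if standard < threshold:
        result["edges"] = score_fn(sobel_magnitude(a), sobel_magnitude(b))
    return result
```

Because the Sobel weights sum to zero, a uniform brightness offset cancels out of the edge maps, so a low standard score paired with a high edge score suggests the structure is present but the tone or filtering differs.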
Motivation
The use case is comparing the v2 and zen rendering backends. A watermark composited with bilinear resize vs Lanczos scores 0.0, and we need to know whether the watermark is there-but-blurry or completely missing. Block hashing and edge-aware scoring would answer that directly.