Commit 406b40b

feat(gpu): auto-AFV cost grid on screenshot content (mask1x1 median > 95 + e>=7)
Wires conditional resurrection #1 from the V2 dropped-optimizations audit: `LossyEncoder::with_auto_evaluate_afv_on_screenshots(bool)` (new default `true`) auto-enables AFV0-3 cost-grid evaluation inside `prepare_strategy_search_plan_inner` when the per-block mask1x1 median exceeds `SCREENSHOT_MEDIAN_MASK_THRESHOLD` (95.0) AND `effort >= 7`. The dispatch reuses the per-block mask means already computed for the AQ field, so its only extra cost is a median over a small Vec. Explicit `with_evaluate_afv(true)` still always wins; explicit `with_auto_evaluate_afv_on_screenshots(false)` recovers strict pre-2026-05-17 behavior.

Photos: byte-identical. Every CLIC sample has a mask1x1 median in the 46-77 range, below the 95 threshold, so the gate never fires.

Screenshots: small bytes win on the subset where AFV picks survive the patches case-1 recompute. The 10-image gb82-sc sweep at d=1.0 changes total bytes by -0.091%; per-image winners are gmessages.png (-0.788%, 184 picks survive), graph.png (-0.403%, 13 picks), and gui.png (-0.116%, 9 picks). terminal.png and windows.png see GPU AFV picks (40 and 264 respectively), but the patches case-1 path in encoder.rs:2120 overwrites the strategy map via the CPU `compute_ac_strategy` (which does not evaluate AFV). Preserving GPU AFV picks across the recompute is follow-on work.

`corpus_regression` invariant preserved on photos (gate stays off, byte-identical) and on screenshots (they flow through `refine_and_encode_smart` → `SkippedStratSearchAsScreenshot`, which never calls `prepare_strategy_search_plan`).

Refs:
- ~/.claude/.../memory/vardct_gpu_dropped_optimizations_resurrection_2026-05-17.md
- ~/.claude/.../memory/dropped_optimizations_for_parity_2026-05-15.md item #1
- bench: benchmarks/auto_afv_screenshots_sweep_2026-05-17.{txt,meta}
- tests: 5 of 5 pass (`tests/afv_cost_grid_wiring.rs`)
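The gate described in the commit message can be sketched as follows. This is a minimal illustration, not the encoder's code: the `AfvSettings` struct, its field names, `median`, and `should_evaluate_afv` are hypothetical, and the choice of the upper median for even-length inputs is an assumption. Only the threshold name and value, the `effort >= 7` condition, and the precedence of the explicit flags come from the message.

```rust
/// Threshold name and value as stated in the commit message.
const SCREENSHOT_MEDIAN_MASK_THRESHOLD: f32 = 95.0;

/// Hypothetical settings holder; the field names are illustrative only.
struct AfvSettings {
    /// Mirrors an explicit `with_evaluate_afv(..)` call.
    evaluate_afv: bool,
    /// Mirrors `with_auto_evaluate_afv_on_screenshots(..)`.
    auto_on_screenshots: bool,
    effort: u8,
}

/// Median of the per-block mask means.
/// Assumption: for even lengths this returns the upper median.
fn median(values: &[f32]) -> Option<f32> {
    if values.is_empty() {
        return None;
    }
    let mut sorted = values.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    Some(sorted[sorted.len() / 2])
}

/// Decides whether the AFV cost grid should be evaluated.
fn should_evaluate_afv(settings: &AfvSettings, block_means: &[f32]) -> bool {
    // An explicit opt-in always wins.
    if settings.evaluate_afv {
        return true;
    }
    // An explicit opt-out or a low effort level keeps the old behavior.
    if !settings.auto_on_screenshots || settings.effort < 7 {
        return false;
    }
    // Otherwise the gate fires only on screenshot-like content.
    median(block_means).map_or(false, |m| m > SCREENSHOT_MEDIAN_MASK_THRESHOLD)
}

fn main() {
    let defaults = AfvSettings { evaluate_afv: false, auto_on_screenshots: true, effort: 7 };
    // Made-up block means resembling a screenshot and a photo.
    println!("screenshot-like: {}", should_evaluate_afv(&defaults, &[99.0, 100.0, 101.0]));
    println!("photo-like:      {}", should_evaluate_afv(&defaults, &[46.3, 60.0, 77.3]));
}
```

The early return for the explicit flag keeps the precedence rule visible: the automatic gate can only add AFV evaluation, never remove it.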
1 parent 99ad039 commit 406b40b

8 files changed

Lines changed: 737 additions & 14 deletions

CHANGELOG.md

Lines changed: 30 additions & 0 deletions
@@ -2,6 +2,36 @@

 ## [Unreleased]

+### Added (May 17, 2026)
+
+- **Auto-AFV-on-screenshots dispatch in the GPU strategy search**.
+`LossyEncoder` now exposes `with_auto_evaluate_afv_on_screenshots(bool)`
+(default `true`) that auto-enables AFV0-3 cost-grid evaluation inside
+`prepare_strategy_search_plan_inner` when the per-block `mask1x1`
+median exceeds `SCREENSHOT_MEDIAN_MASK_THRESHOLD` (95.0) AND
+`effort >= 7`. Same discriminator the `SkippedStratSearchAsScreenshot`
+path uses; reuses `aq_field_means` already produced for the AQ field
+so the dispatch is essentially free (median over a few-thousand-entry
+vector). Explicit `with_evaluate_afv(true)` still always wins.
+Photos are byte-identical (median < 95 on every CLIC sample tested,
+46-77 range; gate never fires). Screenshots see a small but real
+bytes win on the subset where AFV picks survive the patches case-1
+recompute: 10-image `gb82-sc` sweep at d=1.0 saves -0.091% bytes
+total; per-image winners are gmessages.png (-0.788%), graph.png
+(-0.403%), gui.png (-0.116%). On screenshots that trigger
+`find_and_build_patches`, the CPU `compute_ac_strategy` recompute on
+patches-subtracted XYB still overwrites GPU AFV picks (libjxl-parity
+contract); preserving GPU AFV picks across patches recompute is
+follow-on work. `corpus_regression` bitstream stays byte-identical on
+photo rows (no dispatch fires) and on screenshot rows (they flow
+through `refine_and_encode_smart` → `SkippedStratSearchAsScreenshot`
+which never calls `prepare_strategy_search_plan`). Bench at
+`benchmarks/auto_afv_screenshots_sweep_2026-05-17.{txt,meta}`. Tests:
+`tests/afv_cost_grid_wiring.rs` (`test_auto_afv_default_on_but_synthetic_does_not_fire`,
+`test_auto_afv_opt_out_disables_dispatch`). Reference: `dropped_optimizations_for_parity_2026-05-15.md`
+item #1 and `vardct_gpu_dropped_optimizations_resurrection_2026-05-17.md`
+top-3 conditional resurrection.
+
 ### Fixed (May 15, 2026)

 - **`decode_via_jxl_rs` was mislabeling sRGB-encoded f32 as linear** in

Cargo.lock

Lines changed: 1 addition & 1 deletion
Lines changed: 165 additions & 0 deletions
@@ -0,0 +1,165 @@
+warning: methods `capacity`, `len`, `is_empty`, `lookup_only`, and `unique_keys` are never used
+   --> /home/lilith/work/zen/jxl-encoder/jxl-encoder/src/modular/inline_dedup_table.rs:216:12
+    |
+186 | impl InlineDedupTable {
+    | --------------------- methods in this implementation
+...
+216 |     pub fn capacity(&self) -> usize {
+    |            ^^^^^^^^
+...
+222 |     pub fn len(&self) -> usize {
+    |            ^^^
+...
+228 |     pub fn is_empty(&self) -> bool {
+    |            ^^^^^^^^
+...
+356 |     pub fn lookup_only(&self, key: &[u8; KEY_BYTES]) -> Option<u32> {
+    |            ^^^^^^^^^^^
+...
+383 |     pub fn unique_keys(&self) -> &[[u8; KEY_BYTES]] {
+    |            ^^^^^^^^^^^
+    |
+    = note: `#[warn(dead_code)]` (part of `#[warn(unused)]`) on by default
+
+warning: fields `gather_dedup_phase3`, `parallel_max_depth`, `parallel_recursion_floor`, and `parallel_root_threshold` are never read
+   --> /home/lilith/work/zen/jxl-encoder/jxl-encoder/src/modular/tree_learn.rs:195:9
+    |
+135 | pub struct TreeLearningParams {
+    |            ------------------ fields in this struct
+...
+195 |     pub gather_dedup_phase3: bool,
+    |         ^^^^^^^^^^^^^^^^^^^
+...
+201 |     pub parallel_max_depth: u32,
+    |         ^^^^^^^^^^^^^^^^^^
+...
+205 |     pub parallel_recursion_floor: usize,
+    |         ^^^^^^^^^^^^^^^^^^^^^^^^
+...
+209 |     pub parallel_root_threshold: usize,
+    |         ^^^^^^^^^^^^^^^^^^^^^^^
+
+warning: function `gather_samples_strided_with_dedup` is never used
+   --> /home/lilith/work/zen/jxl-encoder/jxl-encoder/src/modular/tree_learn.rs:843:15
+    |
+843 | pub(crate) fn gather_samples_strided_with_dedup(
+    |               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+warning: enum `IwWeightKind` is never used
+  --> /home/lilith/work/zen/zensim/zensim/src/iw_pool.rs:39:10
+   |
+39 | pub enum IwWeightKind {
+   |          ^^^^^^^^^^^^
+   |
+   = note: `#[warn(dead_code)]` (part of `#[warn(unused)]`) on by default
+
+warning: struct `IwWeightConfig` is never constructed
+  --> /home/lilith/work/zen/zensim/zensim/src/iw_pool.rs:73:12
+   |
+73 | pub struct IwWeightConfig {
+   |            ^^^^^^^^^^^^^^
+
+warning: function `compute_iw_weights` is never used
+   --> /home/lilith/work/zen/zensim/zensim/src/iw_pool.rs:123:8
+    |
+123 | pub fn compute_iw_weights(
+    |        ^^^^^^^^^^^^^^^^^^
+
+warning: function `compute_local_variance` is never used
+   --> /home/lilith/work/zen/zensim/zensim/src/iw_pool.rs:194:4
+    |
+194 | fn compute_local_variance(
+    |    ^^^^^^^^^^^^^^^^^^^^^^
+
+warning: function `compute_directional_max_variance` is never used
+   --> /home/lilith/work/zen/zensim/zensim/src/iw_pool.rs:251:4
+    |
+251 | fn compute_directional_max_variance(
+    |    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+warning: function `local_variance_into` is never used
+   --> /home/lilith/work/zen/zensim/zensim/src/iw_pool.rs:321:4
+    |
+321 | fn local_variance_into(plane: &[f32], width: usize, height: usize, half: usize, out: &mut [f32]) {
+    |    ^^^^^^^^^^^^^^^^^^^
+
+warning: enum `GradNorm` is never used
+   --> /home/lilith/work/zen/zensim/zensim/src/iw_pool.rs:345:6
+    |
+345 | enum GradNorm {
+    |      ^^^^^^^^
+
+warning: function `compute_gradient` is never used
+   --> /home/lilith/work/zen/zensim/zensim/src/iw_pool.rs:350:4
+    |
+350 | fn compute_gradient(
+    |    ^^^^^^^^^^^^^^^^
+
+warning: struct `WeightedPool` is never constructed
+   --> /home/lilith/work/zen/zensim/zensim/src/iw_pool.rs:379:12
+    |
+379 | pub struct WeightedPool;
+    |            ^^^^^^^^^^^^
+
+warning: associated functions `mean`, `l2`, and `l4` are never used
+   --> /home/lilith/work/zen/zensim/zensim/src/iw_pool.rs:384:12
+    |
+381 | impl WeightedPool {
+    | ----------------- associated functions in this implementation
+...
+384 |     pub fn mean(values: &[f32], weights: &[f32]) -> f64 {
+    |            ^^^^
+...
+398 |     pub fn l2(values: &[f32], weights: &[f32]) -> f64 {
+    |            ^^
+...
+412 |     pub fn l4(values: &[f32], weights: &[f32]) -> f64 {
+    |            ^^
+
+warning: struct `IwSsimFeatures` is never constructed
+   --> /home/lilith/work/zen/zensim/zensim/src/iw_pool.rs:434:12
+    |
+434 | pub struct IwSsimFeatures {
+    |            ^^^^^^^^^^^^^^
+
+warning: associated items `FEATURES_PER_CALL`, `as_array`, and `pool_from_maps` are never used
+   --> /home/lilith/work/zen/zensim/zensim/src/iw_pool.rs:451:15
+    |
+449 | impl IwSsimFeatures {
+    | ------------------- associated items in this implementation
+450 |     /// Number of features per call - matches `FEATURES_PER_CHANNEL_*_MASKED` in `metric.rs`.
+451 |     pub const FEATURES_PER_CALL: usize = 6;
+    |               ^^^^^^^^^^^^^^^^^
+...
+454 |     pub fn as_array(&self) -> [f64; 6] {
+    |            ^^^^^^^^
+...
+474 |     pub fn pool_from_maps(
+    |            ^^^^^^^^^^^^^^
+
+warning: `jxl-encoder` (lib) generated 3 warnings
+warning: `zensim` (lib) generated 12 warnings
+    Finished `release` profile [optimized + debuginfo] target(s) in 0.14s
+     Running `target/release/examples/auto_afv_bytes_ab --distance 1.0`
+[auto_afv_bytes_ab] distance=1 images=6 (auto-AFV OFF baseline vs ON)
+image               MP  off_bytes  on_bytes  Δ_bytes     Δ_pct  afv_o  afv_n    ms_on
+[auto_afv] mask1x1 block-mean median=100.013, threshold=95.000, effort=7, fired=true
+[auto_afv] mask1x1 block-mean median=100.013, threshold=95.000, effort=7, fired=true
+terminal.png      1.75      63175     63175       +0   +0.000%      0     40    306.0
+[auto_afv] mask1x1 block-mean median=100.013, threshold=95.000, effort=7, fired=true
+[auto_afv] mask1x1 block-mean median=100.013, threshold=95.000, effort=7, fired=true
+imac_g3.png       5.62     299455    299455       +0   +0.000%      0      0   1318.2
+[auto_afv] mask1x1 block-mean median=48.822, threshold=95.000, effort=7, fired=false
+[auto_afv] mask1x1 block-mean median=48.822, threshold=95.000, effort=7, fired=false
+windows95.png     0.31      51672     51672       +0   +0.000%      0      0     67.1
+[auto_afv] mask1x1 block-mean median=46.294, threshold=95.000, effort=7, fired=false
+[auto_afv] mask1x1 block-mean median=46.294, threshold=95.000, effort=7, fired=false
+02809272b4ca9b08af45771501b741296187c7e26907efb44abbbfcb6cd804f7.png  1.05     297492    297492       +0   +0.000%      0      0    398.4
+[auto_afv] mask1x1 block-mean median=59.988, threshold=95.000, effort=7, fired=false
+[auto_afv] mask1x1 block-mean median=59.988, threshold=95.000, effort=7, fired=false
+07b9f93f170a0381836bdf301280a5b80b2c4be6e66f793a3c335dc200fb4e5b.png  1.05     193716    193716       +0   +0.000%      0      0    148.6
+[auto_afv] mask1x1 block-mean median=77.282, threshold=95.000, effort=7, fired=false
+[auto_afv] mask1x1 block-mean median=77.282, threshold=95.000, effort=7, fired=false
+22ea12c903e41583b7c469cb86040157.png  1.05      98304     98304       +0   +0.000%      0      0    147.0
+
+TOTAL (n=6 images, 10.82 MP): 1003814 → 1003814 bytes (+0 bytes, +0.000%)
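To sanity-check a run like the one above, the `[auto_afv]` log lines can be parsed and compared against the documented rule (`median > threshold AND effort >= 7`). The sketch below assumes the log format exactly as printed in this file; `GateLog`, `parse_gate_log`, and `gate_consistent` are hypothetical helper names, not part of the encoder or the benchmark driver.

```rust
/// Fields of one `[auto_afv]` log line, in the format printed in the run above.
#[derive(Debug)]
struct GateLog {
    median: f32,
    threshold: f32,
    effort: u32,
    fired: bool,
}

/// Parses a line such as
/// `[auto_afv] mask1x1 block-mean median=100.013, threshold=95.000, effort=7, fired=true`.
/// Returns `None` if the prefix or any of the four fields is missing or malformed.
fn parse_gate_log(line: &str) -> Option<GateLog> {
    let rest = line.trim().strip_prefix("[auto_afv]")?;
    let mut median: Option<f32> = None;
    let mut threshold: Option<f32> = None;
    let mut effort: Option<u32> = None;
    let mut fired: Option<bool> = None;
    for part in rest.split(',') {
        let (key, value) = part.trim().rsplit_once('=')?;
        // The key is the last space-separated word before '='.
        match key.rsplit(' ').next()? {
            "median" => median = value.parse().ok(),
            "threshold" => threshold = value.parse().ok(),
            "effort" => effort = value.parse().ok(),
            "fired" => fired = value.parse().ok(),
            _ => {}
        }
    }
    Some(GateLog {
        median: median?,
        threshold: threshold?,
        effort: effort?,
        fired: fired?,
    })
}

/// True when the logged `fired` flag matches the documented gate rule.
fn gate_consistent(log: &GateLog) -> bool {
    log.fired == (log.median > log.threshold && log.effort >= 7)
}

fn main() {
    // Two lines copied from the run output above.
    let lines = [
        "[auto_afv] mask1x1 block-mean median=100.013, threshold=95.000, effort=7, fired=true",
        "[auto_afv] mask1x1 block-mean median=48.822, threshold=95.000, effort=7, fired=false",
    ];
    for line in lines {
        match parse_gate_log(line) {
            Some(log) => println!("{log:?} consistent={}", gate_consistent(&log)),
            None => println!("unparsed: {line}"),
        }
    }
}
```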
Lines changed: 63 additions & 0 deletions
@@ -0,0 +1,63 @@
+Auto-AFV-on-screenshots dispatch: bytes A/B sweep
+====================================================
+
+Date: 2026-05-17
+Commit (will be set at commit time)
+Host: water-cooled AMD Ryzen 9 7950X + RTX 5070 (CUDA backend)
+Workspace: ~/work/zen/jxl-encoder-gpu--afv-cost-grid (jj on main)
+
+Sweep
+-----
+Corpus: codec-corpus/gb82-sc (10 PNG screenshots)
+Distances: 1.0, 2.0, 3.0
+Encoder entry: GpuEncoder::encode_lossy_to_bitstream_via_precomputed
+Effort: 7 (default)
+Knob under test: LossyEncoder::with_auto_evaluate_afv_on_screenshots(bool)
+  - PATH A (off): explicit override `with_auto_evaluate_afv_on_screenshots(false)` to
+                  reproduce pre-2026-05-17 default-off behavior
+  - PATH B (on):  new default; auto-AFV gates on `mask1x1 median > 95 AND effort >= 7`
+
+Driver: cargo run --release -p jxl-encoder-gpu --features 'cuda encoder' \
+          --example auto_afv_bytes_ab -- --distance <D> [--image PATH ...]
+
+Headline
+--------
+  d=1.0 (10 screenshots, 30.65 MP total): -1343 bytes / -0.091%
+  d=2.0 (10 screenshots, 30.65 MP total):  -465 bytes / -0.041%
+  d=3.0 (10 screenshots, 30.65 MP total):    +0 bytes / +0.000%
+
+Per-image winners at d=1.0:
+  gmessages.png  -1190 bytes (-0.788%)  184 AFV picks survive case-1 path
+  graph.png       -108 bytes (-0.403%)   13 AFV picks survive
+  gui.png          -45 bytes (-0.116%)    9 AFV picks survive
+
+Per-image byte-identical (AFV picks were wiped by patches case-1 recompute):
+  terminal.png    40 picks → 0 bytes change
+  windows.png    264 picks → 0 bytes change
+  windows95.png  median 48.8 → gate correctly skips
+
+Photos (default candidate set, 3 CLIC images at d=1.0): all byte-identical.
+  median per image: 46.3, 60.0, 77.3 (all < 95 threshold → gate stays off)
+
+Interaction with patches case-1 path
+-------------------------------------
+On screenshots that trigger `find_and_build_patches`, the CPU patches-subtracted
+recompute (`compute_ac_strategy` at encoder.rs:2120) overwrites our GPU AFV picks
+with a CPU-only DCT8-dominant assignment. This is the libjxl-parity contract and
+applies to terminal.png + windows.png in our sweep. The auto-AFV win only realizes
+on screenshots where patches DON'T fire OR where the CPU recompute still picks AFV.
+
+A future chunk could merge the GPU's AFV picks into the patches-subtracted CPU
+assignment (or port the AFV cost-grid into the CPU `compute_ac_strategy`); that's
+out of scope for this 1-2 day chunk.
+
+Photo verification (corpus_regression analog)
+----------------------------------------------
+3 CLIC photos at d=1.0 with new default (`auto_evaluate_afv_on_screenshots = true`)
+produce byte-identical bitstream to pre-2026-05-17 (off-AFV) baseline. Auto dispatch
+threshold (median > 95) is well above all measured photo medians (46-77), so the
+gate never fires on photo content. `corpus_regression` test (gated `corpus` feature)
+should continue to pass byte-for-byte on its photo rows. Screenshot rows in
+`corpus_regression` flow through `refine_and_encode_smart` →
+`SkippedStratSearchAsScreenshot` (no strat-search), so they are also unaffected
+by this dispatch.
Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+=== distance=1.0 ===
+[auto_afv_bytes_ab] distance=1 images=10 (auto-AFV OFF baseline vs ON)
+image               MP  off_bytes  on_bytes  Δ_bytes     Δ_pct  afv_o  afv_n    ms_on
+codec_wiki.png    4.26     116673    116673       +0   +0.000%      0      0    782.9
+gmessages.png     4.45     151002    149812    -1190   -0.788%      0    184    746.9
+graph.png         0.38      26832     26724     -108   -0.403%      0     13     62.5
+gui.png           1.53      38663     38618      -45   -0.116%      0      9    228.9
+imac_dark.png     5.62     305481    305481       +0   +0.000%      0      0   1150.0
+imac_g3.png       5.62     299455    299455       +0   +0.000%      0      0   1152.5
+imessage.png      3.16     149771    149771       +0   +0.000%      0      0    542.2
+terminal.png      1.75      63175     63175       +0   +0.000%      0     40    249.3
+windows95.png     0.31      51672     51672       +0   +0.000%      0      0     61.1
+windows.png       3.56     269079    269079       +0   +0.000%      0    264    619.1
+
+TOTAL (n=10 images, 30.65 MP): 1471803 → 1470460 bytes (-1343 bytes, -0.091%)
+=== distance=2.0 ===
+[auto_afv_bytes_ab] distance=2 images=10 (auto-AFV OFF baseline vs ON)
+image               MP  off_bytes  on_bytes  Δ_bytes     Δ_pct  afv_o  afv_n    ms_on
+codec_wiki.png    4.26      93748     93748       +0   +0.000%      0      0    877.3
+gmessages.png     4.45     113909    113450     -459   -0.403%      0     46    755.1
+graph.png         0.38      20640     20640       +0   +0.000%      0      0     67.9
+gui.png           1.53      28739     28733       -6   -0.021%      0      1    231.0
+imac_dark.png     5.62     235459    235459       +0   +0.000%      0      0   1262.3
+imac_g3.png       5.62     249040    249040       +0   +0.000%      0      0   1154.2
+imessage.png      3.16     114258    114258       +0   +0.000%      0      0    560.5
+terminal.png      1.75      51197     51197       +0   +0.000%      0      0    281.2
+windows95.png     0.31      39650     39650       +0   +0.000%      0      0     62.7
+windows.png       3.56     193757    193757       +0   +0.000%      0     53    674.5
+
+TOTAL (n=10 images, 30.65 MP): 1140397 → 1139932 bytes (-465 bytes, -0.041%)
+=== distance=3.0 ===
+[auto_afv_bytes_ab] distance=3 images=10 (auto-AFV OFF baseline vs ON)
+image               MP  off_bytes  on_bytes  Δ_bytes     Δ_pct  afv_o  afv_n    ms_on
+codec_wiki.png    4.26      81876     81876       +0   +0.000%      0      0    837.2
+gmessages.png     4.45      92662     92662       +0   +0.000%      0      0    773.4
+graph.png         0.38      16911     16911       +0   +0.000%      0      0     64.5
+gui.png           1.53      23922     23922       +0   +0.000%      0      0    228.2
+imac_dark.png     5.62     201030    201030       +0   +0.000%      0      0   1203.5
+imac_g3.png       5.62     223092    223092       +0   +0.000%      0      0   1196.1
+imessage.png      3.16      98410     98410       +0   +0.000%      0      0    530.9
+terminal.png      1.75      43012     43012       +0   +0.000%      0      0    249.9
+windows95.png     0.31      33827     33827       +0   +0.000%      0      0     60.6
+windows.png       3.56     159138    159138       +0   +0.000%      0      0    632.6
+
+TOTAL (n=10 images, 30.65 MP): 973880 → 973880 bytes (+0 bytes, +0.000%)
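The `Δ_bytes`, `Δ_pct`, and `TOTAL` figures follow from the `off_bytes` and `on_bytes` columns. A minimal sketch of that arithmetic, using the distance=1.0 rows copied from the table above, is shown here; the `delta` helper and the `ROWS_D1` constant are illustrative names, and the assumption is that percentages are taken relative to the OFF baseline.

```rust
/// (image, off_bytes, on_bytes) rows copied from the distance=1.0 table.
const ROWS_D1: [(&str, u64, u64); 10] = [
    ("codec_wiki.png", 116673, 116673),
    ("gmessages.png", 151002, 149812),
    ("graph.png", 26832, 26724),
    ("gui.png", 38663, 38618),
    ("imac_dark.png", 305481, 305481),
    ("imac_g3.png", 299455, 299455),
    ("imessage.png", 149771, 149771),
    ("terminal.png", 63175, 63175),
    ("windows95.png", 51672, 51672),
    ("windows.png", 269079, 269079),
];

/// Byte delta and percentage change of `on_bytes` relative to `off_bytes`.
/// Assumption: the percentage uses the OFF baseline as the denominator.
fn delta(off_bytes: u64, on_bytes: u64) -> (i64, f64) {
    let bytes = on_bytes as i64 - off_bytes as i64;
    let pct = 100.0 * bytes as f64 / off_bytes as f64;
    (bytes, pct)
}

fn main() {
    // Per-image deltas, printed only for images whose size changed.
    for (name, off, on) in ROWS_D1 {
        let (bytes, pct) = delta(off, on);
        if bytes != 0 {
            println!("{name}: {bytes:+} bytes ({pct:+.3}%)");
        }
    }
    // Corpus totals across all ten rows.
    let off: u64 = ROWS_D1.iter().map(|r| r.1).sum();
    let on: u64 = ROWS_D1.iter().map(|r| r.2).sum();
    let (bytes, pct) = delta(off, on);
    println!("TOTAL: {off} -> {on} bytes ({bytes:+} bytes, {pct:+.3}%)");
}
```

Because seven of the ten images are unchanged, the corpus-level percentage is much smaller than the per-image gains on the three winners.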
