
GH-560: Wire wgpu fallback into batch inference path #560

@noahgift

Description


Five-Whys

  1. Why doesn't batch mode use wgpu? → init_batch_model() in batch.rs only tries OwnedQuantizedModelCuda. When CUDA parity fails, it falls back to CPU.
  2. Why no wgpu in batch init? → The BatchModel struct only has gpu: Option<OwnedQuantizedModelCuda> and cpu: Option<OwnedQuantizedModel>. No wgpu variant.
  3. Why no wgpu variant? → Batch was designed before wgpu inference existed.
  4. Why does this matter? → 32B model in worker mode times out (316s per problem). Batch mode loads model once. Without wgpu batch, 32B eval is CPU-only (slow) or worker-mode (timeouts).
  5. Fix: Add a wgpu fallback to init_batch_model() — when CUDA parity fails, try wgpu before falling back to CPU.

Contract

gpu-multi-backend-parity-v1.yaml equation backend_priority:

select(backends) = first(b in [cuda, wgpu, cpu] where parity(b) >= 0.98)

This contract is currently violated in the batch path — batch inference skips wgpu entirely and goes straight from CUDA to CPU.

Acceptance Criteria

  • apr run model.apr --batch-jsonl prompts.jsonl --gpu uses wgpu when CUDA parity fails
  • Batch output shows "used_gpu": true with wgpu backend
  • 32B MBPP eval completes without timeouts
