## Problem
Currently, when a Model or InferenceService is deleted and recreated, the model file is re-downloaded from the source URL. For large models (13B-70B), this means:
- 26-40GB+ downloads each time
- 10-30+ minutes waiting for downloads
- Wasted bandwidth and potential rate limiting
- Poor benchmarking experience - can't iterate quickly
## Proposed Solution
Implement persistent model storage using Kubernetes PersistentVolumeClaims (PVCs):
### Option 1: Shared Model Cache PVC
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llmkube-model-cache
  namespace: llmkube-system
spec:
  accessModes:
    - ReadWriteMany # NFS or similar for shared access
  resources:
    requests:
      storage: 100Gi
```
Models are downloaded once to the cache, and pods mount it read-only.
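As a sketch of the read-only mount, an inference pod spec could reference the cache PVC like this (the `model-cache` volume name and `/cache/models` mount path are illustrative, not part of the proposal):

```yaml
# Pod spec fragment (illustrative): mount the shared cache read-only
volumes:
  - name: model-cache
    persistentVolumeClaim:
      claimName: llmkube-model-cache
      readOnly: true
containers:
  - name: inference
    volumeMounts:
      - name: model-cache
        mountPath: /cache/models
        readOnly: true
```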
### Option 2: Per-Model PVCs
Each Model resource gets its own PVC that persists across InferenceService deletions.
### Option 3: Node-local Cache

Use `hostPath` or `local` PVs to cache models on GPU nodes (faster, but node-specific).
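For the node-local variant, a pod could declare the cache as a `hostPath` volume directly (the path below is illustrative); unlike Options 1 and 2, this ties cached models to whichever GPU node the pod is scheduled on:

```yaml
# Pod spec fragment (illustrative): node-local hostPath cache
volumes:
  - name: model-cache
    hostPath:
      path: /var/lib/llmkube/models
      type: DirectoryOrCreate
```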
## Implementation Details
1. **Model Controller Changes:**
   - Check if the model exists in the cache before downloading
   - Store models in `<cache-pvc>/models/<model-hash>/model.gguf`
   - Use the SHA256 of the source URL as the cache key
2. **InferenceService Controller Changes:**
   - Mount the model cache as a read-only volume
   - Reference the cached model path instead of downloading
3. **CLI Changes:**
   - `llmkube cache list` - Show cached models
   - `llmkube cache clear` - Clear the model cache
   - `llmkube cache preload <model-id>` - Pre-download a model to the cache
4. **Helm Chart Changes:**
   - Add a PVC template for the model cache
   - Configurable storage class and size
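The cache-lookup step in the Model controller (steps 1 above) can be sketched as follows. The `<cache-pvc>/models/<model-hash>/model.gguf` layout and the SHA256-of-URL cache key come from this proposal; the function names and the `/cache/models` mount point are illustrative assumptions, not the actual controller code (which would be Go, not Python).

```python
import hashlib
import os

CACHE_ROOT = "/cache/models"  # assumed mount point of the cache PVC


def cache_key(source_url: str) -> str:
    """Derive the cache key as the SHA256 hex digest of the source URL."""
    return hashlib.sha256(source_url.encode("utf-8")).hexdigest()


def cached_model_path(source_url: str) -> str:
    """Path where a cached model would live: <root>/<model-hash>/model.gguf."""
    return os.path.join(CACHE_ROOT, cache_key(source_url), "model.gguf")


def needs_download(source_url: str) -> bool:
    """The controller skips the download when the file is already cached."""
    return not os.path.exists(cached_model_path(source_url))
```

Keying on the URL hash rather than the model name means two Models pointing at the same source file share one cache entry, while any change to the URL naturally produces a fresh entry.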
## Benefits
- **Faster iteration** - Deploy/delete/redeploy in seconds
- **Bandwidth savings** - Download once, use many times
- **Better benchmarking** - Quick model switching
- **Cost reduction** - Less egress from HuggingFace
## Related
- Roadmap Q1 2026: "Persistent model storage (stop re-downloading!)"
- Supports air-gapped deployments (pre-populate cache)
- Enables `llmkube benchmark --catalog` to run efficiently
## Success Criteria

- Deleting and recreating a Model or InferenceService reuses the cached file instead of re-downloading it
- Redeploying a cached large model takes seconds rather than 10-30+ minutes
- `llmkube cache list`, `clear`, and `preload` behave as described above