HDMI passthrough with real-time ML-based ad detection and blocking using dual NPUs:
- PaddleOCR on RK3588 NPU (~400ms per frame, 1.0s timeout)
- FastVLM-1.5B on Axera LLM 8850 NPU (~0.9s per frame, 1.5s soft timeout / 5s hard timeout)
- Spanish vocabulary practice during ad blocks!
| Document | Description |
|---|---|
| docs/FEATURES.md | Complete feature list and capabilities |
| docs/ARCHITECTURE.md | System architecture and data flow |
| docs/AESTHETICS.md | Visual design guide for UI/overlays |
| MARISOL.md | AI agent context guide |
| docs/DEBUG_GLITCHES.md | Video glitch debugging notes |
| docs/FPS_DEBUGGING.md | FPS tracking and optimization |
| docs/AUDIO.md | Audio passthrough documentation |
| docs/VLM_NPU_DEGRADATION.md | Investigation of "NPU degradation" — root cause is per-image output-length variance; fix is max_new_tokens cap |
| docs/IR_TRANSMITTER.md | IR transmitter for the REI 8K HDMI switch (PWM3 on pin 38) — wiring, NEC codes, API, troubleshooting |
| docs/IR_RECEIVER.md | IR receiver eval on pin 3 (gpiochip4 11) — bench-tested decode of NEC remotes, gotchas, sketch for a future IRReceiver module |
| docs/STATUS_LEDS.md | WS2812B status strip on SPI0 MOSI (pin 19) — wiring, state catalogue, API, encoding rationale |
See docs/AESTHETICS.md for the complete visual design guide including:
- Color palette (black background, matrix green, danger red, purple accents)
- Typography (VT323 for display, IBM Plex Mono for body, DejaVu for TV overlays)
- Component styling and animations
- TV overlay layout specifications
┌──────────────┐ ┌────────────────────┐ ┌─────────────────────┐
│ HDMI-RX │────▶│ ustreamer │────▶│ GStreamer Pipeline │
│ /dev/video0 │ │ (MJPEG encoding) │ │ (queue + kmssink) │
│ 4K@30fps │ │ │ │ │
│ │ │ :9090/stream │ │ │
│ │ │ :9090/snapshot │ │ │
└──────────────┘ └────────┬───────────┘ └─────────────────────┘
│
▼ HTTP snapshot (~150ms, non-blocking)
┌───────────────┴───────────────┐
│ │
┌────────┴────────┐ ┌──────────┴──────────┐
│ OCR Worker │ │ VLM Worker │
│ ┌───────────┐ │ │ ┌───────────────┐ │
│ │ PaddleOCR │ │ │ │ FastVLM-1.5B │ │
│ │ RK3588 NPU│ │ │ │ Axera LLM 8850│ │
│ │ ~400ms │ │ │ │ ~0.9s │ │
│ └───────────┘ │ │ └───────────────┘ │
└────────┬────────┘ └──────────┬──────────┘
│ │
└───────────────┬───────────────┘
│
┌────────┴────────┐
│ Blocking Mode │
│ (ustreamer API) │
└─────────────────┘
Key Architecture Points:
- Simple GStreamer pipeline with `queue max-size-buffers=3 leaky=downstream`
- All blocking overlay rendering done in ustreamer's MPP encoder at 60fps
- No X11 required - uses DRM/KMS directly via kmssink
- Auto-detects HDMI output, resolution, and DRM plane at startup
- Works with both 4K and 1080p displays (uses display's preferred resolution)
- Both ML workers run concurrently on separate NPUs
- Display runs independently at 30fps without any stutter
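As a minimal sketch of the concurrent-worker point above (illustrative only; the real dispatch lives in minus.py and uses process-based workers, not bare threads), one snapshot fans out to both detectors so wall time is the max of the two, not the sum:

```python
import threading

def run_detectors(snapshot, ocr_fn, vlm_fn):
    """Fan one snapshot out to both detectors concurrently.

    ocr_fn/vlm_fn stand in for the real OCR/VLM worker calls;
    each runs on its own NPU, so they don't contend for hardware.
    """
    results = {}

    def call(name, fn):
        results[name] = fn(snapshot)

    threads = [
        threading.Thread(target=call, args=("ocr", ocr_fn)),
        threading.Thread(target=call, args=("vlm", vlm_fn)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```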
| File | Purpose |
|---|---|
| `minus.py` | Main entry point - orchestrates everything |
| `minus.spec` | PyInstaller spec for building executable |
| `src/ad_blocker.py` | GStreamer video pipeline, blocking API client |
| `src/audio.py` | GStreamer audio passthrough with mute control |
| `src/ocr.py` | PaddleOCR on RKNN NPU, keyword detection |
| `src/ocr_worker.py` | Process-based OCR with hard timeout, warmup, and keepalive |
| `src/vlm.py` | FastVLM-1.5B on Axera NPU (ad detection + custom queries) |
| `src/vlm_worker.py` | Process-based VLM with hard timeout, warmup, and keepalive |
| `src/autonomous_mode.py` | Autonomous mode - VLM-guided YouTube playback |
| `src/health.py` | Unified health monitor for all subsystems |
| `src/webui.py` | Flask web UI for remote monitoring/control |
| `src/fire_tv.py` | Fire TV ADB remote control for ad skipping |
| `src/roku.py` | Roku ECP remote control |
| `src/ir_transmitter.py` | NEC IR transmitter over PWM3 (REI 8K HDMI switch). Thread-safe, 1.5 s cooldown |
| `src/status_leds.py` | Raw WS2812B SPI driver. 8 LEDs, 10% brightness cap, Adafruit-canonical 8-bit-per-WS-bit encoding at 6.4 MHz |
| `src/status_led_controller.py` | State machine + animation thread on top of status_leds.py. States: off/initializing/idle/blocking/no_signal/autonomous/error |
| `src/device_config.py` | Streaming device type configuration and persistence |
| `src/fire_tv_setup.py` | Fire TV auto-setup flow with overlay notifications |
| `src/wifi_manager.py` | WiFi captive portal and AP mode management |
| `src/overlay.py` | Notification overlay via ustreamer API |
| `src/vocabulary.py` | Spanish vocabulary — original `SPANISH_VOCABULARY` (~550 entries, 4-tuples) plus `SPANISH_VOCABULARY_EXTENDED` (~200 entries, 5-tuples with two example sentences). `VOCABULARY_COMBINED` is the unified list the ad overlay iterates. |
| `src/console.py` | Console blanking/restore functions |
| `src/drm.py` | DRM output probing, adaptive bandwidth fallback |
| `src/v4l2.py` | V4L2 device probing (format, resolution) |
| `src/config.py` | MinusConfig dataclass |
| `src/capture.py` | UstreamerCapture class for snapshot capture |
| `src/screenshots.py` | ScreenshotManager with dHash dedup + blank rejection |
| `src/skip_detection.py` | Skip button detection (regex patterns) |
| `test_fire_tv.py` | Fire TV controller test and interactive remote |
| `ir_transmit.py` | Standalone CLI for the IR transmitter (`sudo python3 ir_transmit.py <button>`) |
| `tests/test_modules.py` | Comprehensive test suite (300+ tests) |
| `tests/test_autonomous_mode.py` | Autonomous mode unit tests |
| `tests/test_review_ui.py` | Playwright UI tests for screenshot review |
| `tests/test_ir_transmitter.py` | Unit tests for IR transmitter (mocked sysfs, 20 tests) |
| `tests/test_ir_ui.py` | Playwright UI tests for IR remote panel |
| `tests/test_status_led_controller.py` | Unit tests for status-LED state machine (mocked hardware, 31 tests) |
| `tests/test_status_leds_ui.py` | Playwright UI tests for status-LED toggle + state palette |
| `tests/test_status_led_states.py` | Hardware walk: every controller state across all 8 LEDs, 5 s each |
| `test_status_leds.py` | Hardware walk/flash test for the WS2812B strip (R/G/B/W) |
| `tests/test_ocr_ad_detection.py` | OCR ad pattern detection tests (143+ cases) |
| `src/templates/index.html` | Web UI single-page app |
| `src/static/style.css` | Web UI dark theme styles |
| `install.sh` | Install as systemd service |
| `uninstall.sh` | Remove systemd service |
| `stop.sh` | Graceful shutdown script |
| `minus.service` | systemd service file |
| `screenshots/ads/` | OCR-detected ads (for training) |
| `screenshots/non_ads/` | User paused = false positives (for training) |
| `screenshots/vlm_spastic/` | VLM uncertainty cases (for analysis) |
| `screenshots/static/` | Static screen suppression (still frames) |
python3 minus.py

Command-line options:
--device /dev/video1 # Custom capture device
--ocr-timeout 1.5 # OCR timeout in seconds (default: 1.5)
--max-screenshots 100 # Keep N recent screenshots (default: 50, 0=unlimited)
--check-signal # Just check HDMI signal and exit
--connector-id 231 # DRM connector ID (auto-detected if not specified)
--plane-id 192 # DRM plane ID (auto-detected if not specified)
--webui-port 80        # Web UI port (default: 80)

Auto-detection at startup:
- Connected HDMI output - Works with either HDMI-A-1 (connector 215) or HDMI-A-2 (connector 231)
- Preferred resolution - Reads EDID to get the display's preferred mode (e.g., 4K@60Hz or 1080p@60Hz)
- NV12-capable overlay plane - Finds a suitable DRM plane that supports NV12 format for video output
- Audio output device - Matches ALSA device to the connected HDMI output (hw:0,0 for HDMI-A-1, hw:1,0 for HDMI-A-2)
This allows Minus to work with different displays without manual configuration.
Adaptive HDMI Bandwidth Fallback:
4K@60Hz RGB/YCbCr 4:4:4 requires 18 Gbps HDMI bandwidth. Some cables, adapters, or display paths can't handle this, resulting in "No Signal" on the TV even though the kernel reports success.
Minus includes adaptive bandwidth detection via src/drm.py:
| Function | Purpose |
|---|---|
| `get_color_format(connector_id)` | Read current color format (RGB, YCbCr 4:4:4, 4:2:2, 4:2:0) |
| `set_color_format(connector_id, format)` | Set color format with retry logic |
| `check_hdmi_i2c_errors(threshold, window)` | Detect signal problems via dmesg |
Detection heuristic: When HDMI signal fails at high bandwidth, the dwhdmi driver floods dmesg with `i2c read err!` messages. This is more reliable than kernel connector status (which shows "connected" even when signal fails).
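A sketch of what such a dmesg heuristic looks like (the threshold/window defaults and timestamp parsing here are assumptions; the real logic is `check_hdmi_i2c_errors()` in src/drm.py):

```python
import re

I2C_ERR_RE = re.compile(r"i2c read err!")

def hdmi_i2c_errors_exceeded(dmesg_lines, threshold=10, window=30.0):
    """Count `i2c read err!` lines whose kernel timestamp falls within
    the last `window` seconds of log time; flag a flood."""
    stamped = []
    for line in dmesg_lines:
        m = re.match(r"\[\s*([\d.]+)\]", line)   # kernel "[ 123.456]" prefix
        if m and I2C_ERR_RE.search(line):
            stamped.append(float(m.group(1)))
    if not stamped:
        return False
    newest = max(stamped)
    recent = [t for t in stamped if newest - t <= window]
    return len(recent) >= threshold
```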
Color format values:
- `COLOR_FORMAT_RGB` (0) - Full bandwidth
- `COLOR_FORMAT_YCBCR444` (1) - Full bandwidth
- `COLOR_FORMAT_YCBCR422` (2) - Reduced bandwidth
- `COLOR_FORMAT_YCBCR420` (3) - Half bandwidth (9 Gbps) - use for problematic cables
Manual fallback:
# Stop minus first (it holds DRM master lock)
sudo systemctl stop minus
# Set YCbCr 4:2:0 for half bandwidth
sudo modetest -M rockchip -w 215:color_format:3
# Restart minus
sudo systemctl start minus

Environment variables:
# Paths (override defaults for different installations)
MINUS_USTREAMER_PATH=/path/to/ustreamer # Default: /home/radxa/ustreamer-patched
MINUS_VLM_MODEL_DIR=/path/to/vlm/models # Default: /home/radxa/axera_models/FastVLM-1.5B
MINUS_OCR_MODEL_DIR=/path/to/ocr/models # Default: /home/radxa/rknn-llm/.../paddleocr
# Timing thresholds
MINUS_ANIMATION_START=0.3 # Blocking animation duration (seconds)
MINUS_ANIMATION_END=0.25 # Unblocking animation duration (seconds)
MINUS_FRAME_STALE_THRESHOLD=5.0 # Health check frame freshness (seconds)
MINUS_DYNAMIC_COOLDOWN=0.5 # Wait after screen becomes dynamic (seconds)
MINUS_SCENE_CHANGE_THRESHOLD=0.01 # Frame difference threshold for scene change
MINUS_VLM_ALONE_THRESHOLD=5       # Consecutive VLM detections needed to trigger alone

| Metric | Value |
|---|---|
| Display (video) | 30fps (GStreamer kmssink, MJPEG → NV12 → DRM plane) |
| Display (blocking) | 60fps (ustreamer MPP blocking mode with FreeType) |
| Preview window | 60fps (hardware-scaled in MPP encoder) |
| Blocking composite | ~0.5ms per frame overhead |
| Audio mute/unmute | INSTANT (volume element mute property) |
| ustreamer MJPEG stream | ~60fps (MPP hardware encoding at 4K) |
| OCR latency | 100-200ms capture + 250-400ms inference |
| VLM latency | ~0.9-1.1s per frame (FastVLM-1.5B, process-based with soft/hard timeout) |
| VLM model load | ~30s (includes 4 warmup inferences + keepalive thread) |
| Snapshot capture | ~150ms (4K JPEG download) |
| OCR image size | 960x540 (downscaled from 4K for speed) |
| ustreamer quality | 80% JPEG (MPP encoder) |
| Animation start | 0.3s (fast blocking response) |
| Animation end | 0.25s (fast unblocking) |
FPS Tracking:
- GStreamer identity element with pad probe counts frames
- FPS logged every 60 seconds via health monitor
- Warning logged if FPS drops below 25
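The counting logic behind these three bullets can be sketched in plain Python (a class name like `FpsTracker` is hypothetical; the real counter is a GStreamer identity pad probe feeding the health monitor):

```python
import time

class FpsTracker:
    """Counts frames, as a GStreamer identity pad probe would, and
    reports average FPS over each reporting interval."""

    def __init__(self, warn_below=25.0):
        self.warn_below = warn_below
        self.frames = 0
        self.window_start = time.monotonic()

    def on_frame(self):
        """Called once per buffer by the pad probe."""
        self.frames += 1

    def report(self, now=None):
        """Called periodically (every 60s) by the health monitor.
        Returns (fps, should_warn) and resets the window."""
        now = time.monotonic() if now is None else now
        elapsed = max(now - self.window_start, 1e-9)
        fps = self.frames / elapsed
        self.frames, self.window_start = 0, now
        return fps, fps < self.warn_below
```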
We use a patched version of ustreamer from garagehq/ustreamer that adds:
- NV12/NV16/NV24 format support for RK3588 HDMI-RX devices
- MPP hardware JPEG encoding using RK3588 VPU (~60fps at 4K!)
- Blocking mode system with FreeType TrueType rendering for ad blocking overlays
- Extended timeouts for RK3588 HDMI-RX driver compatibility
- Multi-worker MPP support (4 parallel encoders optimal)
- Cache sync fix for DMA-related visual artifacts
- Thread-safe FreeType mutex for multi-worker encoding
Why patched ustreamer? The stock PiKVM ustreamer doesn't support NV12 format or RK3588 hardware encoding. Our fork adds NV12→JPEG encoding via Rockchip MPP (Media Process Platform) that achieves ~60fps on 4K input with minimal CPU usage.
Dynamic Format Detection: Minus automatically probes the V4L2 device to detect its current format and resolution. Supported formats:
- NV12 - RK3588 HDMI-RX native (uses MPP hardware encoder directly)
- NV24 - Some devices like Roku (converted to NV12 for MPP, ~60fps)
- BGR24/BGR3 - Google TV and similar devices (converted to NV12 for MPP, ~42fps at 4K)
- YUYV/UYVY - Webcam-style devices
- MJPEG - Pre-compressed JPEG sources
Format conversions (NV24→NV12, BGR24→NV12) are done in software in the MPP encoder before hardware JPEG encoding.
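For intuition, the NV24→NV12 chroma-layout change can be sketched in numpy (assumption: 2x2 blocks are averaged; the C implementation may instead drop samples, which changes values but not the layout):

```python
import numpy as np

def nv24_to_nv12_chroma(uv):
    """Downsample an NV24 interleaved UV plane (H, W, 2) to NV12's
    2x2-subsampled layout (H//2, W//2, 2) by averaging each 2x2 block.
    H and W are assumed even, as they are for video frame sizes."""
    h, w, _ = uv.shape
    # Group rows and columns into 2x2 blocks, then average each block.
    blocks = uv.reshape(h // 2, 2, w // 2, 2, 2).astype(np.float32)
    return blocks.mean(axis=(1, 3)).round().astype(np.uint8)
```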
Performance comparison (4K HDMI input):
| Mode | ustreamer FPS | CPU Usage | Notes |
|---|---|---|---|
| CPU encoding | ~4 fps | ~100% | CPU can't keep up with 4K JPEG encoding |
| MPP hardware | ~60 fps | ~5% | --encoder=mpp-jpeg (default) |
ustreamer command (used by Minus):
/home/radxa/ustreamer-patched \
--device=/dev/video0 \
--format=NV12 \
--resolution=3840x2160 \
--persistent \
--port=9090 \
--host=0.0.0.0 \
--encoder=mpp-jpeg \
--encode-scale=passthrough \
--quality=80 \
--workers=4 \
--buffers=5

Installation:
# Clone and build with MPP support
git clone https://github.com/garagehq/ustreamer.git /home/radxa/ustreamer-garagehq
cd /home/radxa/ustreamer-garagehq
make WITH_MPP=1
cp ustreamer /home/radxa/ustreamer-patched
# Minus uses /home/radxa/ustreamer-patched automatically

Key changes in garagehq/ustreamer:
- `src/ustreamer/encoders/mpp/encoder.c` - MPP hardware JPEG encoder with cache sync, blocking composite, NV24→NV12 and BGR24→NV12 format conversion
- `src/libs/capture.c` - NV12/NV16/NV24/BGR24 format support, extended timeouts
- `src/libs/blocking.c` - FreeType text rendering, NV12 compositing, thread-safe mutex
- `src/ustreamer/http/server.c` - Blocking API endpoints (`/blocking`, `/blocking/set`, `/blocking/background`)
- `src/ustreamer/encoder.c` - MPP encoder integration, multi-worker support
- `src/ustreamer/options.c` - `--encoder=mpp-jpeg` CLI option
Hardware:
- Capture: `hw:4,0` (rockchip,hdmiin) - HDMI-RX audio input
- Playback: `hw:0,0` (rockchip-hdmi0) - HDMI-TX0 output
- Format: 48kHz, stereo, S16LE
GStreamer Pipeline:
alsasrc (HDMI) ──┐
├──► audiomixer ──► volume ──► alsasink
audiotestsrc ────┘
(silent keepalive)
The `audiotestsrc wave=silence` element provides a silent keepalive that prevents pipeline stalls when the HDMI source has no audio (between songs, during video silence, etc.).
Mute Control:
- `ad_blocker.show()` calls `audio.mute()` - instant mute during ads
- `ad_blocker.hide()` calls `audio.unmute()` - restore audio after ads
- Uses GStreamer `volume` element's `mute` property (no pipeline restart)
Why separate pipeline?
- Audio runs independently from video - simpler debugging
- If audio fails, video continues unaffected
- No sync issues for live passthrough
Error Recovery:
- GStreamer bus monitors for pipeline errors and EOS
- Buffer probe tracks audio flow (detects stalls)
- Watchdog thread checks every 3s, restarts if no buffer for 6s
- Exponential backoff for restarts (1s → 2s → 4s → ... → 60s max)
- No maximum restart limit - always tries to recover
- Backoff resets after 5 seconds of sustained audio flow
- Mute state is preserved across restarts
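The backoff/reset behavior above can be sketched with hypothetical names (the real logic lives in src/audio.py's watchdog):

```python
class RestartBackoff:
    """Exponential restart backoff: starts at 1s, doubles to a 60s cap,
    and resets to 1s after sustained healthy audio flow."""

    def __init__(self, base=1.0, cap=60.0, reset_after=5.0):
        self.base, self.cap, self.reset_after = base, cap, reset_after
        self.delay = base
        self.healthy_since = None

    def next_delay(self):
        """Delay to wait before the next restart attempt."""
        d = self.delay
        self.delay = min(self.delay * 2, self.cap)
        self.healthy_since = None          # restart interrupts healthy flow
        return d

    def on_audio_flow(self, now):
        """Call on each buffer; resets backoff after sustained flow."""
        if self.healthy_since is None:
            self.healthy_since = now
        elif now - self.healthy_since >= self.reset_after:
            self.delay = self.base
```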
Testing:
# Test passthrough manually
gst-launch-1.0 alsasrc device=hw:4,0 ! \
"audio/x-raw,rate=48000,channels=2,format=S16LE" ! \
audioconvert ! audioresample ! \
alsasink device=hw:0,0 sync=false
# Check if HDMI source has audio
v4l2-ctl -d /dev/video0 --get-ctrl audio_present

OCR (Primary - Authoritative):
- Triggers blocking immediately on 1 detection
- Stops blocking after 2 consecutive no-ads (`OCR_STOP_THRESHOLD=2`, was 4 — tuned via `tests/block_latency_harness.py`)
- Authoritative for stopping when OCR triggered the block
- Tracks `last_ocr_ad_time` for VLM context
- Handles common OCR misreads in ad timestamps (see below)
VLM (Secondary - Anti-Waffle Protected):
- Uses sliding window of last 45 seconds of VLM decisions (`vlm_history_window`)
- Only triggers blocking alone if 90%+ of recent decisions are "ad" (`vlm_start_agreement`)
- Hysteresis: needs 100% agreement to START (capped at 95% via `vlm_start_threshold_cap` so a few stragglers can't block forever)
- Minimum 4 decisions in window before VLM can act (`vlm_min_decisions`)
- 8-second cooldown after state changes prevents rapid flip-flopping (`vlm_min_state_duration`)
- Sliding window only for starting - stopping uses simple consecutive count (`VLM_STOP_THRESHOLD=2`)
Sliding Window Parameters:
| Parameter | Value | Purpose |
|---|---|---|
| `vlm_history_window` | 45s | How far back to look at VLM decisions |
| `vlm_min_decisions` | 4 | Minimum decisions needed before acting |
| `vlm_start_agreement` | 90% | Agreement threshold to start blocking |
| `vlm_hysteresis_boost` | 10% | Extra agreement needed to change state |
| `vlm_start_threshold_cap` | 95% | Maximum effective start threshold (so hysteresis can't make it unreachable) |
| `vlm_min_state_duration` | 8s | Cooldown after VLM state change |
| `VLM_STOP_THRESHOLD` | 2 | Consecutive no-ad votes for fast-stop path |
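A minimal sketch of the start gate these parameters drive (hysteresis, the 95% cap, and the 8s cooldown are omitted for brevity):

```python
import time

def vlm_may_start_blocking(decisions, now=None, window=45.0,
                           min_decisions=4, agreement=0.90):
    """Sliding-window gate for VLM-alone blocking.

    decisions: list of (timestamp, is_ad) tuples, newest last.
    Returns True only when enough recent decisions exist AND the
    agreement fraction meets the threshold.
    """
    now = time.monotonic() if now is None else now
    recent = [is_ad for ts, is_ad in decisions if now - ts <= window]
    if len(recent) < min_decisions:
        return False
    return sum(recent) / len(recent) >= agreement
```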
Transition Frame Detection:
When blocking is active, black/solid-color frames are detected as transitions between ads and held in blocking state to prevent premature unblocking and re-blocking flicker. The `_is_transition_frame()` method analyzes:
- Mean brightness < 30 with low std deviation → black screen
- Low std deviation across frame → solid color
- 95%+ of pixels within 20 values of median → uniform/static
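The three heuristics can be sketched on a flat grayscale pixel list (thresholds other than the documented 30 / 95% / 20 are assumptions):

```python
import statistics

def is_transition_frame(pixels, dark=30, flat_std=10,
                        uniform_frac=0.95, uniform_band=20):
    """Transition-frame heuristics over a flat list of 0-255 grayscale
    values. flat_std=10 is an assumed stand-in for "low std deviation"."""
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)
    if mean < dark and std < flat_std:
        return True                      # black screen
    if std < flat_std:
        return True                      # solid color
    med = statistics.median(pixels)
    near = sum(abs(p - med) <= uniform_band for p in pixels)
    return near / len(pixels) >= uniform_frac   # uniform/static
```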
Starting Blocking:
- OCR detects ad → blocking starts immediately (unless home screen detected)
- VLM detects ad (no OCR) → needs 90%+ agreement in sliding window (4+ decisions)
- VLM with recent OCR → trusted, triggers blocking
- Home screen detection suppresses both OCR and VLM blocking on streaming interfaces
Stopping Blocking:
- If OCR triggered (`source=ocr` or both): OCR says stop (2 no-ads) → ends immediately (~2-3s)
- If VLM triggered alone (source=vlm): VLM says stop (2 no-ads) → ends (~4s after ad ends)
- VLM history cleared on stop → prevents immediate re-trigger
- VLM stop uses simple consecutive count, NOT sliding window (for responsiveness)
Why This Design:
- VLM sliding window prevents erratic false-positive blocking when acting alone
- OCR is authoritative for stopping OCR-triggered blocks (fast unblock)
- VLM-triggered blocks require VLM to confirm ad ended (since OCR never saw it)
- Clearing VLM history on stop prevents "waffle memory" from causing re-triggers
- VLM stopping uses simple consecutive count (not sliding window) for responsiveness
Anti-flicker:
- Minimum blocking duration starts at 3.0s (`MIN_BLOCKING_DURATION_BASE`) and falls off by `MIN_BLOCKING_DURATION_STEP` (0.5s) on each consecutive ad: 3.0 → 2.5 → 2.0 → 1.5 → 1.0s. Floor is 1.0s for OCR-only, 1.5s for OCR+VLM both agreeing. Counter resets after `MIN_DURATION_RESET_GAP` (30s) without a block. Toggleable via Settings → Blocking Optimizations → Block-duration Falloff.
- VLM history cleared on stop prevents false re-triggers
- Transition frame detection holds blocking through black screens between ads
- After TV reconnect, ad blocking is suppressed for `HDMI_RECONNECT_GRACE_SECONDS` (90s) so the user can navigate without overlays jumping in. The health monitor calls `Minus.notify_hdmi_reconnect()` when it sees the HDMI-TX link return. Toggleable via Settings → Blocking Optimizations → HDMI Reconnect Grace.
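The falloff rule reduces to a one-liner (whether the first ad in a run counts as 0 or 1 consecutive ads is an assumption here):

```python
def min_blocking_duration(consecutive_ads, both_agree=False,
                          base=3.0, step=0.5):
    """Block-duration falloff: 3.0s shrinking by 0.5s per consecutive ad,
    floored at 1.0s for OCR-only, 1.5s when OCR and VLM both agree."""
    floor = 1.5 if both_agree else 1.0
    return max(base - step * consecutive_ads, floor)
```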
Static Screen Suppression:
- Prevents blocking on paused video screens (Netflix/YouTube show ads when paused)
- After 2.5s of static screen (`STATIC_TIME_THRESHOLD`), blocking is suppressed
- When video resumes, 0.5s cooldown (`DYNAMIC_COOLDOWN`) before re-enabling blocking
- Detection state (OCR/VLM) cleared on cooldown complete to prevent false positives
- Static ad screenshots saved to `screenshots/static/` for analysis
OCR Timestamp Pattern Handling: OCR frequently misreads characters in ad timestamps. The detection handles these common confusions:
| Intended | OCR Misreads | Example |
|---|---|---|
| `0` (zero) | `o`, `O` | "Ad0:30" → "Ado:30", "AdO:30" |
| `1` (one) | `l`, `L`, `I`, `i` | "Ad1:30" → "Adl:30", "AdI:30" |
| `:` (colon) | `;`, `.` | "Ad0:30" → "Ad0;30", "Ad0.30" |
Combined misreads are also handled (e.g., "Adl;lo" for "Ad1:10"). The timestamp pattern matches:
- Standard: `Ad 0:30`, `Ad0:30`, `Ad1:45`
- Zero misreads: `Ado:30`, `Ad0:3o`, `Ado:oo`, `Ado:o5` (zeros misread on both sides of the colon)
- One misreads: `Adl:30`, `Ad1:l5`, `Adl:lo`
- Separator misreads: `Ad0;30`, `Ad0.30`, `Ado;3o`
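A compact illustrative reconstruction of such a pattern (the canonical regex at src/ocr.py:595 is more elaborate; this only demonstrates the digit and separator confusion classes):

```python
import re

# Digit with common OCR confusions: o/O for zero, l/L/I/i for one.
D = r"[0-9oOlLIi]"
# Colon and its misreads.
SEP = r"[:;.]"
AD_TIMESTAMP_RE = re.compile(rf"\bAd\s?{D}{SEP}{D}{D}\b", re.IGNORECASE)

def looks_like_ad_timestamp(text):
    """True when text contains something like 'Ad 0:30' or a misread of it."""
    return AD_TIMESTAMP_RE.search(text) is not None
```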
The pattern lives in two places that must stay in sync: `src/ocr.py:595` (`PaddleOCR` class) and `src/ocr_worker.py:404` (`OCRProcess`, which is what production actually calls — `self.ocr = OCRProcess()` in `minus.py:563`). Each side carries a "Mirrors src/ocr.py:NNN — keep in sync" / vice-versa comment. The deeper fix is to delete the duplicate in `ocr_worker.py` and have it call `PaddleOCR.check_ad_keywords` directly; until then, any change to one file's pattern must be mirrored to the other. See OCR Worker Keyword-Pattern Drift under Known Issues for the past failure mode.
Ad-keyword policy:
- Bare `Ad`/`Ads` at a word boundary triggers blocking. Past false positives from words like `Loading`, `reading`, `Adobe` are handled via the word-boundary regex (`\bad\b`/`\bads\b`) and the `AD_EXCLUSIONS` list — bare `Ad` inside a longer word will not match. `Visit advertiser` (YouTube pre-roll CTA) is treated as an exact ad keyword.
Fuzzy "Skip Intro" exclusion:
Streaming UIs render a Skip Intro button that OCR sometimes reads as `Sk1p Intro`, `Skip 1ntro`, `Sk1p 1ntro`, `Sk1p1ntro`, etc. (`i` ↔ `1` ↔ `l` ↔ `I` swaps). A compiled regex `s[kK][i1lI]p\s*[i1lI]ntro` (in `src/ocr.py` as `SKIP_INTRO_FUZZY_RE` and mirrored in `src/ocr_worker.py`) covers all permutations. It's applied as part of the exclusion gate at the top of the per-text and cross-element matching paths, before either exact-keyword or word-boundary detection runs — important because `skip in` (inside `AD_KEYWORDS_EXACT`) is a substring of `skip intro` and would otherwise match first.
`Skip Ad` is not excluded — it still triggers ad detection (via the `skip ad` exact keyword) and is independently recognized as a skip button by `src/skip_detection.py`, so Minus will press it to dismiss the ad.
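The documented fuzzy pattern, compiled and exercised (assumption: the exclusion gate runs on lowercased OCR text, since the pattern as written only allows a lowercase leading `s`):

```python
import re

# Pattern verbatim from the text; the exact flags in the source may differ.
SKIP_INTRO_FUZZY_RE = re.compile(r"s[kK][i1lI]p\s*[i1lI]ntro")

def is_skip_intro(text):
    """True when (lowercased) OCR text contains a fuzzy 'skip intro'."""
    return SKIP_INTRO_FUZZY_RE.search(text) is not None
```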
When ads are detected, the screen shows a full blocking overlay rendered at 60fps via ustreamer's native MPP blocking mode:
- Pixelated Background: Blurred/pixelated version of the screen from ~6 seconds before the ad
- Header (debug only): `[ BLOCKING // OCR ]`, `[ BLOCKING // VLM ]`, or `[ BLOCKING // OCR+VLM ]`
- Spanish vocabulary: Random intermediate-level word with translation
- Example sentence: Shows the word in context
- Rotation: New vocabulary every 11-15 seconds
- Ad Preview Window: Live preview of the blocked ad in bottom-right corner (60fps!)
- Debug stats (debug only): bottom-left dashboard with uptime, blocks, time saved, ad countdown bar, audio level
- OCR trigger snippet (debug only): top-right `(Ad) 0:30 left` style — the OCR text that fired the block, with the matched keyword wrapped in parens. Empty for VLM-only blocks. Capped at 50 chars.
Multi-color Text Per Line:
- Purple - Spanish word (IBM Plex Mono Bold font)
- White - Header and translation (DejaVu Sans Bold font)
- Gray - Pronunciation and example sentence (DejaVu Sans Bold font)
Font Configuration:
- `FONT_PATH_VOCAB_PRIMARY` = DejaVu Sans Bold (vocabulary text, centered)
- `FONT_PATH_WORD_PRIMARY` = IBM Plex Mono Bold (Spanish word, purple)
- `FONT_PATH_STATS_PRIMARY` = IBM Plex Mono Regular (debug stats, monospace)
Rendering Pipeline: All overlay rendering is done inside ustreamer's MPP encoder, NOT GStreamer:
- `ad_blocker.py` captures pre-ad frame and creates pixelated NV12 background
- Background uploaded via `POST /blocking/background` (async, non-blocking)
- Text and preview configured via `GET /blocking/set`
- FreeType renders TrueType fonts directly to NV12 planes at encoder resolution
- Composite runs at 60fps with ~0.5ms overhead per frame
Pixelated Background: Instead of a plain black background, the blocking overlay shows a heavily pixelated (20x downscale) and darkened (60% brightness) version of what was on screen before the ad appeared. This provides visual context while clearly indicating blocking is active.
Implementation (src/ad_blocker.py):
- Rolling 6-second snapshot buffer (3 frames at 2-second intervals)
- Uses oldest frame when blocking starts (ensures pre-ad content)
- OpenCV pixelation: downscale by 20x, upscale with INTER_NEAREST
- Converted to NV12 and uploaded via the `/blocking/background` POST API
- Upload runs in a background thread for non-blocking operation
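The pixelation step can be approximated without OpenCV (the real code uses `cv2.resize` with `INTER_NEAREST`; numpy striding plus repeat is roughly equivalent):

```python
import numpy as np

def pixelate(frame, factor=20, brightness=0.6):
    """Pixelate-and-darken sketch of the blocking background:
    keep every `factor`-th pixel, repeat it back up to full size,
    then scale brightness down to 60%."""
    small = frame[::factor, ::factor]
    big = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    big = big[:frame.shape[0], :frame.shape[1]]   # crop to original size
    return (big.astype(np.float32) * brightness).astype(np.uint8)
```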
Preview Window: Unlike the old GStreamer approach (limited to ~4fps), the ustreamer blocking mode provides:
- Full 60fps live preview of the blocked ad
- Hardware-accelerated scaling in the MPP encoder
- Automatic resolution handling (works at 1080p, 2K, 4K)
Web UI Toggles: Ad Preview Window and Debug toggleable via Settings (both default ON). The unified Debug toggle controls all three on-screen debug elements together — header, bottom-left stats dashboard, and top-right OCR trigger snippet — and is persisted to `~/.minus_system_settings.json` (`debug_overlay`) so "off" survives a service restart.
Recursion safety for the OCR snippet: OCR consumes `/snapshot/raw` (`src/capture.py:134`), which the patched ustreamer serves from `us_blocking_store_raw_frame()` before the blocking composite is applied. The new top-right text — and every other element on the blocking overlay — is therefore invisible to OCR, so the displayed `(Ad) 0:30 left` cannot loop back into detection. Don't break this: if you ever route OCR through `/snapshot` (the composited path), all of these debug texts will become self-triggering.
120+ intermediate-level words and phrases including:
- Common verbs: aprovechar, lograr, desarrollar, destacar, enfrentar...
- Reflexive verbs: comprometerse, enterarse, arrepentirse, darse cuenta...
- Adjectives: disponible, imprescindible, agotado, capaz, dispuesto...
- Nouns: desarrollo, comportamiento, conocimiento, ambiente, herramienta...
- Expressions: sin embargo, a pesar de, de repente, hoy en dia, cada vez mas...
- False friends: embarazada, exito, sensible, libreria, asistir...
- Subjunctive triggers: es importante que, espero que, dudo que, ojala...
- Time expressions: hace poco, dentro de poco, a la larga, de antemano...
Log File:
- Location:
/tmp/minus.log - Max 5MB per log file
- Keeps 3 backup files (minus.log.1, .2, .3)
Screenshot Truncation:
- Keeps only last 50 screenshots by default
- Configurable via
--max-screenshots
FastVLM-1.5B on Axera LLM 8850 NPU:
- Smarter than 0.5B with fewer false positives on streaming interfaces
- ~0.7s inference time for ad detection (process-based with 1.5s hard timeout)
- ~1.0s for custom queries (structured prompt)
- ~25s model load time (includes 2 warmup inferences)
- Uses Python axengine + transformers tokenizer
- Home screen detection provides additional safety net
Process-based architecture (src/vlm_worker.py):
- VLM runs in a separate process for hard timeout capability
- Uses 'spawn' multiprocessing method to avoid "can only join a child process" errors from axengine
- Soft/Hard timeout strategy to avoid unnecessary restarts:
- Soft timeout (1.5s): Returns immediately with "TIMEOUT", but worker keeps running
- Hard timeout (5.0s): Only kills worker if inference is truly stuck
- Restart threshold: 3 consecutive soft timeouts trigger a hard kill
- Late responses are drained on next request and counters reset
- 4 warmup inferences at startup with varied content (noise, gradients, edges, mixed)
- Keepalive thread runs dummy inference every 20s during idle to prevent NPU cold-start
- Worker process loads model once (~27s), processes requests via Queue
Two inference modes:
- `detect_ad(image_path)` → `(is_ad, response_text, elapsed, confidence)` — ad/not-ad classification. Internally hard-caps the model at `max_new_tokens=5`.
- `query_image(image_path, prompt, max_new_tokens=8)` → `(response_text, elapsed)` — custom prompt for any question about the image (used by Autonomous Mode for screen state classification). The `max_new_tokens` default of 8 fits the autonomous-mode multi-choice prompt (PLAYING / PAUSED / DIALOG / MENU / SCREENSAVER); raise it explicitly for open-ended prompts knowing latency rises ~0.23 s per allowed token.
Both modes share the same model. Concurrent callers (detection loop calling `detect_ad`, autonomous mode calling `query_image`) are serialized by `VLMProcess._call_lock` so they cannot cross responses on the shared queue or race on the timeout / latency state. See VLMProcess Cross-Thread Race under Known Issues for the full rationale.
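A stripped-down sketch of that serialization pattern (class and method names are illustrative, not the actual VLMProcess API):

```python
import queue
import threading

class SerializedWorkerClient:
    """One lock ensures a request and its response can't interleave
    with a concurrent caller on the shared request/response queues."""

    def __init__(self, request_q, response_q):
        self._req, self._resp = request_q, response_q
        self._call_lock = threading.Lock()

    def call(self, payload, timeout=1.5):
        with self._call_lock:              # one in-flight request at a time
            self._req.put(payload)
            try:
                return self._resp.get(timeout=timeout)
            except queue.Empty:
                return "TIMEOUT"           # soft timeout: worker keeps running
```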
The max_new_tokens cap is the load-bearing reason VLM never enters a sustained "restart cycle" anymore. Without it, certain images (visually busy / ambiguous) make the model emit a 30–60 token descriptive paragraph instead of "Yes." / "No.", taking 10–15 s and tripping every downstream timeout. See docs/VLM_NPU_DEGRADATION.md for the investigation that ruled out NPU/firmware/driver causes and isolated the fix.
/home/radxa/axera_models/FastVLM-1.5B/
├── fastvlm_ax650_context_1k_prefill_640_int4/ # LLM decoder models
│ ├── image_encoder_512x512.axmodel # Vision encoder
│ ├── llava_qwen2_p128_l*.axmodel # 28 decoder layers
│ └── model.embed_tokens.weight.npy # Embeddings (float32)
├── fastvlm_tokenizer/ # Tokenizer files
└── utils/ # LlavaConfig and InferManager
Why FastVLM-1.5B instead of 0.5B?
| Aspect | FastVLM-0.5B | FastVLM-1.5B |
|---|---|---|
| Inference Time | 0.7s | 0.9s |
| False Positive Rate | ~88% on home screens | ~36% on home screens |
| Intelligence | Basic | Much smarter |
| Parameters | 0.5B | 1.5B |
The Axera NPU can drift into a degraded state (observed: ~15–18s inference with descriptive responses instead of the structured short answer) that outlasts simple worker restarts. This is not thermal — temps are similar (~70°C) when healthy and when slow. Most likely accumulated NPU memory or axengine context state.
VLMProcess keeps a rolling window of the last 10 successful inference latencies. After each success it computes P95 and triggers recovery if P95 > 3.0s:
| Step | Trigger | Action |
|---|---|---|
| 1 | P95 > 3.0s, no recovery in last 60s | restart() — kill worker + 2s NPU-release + start |
| 2 | Still degraded within 180s of step 1 | Deep restart — kill + 8s release + start |
60s cooldown prevents thrashing. The latency window and recoveries surface on `/api/health` at `subsystems.vlm.latency`; NPU telemetry is also exported as the Prometheus gauges `minus_axera_temperature_celsius`, `minus_axera_npu_usage_percent`, and `minus_axera_cmm_used_kib`.
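The trigger condition can be sketched as follows (nearest-rank P95; the real implementation may interpolate, and the cooldown/escalation steps are handled elsewhere):

```python
import math

def p95(latencies):
    """Nearest-rank P95 over a list of inference latencies (seconds)."""
    s = sorted(latencies)
    rank = math.ceil(0.95 * len(s))
    return s[rank - 1]

def needs_recovery(latencies, threshold=3.0, window=10):
    """True when the rolling window is full and its P95 exceeds 3.0s."""
    recent = latencies[-window:]
    return len(recent) == window and p95(recent) > threshold
```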
Query axcl directly for live telemetry:
axcl-smi info --temp # milli-°C, divide by 1000
axcl-smi info --npu # utilization %
axcl-smi info --cmm    # CMM memory used / total

# System packages
sudo apt install -y imagemagick ffmpeg curl v4l-utils
# GStreamer and plugins for video pipeline
sudo apt install -y \
gstreamer1.0-tools \
gstreamer1.0-plugins-base \
gstreamer1.0-plugins-good \
gstreamer1.0-plugins-bad \
gstreamer1.0-rockchip1 \
gir1.2-gst-plugins-base-1.0 \
libgstreamer1.0-dev
# Build ustreamer with MPP hardware encoding and FreeType fonts
sudo apt install -y librockchip-mpp-dev libfreetype-dev libjpeg-dev libevent-dev
git clone https://github.com/garagehq/ustreamer.git /home/radxa/ustreamer-garagehq
cd /home/radxa/ustreamer-garagehq && make WITH_MPP=1
cp ustreamer /home/radxa/ustreamer-patched
# Fonts for blocking overlay
sudo apt install -y fonts-dejavu-core fonts-ibm-plex
# Python dependencies
pip3 install --break-system-packages \
pyclipper shapely numpy opencv-python \
pexpect PyGObject flask requests androidtv \
  rknnlite  # RKNN NPU runtime for OCR (may need Rockchip's pip repo)

Note: The `rknnlite` package is provided by Rockchip and may need to be installed from their SDK or a custom repository. On the Radxa board with NPU support, it may already be pre-installed.
Axera NPU (for VLM): The FastVLM-1.5B model runs on the Axera LLM 8850 NPU. Required Python packages:
pip3 install --break-system-packages axengine transformers ml_dtypes

The `axengine` package requires the Axera AXCL runtime to be installed - see the Axera documentation.
ustreamer fails to start:
fuser -k /dev/video0 # Kill processes using device
pkill -9 ustreamer     # Kill orphaned ustreamer

VLM not loading:
- Check Axera card: `axcl_smi`
- Verify model files exist in `/home/radxa/axera_models/FastVLM-1.5B/`
- Ensure Python dependencies: `pip3 show axengine transformers ml_dtypes`
OCR not detecting:
- Test snapshot: `curl http://localhost:9090/snapshot -o test.jpg`
- Check HDMI: `v4l2-ctl -d /dev/video0 --query-dv-timings`
Display issues:
- Check DRM plane: `modetest -M rockchip -p | grep -A5 "plane\[72\]"`
- Verify connector: `modetest -M rockchip -c | grep HDMI`
NEVER REVERT TO GSTREAMER TEXTOVERLAY FOR BLOCKING OVERLAYS.
The blocking overlay system uses ustreamer's native MPP blocking mode (/blocking/* API), NOT GStreamer's input-selector or textoverlay. This is a one-way migration - we only move forward.
Current Architecture:
- Simple GStreamer pipeline with `queue max-size-buffers=3 leaky=downstream` for smooth video
- All blocking compositing (background, preview, text) done in ustreamer's MPP encoder at 60fps
- Control via HTTP API: `/blocking/set`, `/blocking/background`
- FreeType TrueType font rendering:
  - IBM Plex Mono Bold for Spanish word (purple, centered)
  - DejaVu Sans Bold for vocabulary text (white/gray, centered)
  - IBM Plex Mono Regular for stats dashboard (bottom-left, monospace)
  - Per-line multi-color text matching web UI aesthetic (see AESTHETICS.md)
- Thread-safe with mutex protection for 4 parallel MPP encoder workers
Resolution Flexibility: The blocking system automatically handles resolution mismatches:
- API calls may specify 4K dimensions (3840x2160)
- With `--encode-scale passthrough`, the encoder uses the source resolution directly
- Preview dimensions are scaled proportionally to fit
- Positions are clamped to valid ranges
- All coordinates aligned to even values for NV12
Thread Safety:
FreeType is NOT thread-safe. With 4 parallel MPP encoder workers, a pthread_mutex_t _ft_mutex serializes all FreeType calls in the composite function to prevent crashes. Without this, concurrent FT_Set_Pixel_Sizes/FT_Load_Glyph calls corrupt FreeType's internal state.
Why NOT GStreamer textoverlay:
- Caused pipeline stalls every ~12 seconds
- NV12 format incompatibility issues
- 4K→1080p resolution mismatch problems
- gdkpixbufoverlay limited to ~4fps for preview updates
- Complex input-selector switching logic
Key files:
- `ustreamer-garagehq/src/libs/blocking.c` - NV12 compositing with FreeType, mutex protection
- `ustreamer-garagehq/src/libs/blocking.h` - Blocking mode API
- `src/ad_blocker.py` - Python client using blocking API
Notification Overlay (for Fire TV setup messages, etc.):
- `GET /overlay` - Get current overlay configuration
- `GET /overlay/set?params` - Set overlay configuration
| Parameter | Description |
|---|---|
| `text` | Text to display (URL-encoded, supports newlines) |
| `enabled` | `true` or `1` to enable overlay |
| `position` | 0=top-left, 1=top-right, 2=bottom-left, 3=bottom-right, 4=center |
| `scale` | Text scale factor (1-10) |
| `color_y`, `color_u`, `color_v` | Text color in YUV |
| `bg_enabled` | Enable background box |
| `bg_alpha` | Background transparency (0-255) |
| `clear` | Clear overlay |
Example:

curl "http://localhost:9090/overlay/set?text=LIVE&position=1&scale=3&enabled=true"
curl "http://localhost:9090/overlay/set?clear=true"

Blocking Mode (for ad blocking overlays):
Blocking Mode Endpoints:
- `GET /blocking` - Get current config (enabled, preview, colors, etc.)
- `GET /blocking/set?enabled=true&text_vocab=...&text_ocr=...&preview_enabled=true&preview_grayscale=true&word_y=140&word_u=175&word_v=145` - Configure. Includes `preview_grayscale` to desaturate the corner preview, `word_y`/`word_u`/`word_v` for cycling the Spanish word color per rotation, and `text_ocr` for the top-right OCR-trigger snippet (renders in IBM Plex Mono Regular at the same scale as `text_stats`; empty string clears it).
- `POST /blocking/background` - Upload pixelated NV12 background (width × height × 1.5 bytes)
Multi-color text auto-detection: Lines starting with [ → white (header), ( → gray (pronunciation), = → white (translation), " → gray (example), other → purple (Spanish word)
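That first-character mapping can be sketched in Python (illustrative only; the real logic lives in ustreamer's C compositor):

```python
def line_color(line: str) -> str:
    """Pick the render color for one overlay line from its first character."""
    first = line.lstrip()[:1]
    if first == "[":
        return "white"    # header
    if first == "(":
        return "gray"     # pronunciation
    if first == "=":
        return "white"    # translation
    if first == '"':
        return "gray"     # example sentence
    return "purple"       # Spanish word (default)
```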
The overlay system includes a priority mechanism to handle multiple overlays gracefully:
Persistent Overlays:
- Setup instructions (Roku Limited Mode, Fire TV ADB Enable) are "persistent"
- Registered with duration > 60 seconds
- Have a background monitor thread that checks every 5 seconds
- Auto-restore if overwritten by short notifications (VLM status, etc.)
Short Overlays:
- Status notifications (VLM Ready, Connected, etc.) are short-lived (5-10s)
- Can temporarily interrupt persistent overlays
- After they expire, the persistent overlay is automatically restored
State Changes:
- Successful device connection calls `_clear_persistent()` to dismiss setup instructions
- This prevents stale setup overlays from reappearing after connection

Implementation:
- Module-level singleton state in `src/overlay.py` (`_overlay_state` dict)
- Monitor thread spawned by `_set_persistent()` polls the ustreamer overlay API
- Compares current overlay text to expected text, restores if different
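The restore loop can be sketched as follows (a simplified stand-in for the monitor thread; `get_current` and `set_overlay` represent the ustreamer HTTP calls, names are illustrative):

```python
import threading


class PersistentOverlay:
    """Poll the overlay API and re-assert the persistent text if a
    short-lived notification overwrote it."""

    def __init__(self, expected_text, get_current, set_overlay, interval=5.0):
        self.expected_text = expected_text
        self.get_current = get_current      # e.g. GET /overlay wrapper
        self.set_overlay = set_overlay      # e.g. GET /overlay/set wrapper
        self.interval = interval
        self._stop = threading.Event()

    def check_once(self) -> bool:
        """Restore the persistent overlay if needed; return True if restored."""
        if self.get_current() != self.expected_text:
            self.set_overlay(self.expected_text)
            return True
        return False

    def run(self):
        # Background monitor: check every `interval` seconds until stopped.
        while not self._stop.wait(self.interval):
            self.check_once()

    def stop(self):
        self._stop.set()
```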
The health monitor (src/health.py) runs in a background thread and checks:
| Subsystem | Check | Recovery |
|---|---|---|
| HDMI signal | v4l2-ctl --query-dv-timings | Show "NO SIGNAL" overlay, mute audio |
| No HDMI at startup | check_hdmi_signal() | Show bouncing "NO SIGNAL" screensaver |
| ustreamer | HTTP HEAD to :9090/snapshot | Restart ustreamer + video pipeline |
| Video pipeline | Buffer flow + FPS monitoring | Restart pipeline with exponential backoff |
| Output FPS | GStreamer pad probe | Log warning if < 25fps |
| VLM | Consecutive timeouts < 5 | Degrade to OCR-only, retry VLM after 30s |
| Memory | Usage < 90% | Force GC, clear frame buffers |
| Disk | Free > 500MB | Log warning |
HDMI Disconnect/Reconnect Recovery:
- Detects HDMI signal loss via ustreamer's `/state` API (`captured_fps` field)
- Signal considered lost if FPS is 0 for more than 5 seconds (handles source going to sleep)
- Shows "NO SIGNAL" overlay and mutes audio immediately
- On signal restoration: restarts ustreamer → restarts video pipeline → restores display
- Full recovery typically completes in ~7 seconds
Display Output Resilience (HDMI-TX Disconnected):
- Service continues running even if HDMI-TX display output is disconnected
- ustreamer runs independently for web preview and ML detection
- Web UI shows "DISPLAY DISCONNECTED" overlay with grayscale video feed underneath
- Display retry loop attempts reconnection every 7 seconds (only display pipeline, not ustreamer)
- OCR/VLM ad detection continues working without display output
- API exposes `display_connected` and `display_error` fields in `/api/status`
Video Pipeline Watchdog:
- Buffer watchdog detects stalls (10 seconds without buffer)
- Monitors GStreamer pipeline state (must be PLAYING)
- Handles HTTP connection errors from souphttpsrc
- Handles unexpected EOS (end-of-stream) events
- Exponential backoff for restarts (1s → 2s → 4s → ... → 30s max)
- Backoff resets after 10 seconds of sustained buffer flow
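The backoff schedule works out to a simple capped doubling (a minimal sketch of the delay computation):

```python
def restart_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff for pipeline restarts: 1s, 2s, 4s, ... capped at 30s.
    `attempt` is 0-based and resets after 10s of sustained buffer flow."""
    return min(base * (2 ** attempt), cap)


# First seven restart delays: 1, 2, 4, 8, 16, 30, 30 seconds
delays = [restart_delay(n) for n in range(7)]
```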
Startup grace period:
- 30-second grace period before ustreamer health checks begin
- Prevents false positives during VLM model loading
Graceful degradation (startup):
- OCR initialization: 3 retries with 2s delay, continues without OCR if all fail
- VLM model loading: 3 retries with 5s delay, continues without VLM if all fail
- Both failures are non-fatal — the system runs with whatever subsystems loaded
- `ocr_ready` and `ocr_disabled` fields in `/api/status` (matching existing `vlm_ready`/`vlm_disabled`)
- OCR status badge in web UI header: OCR: Ready / Disabled / Failed
Graceful degradation (runtime):
- If VLM fails 5+ times consecutively, switches to OCR-only mode
- VLM restart is attempted after 30 seconds in background
- OCR continues working independently
Scene skip cap:
- OCR: Force run after 30 consecutive skips
- VLM: Force run after 10 consecutive skips
- Prevents missing ads that appear without scene change
Periodic logging:
- FPS logged every 60 seconds
- Full status logged every 5 minutes (uptime, fps, hdmi, video, audio, vlm, mem, disk)
Minus includes a lightweight Flask-based web UI for remote monitoring and control, accessible via Tailscale from desktop or mobile devices.
Features:
- Live video feed - MJPEG stream proxied from ustreamer (CORS bypass)
- Status display - Blocking state, FPS, HDMI info, uptime
- Pause controls - 1/2/5/10 minute presets to pause ad blocking
- Detection history - Recent OCR/VLM detections with timestamps
- Settings - Toggle preview window and debug dashboard
- Log viewer - Collapsible log output for debugging
Key API Routes:
- `GET /`, `/api/status`, `/api/detections`, `/api/logs`
- `POST /api/pause/N`, `/api/resume`
- `GET/POST /api/preview/*`, `/api/debug-overlay/*` (the debug-overlay route is the unified Debug toggle: header + bottom-left stats + top-right OCR snippet, persisted to `~/.minus_system_settings.json` as `debug_overlay`)
- `POST /api/test/trigger-block`, `/api/test/stop-block`
- `GET /stream`, `/snapshot` - Proxy to ustreamer
- `GET /api/health` - Health check for uptime monitors
- `POST /api/video/restart` - Force restart video pipeline
- `GET/POST /api/video/color` - Get/set color settings (saturation, brightness, contrast, hue)
- `POST /api/ocr/test` - Run OCR on current frame (no screenshot save)
- `POST /api/vlm/test` - Run VLM on current frame (no screenshot save)
- `GET /api/vlm/status` - Get VLM status (disabled, model_loaded, etc.)
- `POST /api/vlm/disable` - Disable VLM and unload model from NPU
- `POST /api/vlm/enable` - Re-enable VLM and load model
- `POST /api/blocking/skip` - Trigger Fire TV skip button
- `POST /api/audio/sync-reset` - Reset A/V sync (~300ms dropout)
- `GET /api/autonomous` - Autonomous mode status
- `POST /api/autonomous/enable/disable/toggle/start` - Control autonomous mode
- `POST /api/autonomous/schedule` - Set schedule (start_hour, end_hour, always_on)
- `GET /api/autonomous/logs` - Autonomous mode log entries
- `GET /api/screenshots/review/<category>` - Unreviewed screenshots for swipe classification
- `POST /api/screenshots/approve` - Mark screenshot as correctly labeled
- `POST /api/screenshots/classify` - Move screenshot between categories
- `POST /api/screenshots/undo` - Undo last review action
- `GET /api/ir/status` - IR transmitter status (`enabled`, `available`, `initialized`, `codes`)
- `POST /api/ir/enable/disable` - Toggle the IR remote feature (gates the UI and `/command`)
- `POST /api/ir/command` - Send a captured button. Body: `{"button": "power"|"input_1"|"input_2"|"input_3"|"next"|"auto"}`. `403` when disabled, `429` with `retry_after` inside the 1.5 s cooldown. See `docs/IR_TRANSMITTER.md`.
- `GET /api/leds/status` - Status LEDs status (`available`, `enabled`, `running`, `state`, `states`, `last_error`, `gated`)
- `POST /api/leds/enable/disable` - Toggle the WS2812B status strip; persists; starts/stops the animation thread
- `POST /api/leds/state` - Switch animation state. Body: `{"state": "<name>"}`. `403` when disabled, `400` for unknown state. States: off / initializing / idle / blocking / paused / no_signal / autonomous / wifi_setup / error. See `docs/STATUS_LEDS.md`.
- `GET /api/leds/require_display` - Display-gate status (`leds_require_display`, live `display_connected`)
- `POST /api/leds/require_display` - Body `{"enabled": true|false}` — when on (default), the strip stays dark while the HDMI-TX display is disconnected or powered off.
Test API Endpoints: For development and testing ad blocking without waiting for real ads:
# Trigger blocking for 20 seconds (max 60)
curl -X POST -H "Content-Type: application/json" \
-d '{"duration": 20, "source": "ocr"}' \
http://localhost:80/api/test/trigger-block
# Stop blocking immediately
curl -X POST http://localhost:80/api/test/stop-blockParameters for trigger-block:
duration: seconds to block (default: 10, max: 60)source: detection source - 'ocr', 'vlm', 'both', or 'default'kind: optional forced replacement kind - 'vocab', 'fact', or 'photos'
Test mode prevents the detection loop from canceling the blocking, allowing full testing of pixelated background, animations, and audio muting. When source is ocr or both, the endpoint also injects a synthetic (Ad) 0:30 left snippet into the top-right OCR-trigger slot so you can exercise that rendering path without waiting for real OCR.
Access URLs:
- Local: `http://localhost:80`
- Tailscale: `http://<tailscale-hostname>:80`
- Direct stream: `http://<hostname>:9090/stream`
Security:
- No authentication (relies on Tailscale network security)
- Read-mostly API with minimal attack surface
- Binds to 0.0.0.0 for remote access
Minus automatically collects training data for future VLM improvements, organized by type:
Screenshot directories:
- `screenshots/ads/` - OCR-detected ads
- `screenshots/non_ads/` - User paused = false positives
- `screenshots/vlm_spastic/` - VLM uncertainty cases (detected 2-5x then changed)
- `screenshots/static/` - Static screen suppression
Screenshot Quality Filtering (all categories):
Every save goes through _should_save() which applies three layers of filtering:
| Layer | What it catches | Threshold |
|---|---|---|
| Rate limiting | Rapid-fire saves | 5s minimum between saves per category |
| Blank rejection | Black/solid-color frames | Mean brightness < 15 or std dev < 10 |
| dHash dedup | Near-duplicate frames | Hamming distance < 10 bits (~85% similar) |
dHash (Difference Hash):
- Resize frame to 9x8 grayscale, compare adjacent pixels → 64-bit hash
- Two frames of the same ad with slightly different timestamps: hamming distance ~1-5
- A genuinely different scene: hamming distance ~20-30
- Keeps last 200 hashes per category for rolling dedup window
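A pure-Python sketch of the dHash + Hamming-distance check (production resizes the frame to a 9x8 grayscale grid with OpenCV first; here the grid is passed in directly):

```python
def dhash(pixels) -> int:
    """64-bit difference hash over an 8-row x 9-column grayscale grid.
    Each bit is 1 when a pixel is brighter than its right-hand neighbour."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):  # 8 comparisons per row
            bits = (bits << 1) | (left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Two near-identical synthetic 9x8 frames differ in only a few bits
frame = [[(x * 7 + y * 3) % 256 for x in range(9)] for y in range(8)]
tweak = [row[:] for row in frame]
tweak[0][0] += 40   # small local change, e.g. a timestamp digit
```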
Screenshot Review System (Tinder-style):
The web UI includes a swipe-based review system for classifying screenshots:
- Each screenshot tab (Ads, Non-Ads, VLM Spastic, Static) has a 👀 review button
- Opens a full-screen modal with a 3-card visual stack
- Swipe right (or arrow key) = approve / classify as ad
- Swipe left (or arrow key) = reclassify / classify as not ad
- Undo (Ctrl+Z or button) reverses the last action
- Progress tracked in `/home/radxa/.minus_reviewed_screenshots.json` — shows oldest unreviewed first
| Category | Swipe Right | Swipe Left |
|---|---|---|
| Ads | Approved (correct) | Move to Non-Ads |
| Non-Ads | Approved (correct) | Move to Ads |
| VLM Spastic | Move to Ads | Move to Non-Ads |
| Static | Move to Ads | Move to Non-Ads |
Review API:
- `GET /api/screenshots/review/<category>` - Unreviewed items, oldest first
- `POST /api/screenshots/approve` - Mark as correctly labeled
- `POST /api/screenshots/classify` - Move between categories
- `POST /api/screenshots/undo` - Undo last action
Autonomous Mode keeps YouTube playing on streaming devices during scheduled hours so Minus can collect ad detection training data unattended. Device-agnostic design supports Fire TV, Roku, and Google TV. Uses VLM to understand screen state and take intelligent actions.
How it works:
- Schedule — Configurable start/end hours (e.g., 22:00–06:00), or 24/7 mode
- OCR-based screen detection — Before VLM, checks OCR text for login/home screen keywords (VLM often misclassifies these static screens as "PLAYING")
- VLM-guided keepalive — Every 2 minutes, captures a frame and asks VLM to classify the screen state
- Roku ECP active app check — Before VLM, queries Roku's `/query/active-app` API to detect if YouTube exited or screensaver activated (more reliable than VLM for Roku)
- Frame-change + audio verification — After VLM says "PLAYING", verifies with dHash frame comparison + audio flow check to catch paused videos VLM misclassifies
- Smart actions — Based on combined signals, takes the minimum necessary action:
| Signal | Action | Command |
|---|---|---|
| OCR: login screen keywords | Select account | down + select |
| OCR: home screen keywords + static | Select video | down + select |
| VLM: PLAYING + frames changing | None | Video is fine |
| VLM: PLAYING + static + no audio | Play | play_pause (paused video VLM missed) |
| VLM: PLAYING + static + audio flowing | None | Music stream with static image (lo-fi) |
| VLM: PAUSED | Play | play_pause key |
| VLM: DIALOG | Dismiss | select + play_pause |
| VLM: MENU | Select video | down + select |
| VLM: SCREENSAVER | Wake + launch | wakeup + launch YouTube |
| Roku: screensaver overlay | Dismiss | select (wake from screensaver) |
| Roku: not on YouTube | Relaunch | launch_app('youtube') |
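The VLM rows of the table above can be sketched as one decision function (illustrative only; command strings follow the table, not the actual controller API):

```python
def keepalive_action(vlm_state: str, frames_static: bool, audio_flowing: bool):
    """Map a VLM screen classification plus verification signals to the
    minimum necessary remote action, or None when the video is fine."""
    if vlm_state == "PLAYING":
        if not frames_static:
            return None                 # frames changing: video is fine
        if audio_flowing:
            return None                 # static image + audio = lo-fi music stream
        return "play_pause"             # paused video the VLM missed
    return {
        "PAUSED": "play_pause",
        "DIALOG": "select+play_pause",
        "MENU": "down+select",
        "SCREENSAVER": "wakeup+launch_youtube",
    }.get(vlm_state)
```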
Device-agnostic design:
- `set_device_controller(controller, device_type)` accepts any controller
- Device type auto-detected from controller class name
- YouTube launch uses device-specific methods (Roku ECP `launch_app`, Android ADB intent)
- Skip command routes through active device controller

Roku-specific features:
- Active app check via ECP `/query/active-app` — definitively knows if YouTube is running
- Screensaver detection — checks for `<screensaver>` element in active-app response (Roku City screensaver overlays YouTube without closing it)
- YouTube app ID: 837
OCR-based screen detection: VLM often misclassifies static YouTube screens (login, home) as "PLAYING". OCR keywords provide more reliable detection:
| Screen | Keywords | Action |
|---|---|---|
| Login/account selection | watch as guest, watchas guest, add a kid account, kid account, choose account, switch account | `down` + `select` to choose account |
| Home/browse | new to you, newtoyou, trending, subscriptions, library, views, year ago, month ago | `down` + `select` to pick a video |
Login screen detection runs before VLM query. Home screen detection runs when VLM says "PLAYING" but frames are static.
Frame-change verification (pause detection):
- dHash (difference hash) compares two frames 3 seconds apart
- Hamming distance < 3 = truly static (paused or stuck)
- Audio flow check via ad_blocker's audio module (`0 <= last_buffer_age < 3s`) or ALSA `/proc/asound` status
- Note: `buffer_age = -1` means no audio ever received (not flowing), fixed to prevent false "audio flowing" detection
- Static frames + audio flowing = music stream (not paused) — prevents false play_pause
- Static frames + no audio = truly paused — sends play_pause after 2 consecutive checks
VLM Screen Query Prompt:
Look at this TV screen and classify it into exactly one category.
Answer with ONLY one of these words:
PLAYING, PAUSED, DIALOG, MENU, SCREENSAVER
This structured prompt returns in ~1.0s (vs 5-22s with descriptive prompts).
Settings persistence: `/home/radxa/.minus_autonomous_mode.json`

{"enabled": true, "start_hour": 22, "end_hour": 6, "always_on": false}

System settings: `/home/radxa/.minus_system_settings.json`

{"vlm_preload": true}

VLM preload loads the model at startup before HDMI signal arrives (configurable in Settings tab).
API endpoints:
- `GET /api/autonomous` - Current status (active, schedule, stats, device_type, device_connected)
- `POST /api/autonomous/enable/disable/toggle`
- `POST /api/autonomous/start` - Start immediately (manual override)
- `POST /api/autonomous/schedule` - Set hours and always_on flag
- `GET /api/autonomous/logs` - Recent log entries
- `GET/POST /api/settings/vlm-preload` - VLM preload toggle
- `GET/POST /api/settings/optimization` - Toggle block-duration falloff, HDMI reconnect grace, and greyscale preview. POST body: `{"key": "block_falloff"|"hdmi_reconnect_grace"|"greyscale_preview", "enabled": true|false}`. Persisted to `~/.minus_system_settings.json`. Setting `greyscale_preview` here propagates to the running ad_blocker immediately via `/blocking/set?preview_grayscale=...` so the current block updates on the fly.
- `GET/POST /api/settings/replacement-modes` - Which content kinds the blocking overlay rolls into. POST body: `{"modes": ["vocab","fact","haiku","photos"]}`. Server enforces at least one text kind (vocab/fact/haiku) remains enabled. Persisted with the rest of system settings.
- `GET/POST /api/media/photos` - List all uploaded photos (GET) or upload a new one (POST multipart with `file` field). Server re-encodes to JPEG (max 1920px long edge, quality 85) under `~/.minus_media/photos/`. Count cap 200, size cap 200 MB (oldest evicted on add).
- `GET/DELETE /api/media/photos/<id>` - Download JPEG bytes inline (GET) or remove by id (DELETE). Id is sanitized to hex to prevent path traversal.
Web UI: Toggle button, schedule time selectors, 24/7 checkbox (auto-enables mode), stats display in Settings tab, VLM preload toggle.
24h stability test results (Apr 10-11, 2026):
- Memory: stable at ~1.65GB RSS, no leak (tested 21+ hours continuous)
- FD count: stable at ~35, no leak
- Autonomous actions: 10+ DIALOG dismissals, 6+ screensaver auto-dismissals, all successful
- Ads blocked: 15+ ad breaks (OCR+VLM), all legitimate
- Audio-aware static detection: prevented 100+ false play_pause commands on lo-fi streams
- Audio restarts: 3 total (isolated, all self-recovered)
- Zero errors throughout
Minus includes a WiFi captive portal system for easy network configuration when no WiFi is connected.
How it works:
- If WiFi disconnects for 30+ seconds, Minus creates a "Minus" hotspot AP
- Users connect to the hotspot and get redirected to a setup page
- Setup page shows available networks with signal strength
- User selects network and enters password
- Minus connects and stops the AP automatically
Hotspot Configuration:
- SSID: `Minus`
- Password: `minus123`
- IP: `10.42.0.1`
- Band: 2.4GHz (802.11 b/g)
Captive Portal Detection: The portal supports automatic detection on mobile devices:
- `GET /generate_204` - Android captive portal check
- `GET /hotspot-detect.html` - Apple captive portal check
- `GET /connecttest.txt` - Windows captive portal check
API Endpoints:
- `GET /api/wifi/status` - Current connection status, AP mode state
- `GET /api/wifi/scan` - Scan for available networks
- `POST /api/wifi/connect` - Connect to a network (ssid, password)
- `POST /api/wifi/disconnect` - Disconnect from current network
- `POST /api/wifi/ap/start` - Start AP mode manually
- `POST /api/wifi/ap/stop` - Stop AP mode
- `GET /wifi-setup` - Captive portal setup page
Settings Tab Integration: The Settings tab in the web UI shows:
- Current WiFi status (SSID, IP, signal strength)
- Disconnect button for current network
- Manual AP mode start/stop buttons
Files:
- `src/wifi_manager.py` - WiFi/AP management module
- `src/templates/wifi_setup.html` - Captive portal page
- `tests/test_wifi_portal.py` - Playwright tests (30 tests)
Note: The Radxa's internal WiFi antenna has limited range. For better AP coverage in production, consider using a USB WiFi adapter with external antenna.
An IR LED wired to Rock Pi 5B header pin 38 (GPIO3_B2 / Linux GPIO 106, muxed to PWM3_IR_M1) lets Minus control a REI 8K 3-port HDMI switch. The target use case is autonomous mode rotating between streaming devices (Roku / Fire TV / Google TV) on a schedule so training data covers multiple home-screen layouts.
Hardware setup (one-time): enable the rk3588-pwm3-m1 overlay, reboot. After reboot a new /sys/class/pwm/pwmchipN appears whose device symlink points to fd8b0030.pwm. See docs/IR_TRANSMITTER.md for overlay install steps and wiring.
Protocol: NEC at 38 kHz carrier. Captured codes (all address 0x80, via Flipper Zero): input_1=0x07, input_2=0x1B, input_3=0x08, power=0x05, next=0x1F (cycles 1→2→3→1), auto=0x09.
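For reference, the NEC framing those codes ride on can be sketched like this (an illustrative encoder, not the actual `IRTransmitter` internals):

```python
def nec_frame(address: int, command: int) -> int:
    """Assemble the 32-bit NEC data frame: address, inverted address,
    command, inverted command. Each byte is transmitted LSB-first."""
    return (address
            | ((address ^ 0xFF) << 8)
            | (command << 16)
            | ((command ^ 0xFF) << 24))


def nec_pulses(address: int, command: int, unit_us: int = 562):
    """Expand a frame into (mark_us, space_us) pairs: 9 ms / 4.5 ms leader,
    then per bit a ~562 µs mark followed by a ~562 µs (0) or ~1687 µs (1)
    space, closed with a final stop mark."""
    pairs = [(9000, 4500)]                      # leader mark / space
    frame = nec_frame(address, command)
    for i in range(32):                         # LSB-first over the frame
        bit = (frame >> i) & 1
        pairs.append((unit_us, 3 * unit_us if bit else unit_us))
    pairs.append((unit_us, 0))                  # trailing stop mark
    return pairs
```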
API: /api/ir/status | enable | disable | command. See the Web UI Key API Routes section above. Server enforces a 1.5 s cooldown between successful sends (IRCooldownError → HTTP 429 with retry_after).
UI: toggle + 6-button remote (Input 1/2/3, Power, Next, Auto) inside the Autonomous Mode section of the Settings tab. Panel hidden until toggled on. Buttons auto-disable during cooldown and a status line shows sent power or cooldown — wait 0.74s.
Standalone CLI: sudo python3 ir_transmit.py <button> sends one button; --list prints all valid names. Uses the same IRTransmitter class as the webui so there is one source of truth.
Key gotchas (the ones that burned us once already):
- The Radxa pinout labels GPIO3_B2 with the RK3588 pin-function `PWM3_IR_M1`, not PWM14. Only the `rk3588-pwm3-m1` overlay wires pin 38.
- On a fresh PWM export, `polarity` defaults to `inversed` on this chip. That flips mark/space at the LED. `IRTransmitter.initialize()` sets `polarity=normal` while the PWM is disabled, before enabling.
- Writing to `duty_cycle` returns `EINVAL` while `period` is still 0. Always set `period` before `duty_cycle` on a fresh export.
Files:
- `src/ir_transmitter.py` — `IRTransmitter` class, NEC encoder, cooldown, PWM sysfs wiring
- `ir_transmit.py` — standalone CLI shim
- `minus.py` — instantiates `self.ir_transmitter`, persists `ir_enabled` in `~/.minus_system_settings.json`
- `src/webui.py` — `/api/ir/*` endpoints, cooldown → 429
- `src/templates/index.html` — toggle + remote panel in Autonomous Mode section
- `tests/test_ir_transmitter.py` — 20 unit tests (mocked sysfs)
- `tests/test_ir_ui.py` — Playwright UI tests (live service)
- `docs/IR_TRANSMITTER.md` — full hardware, protocol, API, and troubleshooting docs
Future work: hook minus.ir_transmitter.send("next") into autonomous mode's scheduler on a 12 h or 24 h cadence. The boilerplate (flag, endpoints, UI, cooldown) is in place so the autonomous-mode change is a single call site.
A 3-pin IR receiver (TSOP38238 / VS1838B class) was evaluated on header pin 3 (GPIO4_B3 / gpiochip4 line 11, Linux GPIO 139). Decoded the REI remote's NEC frames cleanly — 0x80 / 0x07,1B,08,1F plus REPEAT codes — using gpiomon + a Python decoder in test_ir_receiver.py. No production code yet, just exploratory.
Why pin 3 instead of pin 38 (alongside the transmitter): the rk3588-pwm3-m1 overlay parks pin 38's pad-mux on PWM3 at boot. gpiomon will claim the line but the GPIO controller is electrically disconnected from the pad — gpioget reads a constant 0 and no edges fire. Pin 3 / GPIO4_B3 has no overlay claiming it, so default GPIO mux applies and it Just Works. Sanity check: gpioget gpiochip4 11 returns 1 with the receiver powered and idle.
Two gotchas burned dev time, captured here so we don't re-discover:
- `gpiomon -B both` is invalid in libgpiod 1.6 — `-B` is bias, not edge. Default already monitors both edges; pass nothing.
- After a falling edge the line is LOW (a MARK), not a SPACE. Get the polarity backwards and every frame appears to start with a `~4500/~600 µs` "leader" because the real 9 ms leader mark gets filtered by the empty-buffer guard.
Status: test script only. Decoder is a copy-able starting point if/when we want a real IRReceiver module — see docs/IR_RECEIVER.md for the full sketch including threading model, suggested API surface, and integration ideas (closed-loop transmitter verification, external hardware trigger, remote learning, post-send confirmation for autonomous-mode scheduling).
Files:
- `test_ir_receiver.py` — standalone bench-test script (gpiomon subprocess + NEC decoder + `--raw` mode for non-NEC remotes)
- `docs/IR_RECEIVER.md` — findings, gotchas, future-module sketch
8× WS2812B addressable strip on header pin 19 (GPIO1_B2 muxed as SPI0_MOSI_M2). All 8 LEDs are user-addressable.
Why SPI MOSI: WS2812B's 800 kHz protocol needs sub-µs timing. Userspace GPIO can't deliver that on Linux; the SPI controller can. We clock SPI at 6.4 MHz and encode each WS bit as one full SPI byte (0b11110000 = WS-1, 0b11000000 = WS-0) — the canonical Adafruit NeoPixel_SPI pattern. The frame is wrapped with 80 µs zero-byte resets on both sides and sent via writebytes2(bytes). We initially tried 3-SPI-bits-per-WS-bit at 2.4 MHz; the spi-rockchip driver's PIO mode inserts inter-byte gaps when its FIFO refills, and the tighter scheme didn't have enough skew tolerance — visible symptom was "solid green decoded as cycling red/blue/white". rpi_ws281x and friends depend on Broadcom PWM+DMA hardware and don't work on RK3588.
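The 1-SPI-byte-per-WS-bit scheme can be sketched as follows (illustrative; the real driver is `src/status_leds.py`, and `reset_bytes=64` gives 64 × 8 bits / 6.4 MHz = 80 µs of reset):

```python
WS_ONE, WS_ZERO = 0xF0, 0xC0   # one full SPI byte per WS2812B bit at 6.4 MHz


def encode_strip(pixels, reset_bytes=64) -> bytes:
    """Encode (g, r, b) pixels — WS2812B expects GRB order — into an SPI
    byte stream: each colour bit becomes 0b11110000 (WS-1) or 0b11000000
    (WS-0), framed by zero-byte resets on both sides."""
    out = bytearray(reset_bytes)                # leading 80 µs reset
    for g, r, b in pixels:
        for byte in (g, r, b):
            for bit in range(7, -1, -1):        # MSB-first per colour byte
                out.append(WS_ONE if (byte >> bit) & 1 else WS_ZERO)
    out.extend(bytes(reset_bytes))              # trailing 80 µs reset
    return bytes(out)
```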
Hardware: bare-wire data line direct from header pin 19 to the strip — no level shifter, no inline resistor, no bulk cap needed for reliable operation on this board (verified: removed both the Adafruit-recommended 470 Ω series resistor and the 1000 µF V+/GND electrolytic, decoding stayed clean across all 8 LEDs). Keep the data wire ≤ 10 cm. (We previously shipped a "sacrificial first pixel" workaround that exposed only 7 LEDs; the encoding switch made it unnecessary and it has been removed.)
Brightness cap (load-bearing): BRIGHTNESS = 0.10 is applied inside set_pixel() before storage; every other setter funnels through it. Caps peak draw at ~48 mA across all 8 LEDs — small enough to keep current swings from corrupting the data line on the marginal 3.3V signalling. Don't bypass it from the application layer; if you need more brightness, add external 5 V power to the strip first.
Controller (src/status_led_controller.py): StatusLEDController runs a 200 ms (5 fps) animation thread. Each renderer self-paces in seconds via the shared _to_ticks() helper, so per-animation cadence is preserved if the global tick rate is changed. State transitions are atomic and thread-safe; the lifecycle holds _thread populated until join() returns so a racing start() can't open a second SPI handle.
State catalogue:
| State | Visual | Trigger |
|---|---|---|
| `off` | dark | feature disabled |
| `initializing` | white pulse 1% → 10% → 1% (1 step/500 ms; 14 s/breath) | `Minus.run()` start, HDMI restoration |
| `idle` | solid green | `ad_blocker.start()`, `ad_blocker.hide()`, recovery complete |
| `blocking` | bouncing red Cylon eye + 2-pixel tail (~200 ms/step) | `ad_blocker.show(...)` |
| `paused` | slow yellow breathing (3 s) | `Minus.pause_blocking(...)` |
| `no_signal` | slow amber breathing (4 s) | `_on_hdmi_lost`, `start_no_signal_mode` |
| `autonomous` | slow blue breathing (4 s) | autonomous-mode active callback |
| `wifi_setup` | cyan alternating sweep (~250 ms/swap) | WiFi AP-mode started |
| `error` | fast red blink (2 Hz) | manual / subsystem failure |
Persistence: the on/off toggle is in ~/.minus_status_leds.json. State itself is runtime-only and gets re-asserted by the next event.
Display gating: by default the strip stays dark while the HDMI-TX display is disconnected or powered off — keeps a dark room dark when the TV is off. State machine still ticks; only the wire output is suppressed, so animations resume seamlessly within ~200 ms of the display coming back. Implemented as an optional drive_predicate on the controller that Minus wires to health_monitor._check_hdmi_output_connected(). The leds_require_display flag (default True) toggles the gate from the WebUI; persisted in ~/.minus_system_settings.json.
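The gate itself is tiny; a sketch of the idea (`write` stands in for the SPI transfer, `drive_predicate` for the health-monitor hook — names illustrative):

```python
class GatedStrip:
    """Display gate sketch: the animation thread always renders, but the
    wire only sees the frame when drive_predicate() is true; otherwise a
    dark frame is written so the strip goes out immediately and the real
    animation resumes on the next tick."""

    def __init__(self, write, drive_predicate, n_leds=8):
        self.write = write                     # e.g. SPI writebytes2 wrapper
        self.drive_predicate = drive_predicate
        self.dark = bytes(n_leds * 3)          # all-off GRB frame
        self.tick = 0

    def render_tick(self, frame: bytes):
        self.tick += 1                         # state machine keeps advancing
        self.write(frame if self.drive_predicate() else self.dark)
```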
Hardware setup (one-time): enable rk3588-spi0-m2-cs0-spidev overlay, install python3-spidev, add user to spi group, reboot. ./install.sh does all of that idempotently.
API endpoints: see the Web UI section above.
Files:
- `src/status_leds.py` — raw SPI driver, brightness cap, encoding
- `src/status_led_controller.py` — state machine + animation thread + persistence
- `minus.py` — instantiates `self.status_leds`, wires `_set_led_state` helper, hooks `_on_hdmi_lost`/`_on_hdmi_restored`
- `src/ad_blocker.py` — calls `_set_led_state` from `show()`/`hide()`/`start()`/`start_no_signal_mode()`
- `src/webui.py` — `/api/leds/*` endpoints
- `src/templates/index.html` — toggle + state palette in Autonomous Mode section
- `test_status_leds.py` — hardware walk/flash test
- `tests/test_status_led_controller.py` — 26 unit tests (mocked hardware)
- `tests/test_status_leds_ui.py` — Playwright UI tests (live service)
- `docs/STATUS_LEDS.md` — full docs
Future work: per-LED subsystem indicators (OCR / VLM / audio / HDMI / wifi / autonomous each get one LED), one-shot detection-event flashes, automatic autonomous state on autonomous-mode entry/exit.
Minus supports multiple streaming device types with device-specific remote control:
Supported Devices:
| Device | Protocol | Status |
|---|---|---|
| Fire TV | ADB over WiFi | Full support |
| Roku | ECP (External Control Protocol) | Full support |
| Google TV / Android TV | ADB over WiFi | Full support |
| Apple TV | MRP/AirPlay | Coming soon |
| Generic | None | Ad blocking only |
Web UI Setup: The Remote tab provides a device selector where users can:
- Select their streaming device type
- Follow device-specific setup instructions
- Scan for devices on the network (Fire TV, Roku, Google TV)
- Manually enter device IP address
- Connect and control their device
Device Configuration Persistence:
- Configuration stored in `~/.minus_device_config.json`
- Persists device type, IP address, and setup state
- Survives service restarts
- Survives service restarts
API Endpoints:
- `GET /api/device/config` - Get current configuration
- `GET /api/device/types` - List available device types
- `POST /api/device/select` - Select a device type
- `POST /api/device/ip` - Set device IP address
- `POST /api/device/setup-complete` - Mark setup complete
- `POST /api/device/reset` - Reset configuration
Roku API Endpoints:
- `GET /api/roku/status` - Connection status and device info
- `GET /api/roku/discover` - Scan network via SSDP multicast
- `POST /api/roku/connect` - Connect to Roku by IP
- `POST /api/roku/command` - Send remote command
- `POST /api/roku/launch/<app>` - Launch app (youtube, netflix, etc.)
Roku Features:
- Discovery via SSDP multicast
- ECP commands over HTTP to port 8060
- Control mode detection (Limited vs Full)
- Supports all navigation, media, and volume controls
- App launching: YouTube, Netflix, Prime, Disney+, Hulu, Plex, HBO, Peacock
Fire TV API Endpoints:
- `GET /api/firetv/status` - Connection status
- `GET /api/firetv/scan` - Scan network for Fire TV devices
- `POST /api/firetv/connect` - Connect to Fire TV by IP
- `POST /api/firetv/command` - Send remote command
Google TV / Android TV API Endpoints:
- `GET /api/googletv/status` - Connection status
- `GET /api/googletv/scan` - Scan network for devices (port 5555)
- `POST /api/googletv/connect` - Connect by IP:PORT (Wireless debugging uses a dynamic port)
- `POST /api/googletv/command` - Send remote command (includes `assistant` for Google Assistant)
Google TV Setup Notes:
- Uses "Wireless debugging" (not USB debugging) for network ADB
- Settings > System > Developer options > Wireless debugging
- Shows IP:PORT on TV screen when enabled (e.g., 192.168.1.100:37421)
- Enter the full IP:PORT in web UI Remote tab
- First connection requires approving the ADB dialog on TV
Minus can control Fire TV devices over WiFi via ADB for ad skipping and playback control.
Auto-setup: Fire TV is automatically discovered and connected 5 seconds after Minus starts. First-time connection requires approving the ADB authorization dialog on the TV screen (OCR detects when it appears). ADB keys are saved for future connections.
Features:
- Auto-discovery of Fire TV devices on local network
- Verification that discovered device is actually a Fire TV
- ADB key generation and persistent storage for pairing
- Auto-reconnect on connection drops
- Full remote control: play, pause, select, back, d-pad, etc.
- Async-compatible interface
Requirements:
- Fire TV must have ADB debugging enabled
- First connection requires approving RSA key on TV screen
- Both devices must be on the same WiFi network
Enabling ADB on Fire TV: Settings > My Fire TV > Developer Options > ADB Debugging ON (enable Dev Options first via About > click device name 7x)
Testing: python3 test_fire_tv.py [--setup|--interactive|--scan|IP]
Commands: Navigation (up/down/left/right/select/back/home), Media (play/pause), Volume, Power
Usage: quick_connect() → skip_ad() / go_back() → disconnect()
Setup States: idle → scanning → waiting_adb_enable → waiting_auth → connected
Minus can control Google TV and Android TV devices over WiFi via ADB's Wireless debugging feature.
Setup Flow:
- Select "Google TV / Android TV" in the Remote tab
- On-screen overlay guides you through enabling Wireless debugging
- Enter the IP:PORT shown on your TV's Wireless debugging screen
- Approve the connection dialog on your TV
Key Differences from Fire TV:
- Uses "Wireless debugging" instead of "ADB debugging" (USB debugging)
- Dynamic port (not fixed 5555) - must enter IP:PORT format
- Found in Developer options after enabling developer mode
Enabling Wireless Debugging:
- Settings > System > About > click Build number 7 times
- Go back to System > Developer options
- Turn ON "Wireless debugging"
- Note the IP address and port shown on screen
Commands: Same as Fire TV plus assistant for Google Assistant button
Setup States: Same as Fire TV: idle → scanning → waiting_adb_enable → waiting_auth → connected
Color correction is done via GStreamer's videobalance element in the pipeline.
Why not ustreamer/V4L2?
The HDMI-RX device doesn't support V4L2 image controls (saturation, contrast, brightness).
Only read-only controls are available: audio_sampling_rate, audio_present, power_present.
Default settings (in src/ad_blocker.py):
videobalance saturation=1.25 brightness=0.0 contrast=1.0 hue=0.0
Web UI Controls: Color settings can be adjusted in real-time via the Settings tab in the web UI:
- Saturation: 0.5-1.5 slider (default 1.25, higher = more vivid)
- Brightness: -0.5 to 0.5 slider (default 0.0)
- Contrast: 0.5-1.5 slider (default 1.0)
- Hue: -0.5 to 0.5 slider (default 0.0)
API Endpoints:
# Get current color settings
curl http://localhost/api/video/color
# Set color settings (any combination)
curl -X POST -H "Content-Type: application/json" \
-d '{"saturation": 1.3, "brightness": 0.1}' \
  http://localhost/api/video/color

GStreamer ranges (for advanced use):
- `saturation`: 0.0-2.0 (default 1.0)
- `contrast`: 0.0-2.0 (default 1.0)
- `brightness`: -1.0 to 1.0 (default 0.0)
- `hue`: -1.0 to 1.0 (default 0.0)
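Since the web UI exposes narrower ranges than videobalance accepts, the POST handler has to validate incoming values. A hypothetical validation helper (name and structure illustrative, not the project's actual code) might clamp to the full videobalance property ranges:

```python
# Full ranges of GStreamer videobalance properties.
VIDEOBALANCE_RANGES = {
    "saturation": (0.0, 2.0),
    "contrast": (0.0, 2.0),
    "brightness": (-1.0, 1.0),
    "hue": (-1.0, 1.0),
}

def clamp_color_settings(settings: dict) -> dict:
    """Keep only known properties, clamped to valid videobalance ranges."""
    out = {}
    for name, value in settings.items():
        if name not in VIDEOBALANCE_RANGES:
            continue  # silently drop unknown keys
        lo, hi = VIDEOBALANCE_RANGES[name]
        out[name] = min(max(float(value), lo), hi)
    return out
```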
# Install
sudo ./install.sh
# View logs
journalctl -u minus -f
# Stop
sudo systemctl stop minus
./stop.sh # Alternative with optional X11 restart
# Uninstall
sudo ./uninstall.sh

The service:
- Starts on boot (`multi-user.target`)
- Conflicts with display managers (gdm, lightdm, sddm)
- Restarts on crash (5 attempts per 5 minutes)
- Runs as root for DRM/device access
Finding the Root Cause is ESSENTIAL:
- Do NOT implement band-aid fixes that mask symptoms without understanding the cause
- Investigate WHY something is failing, not just WHAT is failing
- Example: If audio restarts constantly, don't just limit restart attempts - find out WHY it's restarting
- Use logs, the /proc filesystem, API responses, and system state to trace the actual problem
- A fix that doesn't address the root cause will likely cause other issues or recur
Test Fixes BEFORE Pushing:
- After implementing a fix, TEST it immediately by observing actual behavior
- Focus testing specifically on the ORIGINAL PROBLEM - verify the symptom is gone
- Do NOT push fixes without verification - iterate until the fix demonstrably works
- Run prolonged tests (30-60 seconds minimum) to catch intermittent issues
- Watch for the specific symptom that was reported (e.g., "frame jumps every 2-3 seconds")
Testing Methodology:
- Understand the symptom clearly (what exactly is failing and how often)
- Identify potential causes through log analysis and code review
- Implement a fix targeting the root cause
- Restart the service and observe behavior
- Check logs for the specific error patterns that were occurring
- Run a prolonged test (60 seconds) watching for the original symptom
- Only commit/push after confirming the symptom is resolved
Verification Techniques:
- Check logs: `sudo journalctl -u minus --since "60 seconds ago" | grep -E "error|restart|fail"`
- Check FPS: `curl -s http://localhost/api/status | jq .fps`
- Check ALSA status: `cat /proc/asound/card*/pcm*/sub*/status`
- Check pipeline state: API responses, GStreamer state queries
- Record video samples for visual issues:
ffmpeg -i http://localhost:9090/stream -t 10 test.mp4
Common Pitfalls to Avoid:
- Limiting retry attempts instead of fixing why retries are needed
- Assuming a fix works without observing the system under the original conditions
- Pushing multiple untested changes at once (makes debugging harder)
- Not checking if the "fix" introduced new problems
Git commits:
- Do NOT add "Co-Authored-By" lines to commits
- Do NOT add "Generated with Claude Code" lines to commits
- Keep commit messages clean and professional - just the message, no AI attribution
- Do NOT create v2, v3, v4 files - update existing files directly

Key implementation details:
- VLM uses Python axengine for inference (not pexpect/C++ binary)
- Both NPUs run in parallel without resource contention
- No X11 required - pure DRM/KMS display
- Color correction via GStreamer videobalance (not V4L2 controls)
- Health monitor runs every 5 seconds in background thread
- VLM frame files use PID-based naming to avoid permission conflicts
- Snapshots scaled to 960x540 before OCR (model uses 960x960 anyway, smaller = faster)
- ustreamer quality set to 80% for balance of quality and CPU load
- FPS tracked via GStreamer identity element with pad probe
- Startup cleanup removes stale frame files and kills orphaned processes
- Background upload is async to prevent blocking main thread
- Animation times optimized: 0.3s start, 0.25s end for fast response
- DYNAMIC_COOLDOWN reduced to 0.5s for faster ad detection
pip3 install pyinstaller
pyinstaller minus.spec
# Output: dist/minus

Note: Models are external and must be present at runtime.
The project includes a comprehensive test suite for all extracted modules.
Running Tests:
python3 tests/test_modules.py # 300+ unit tests
python3 tests/test_autonomous_mode.py # Autonomous mode tests
python3 tests/test_recent_features.py # Recent feature tests
python3 tests/test_block_decision_engine.py # Blocking state-machine regressions
python3 tests/test_review_ui.py # Playwright UI tests (requires chromium)
python3 tests/test_ir_transmitter.py # IR transmitter unit tests (mocked sysfs)
python3 tests/test_ir_ui.py # Playwright UI tests for IR remote panel
python3 tests/test_status_led_controller.py # Status LED state-machine tests (mocked hardware)
python3 tests/test_status_leds_ui.py       # Playwright UI tests for status-LED panel

Block-latency test harness (tests/block_latency_harness.py):
Headless rig for tuning the blocking decision engine. Plays Big Buck Bunny
in a Python loop, lets the test orchestrator inject "AD"-style overlay
text on/off at controlled timestamps, and measures detect / recover
latency end-to-end through the production OCR + VLM workers + a faithful
mirror of minus.py's blocking decision logic. No HDMI, no ustreamer,
no DRM, no audio.
# Place a video file at /home/radxa/test_assets/bbb.mp4 first.
python3 tests/block_latency_harness.py round1 # 9 detect/recover combos
python3 tests/block_latency_harness.py round4 # realistic ad-break shapes
python3 tests/block_latency_harness.py round5 # VLM state machine (injected verdicts)
python3 tests/block_latency_harness.py round6 # user-bug pause-on-ad regression
python3 tests/block_latency_harness.py round7   # production-shaped, OCR + VLM corroborated

use_real_vlm=False mode uses injected VLM verdicts so the engine's
sliding-window state machine can be driven deterministically without the
~30s real-VLM model load. Override PARAMS from a small wrapper script
to A/B-test tuning candidates; the in-rig defaults mirror the locked-in
production values.
Test Coverage:
| Module | Test Class | Tests |
|---|---|---|
| src/vocabulary.py | TestVocabulary | Format validation, content checks, common words |
| src/config.py | TestConfig | Dataclass defaults, custom values |
| src/skip_detection.py | TestSkipDetection | Pattern matching, countdown parsing, edge cases |
| src/screenshots.py | TestScreenshots | Deduplication, file saving, truncation |
| src/console.py | TestConsole | Console blanking/restore commands |
| src/capture.py | TestCapture | Snapshot capture, cleanup |
| src/drm.py | TestDRM | DRM probing, fallback values |
| src/v4l2.py | TestV4L2 | V4L2 format detection, error handling |
| src/overlay.py | TestOverlay | NotificationOverlay, positions, show/hide |
| src/health.py | TestHealth | HealthMonitor, HealthStatus, HDMI detection |
| src/fire_tv.py | TestFireTV | Controller, key codes, device detection |
| src/vlm.py | TestVLM | VLMManager, response parsing, 4-tuple returns |
| src/ocr.py | TestOCR | Keywords, exclusions, terminal detection |
| src/webui.py | TestWebUI, TestWebUIExtended | Flask routes, all API endpoints |
| src/ad_blocker.py | TestAdBlocker, TestAdBlockerExtended | Blocking modes, color controls, animations |
| src/audio.py | TestAudio, TestAudioExtended | A/V sync, pipeline controls, mute/unmute |
| src/fire_tv.py | TestFireTV, TestFireTVExtended | Connection, commands, device discovery |
| src/vlm.py | TestVLM, TestVLMExtended | Response parsing, confidence detection |
| src/ocr.py | TestOCR, TestOCRExtended | Keywords, exclusions, terminal detection |
| src/skip_detection.py | TestSkipDetection, TestSkipDetectionExtended | Pattern matching, countdown parsing |
| src/screenshots.py | TestScreenshots, TestScreenshotsExtended | Deduplication, categories, truncation |
| src/config.py | TestConfig, TestConfigValidation | Defaults, custom values |
| src/health.py | TestHealth, TestHealthExtended | Monitoring, callbacks, status |
| src/overlay.py | TestOverlay, TestOverlayExtended | Positions, show/hide, text formatting |
| src/drm.py | TestDRM, TestDRMExtended | DRM probing, fallback values |
| src/v4l2.py | TestV4L2, TestV4L2Extended | Format detection, error handling |
| src/console.py | TestConsole, TestConsoleExtended | Console blanking/restore |
| src/capture.py | TestCapture, TestCaptureExtended | Snapshot capture, cleanup |
| Integration | TestIntegration | Cross-module tests |
| Memory | TestMemoryLeaks | Resource cleanup, executor reuse |
| Blocking | TestBlockingModeIntegration | State transitions, API format |
| Error Handling | TestErrorHandling | Missing subsystems, graceful failures |
| Concurrency | TestConcurrency | Thread safety, locks |
| Vocabulary | TestVocabulary, TestVocabularyContent | Format, content, duplicates |
| API Responses | TestAPIResponseFormats | Consistent response structure |
| src/vlm.py | TestVLMQueryImage | Custom prompt queries, error paths |
| src/ocr.py | TestOCRResilience | NPU failure handling, graceful degradation |
| src/screenshots.py | TestScreenshotDedup | dHash, blank rejection, rate limiting, per-category |
| Memory | TestMemoryManagement | Hash buffer caps, resource cleanup |
| HDCP | TestHDCPHandling | Encrypted frame handling, blank frame rejection |
| src/autonomous_mode.py | TestAutonomousMode | Schedule, VLM actions, state, persistence (separate file) |
| Review UI | TestReviewModal* | Playwright: desktop/mobile swipe, modal, API (separate file) |
Test Design:
- Tests are self-contained with temporary directories
- Mock subprocess calls to avoid system dependencies
- Fallback to manual test runner if pytest not installed
- All 300+ tests should pass on a clean system
- Playwright tests require chromium: `python3 -m playwright install chromium`
The codebase has been refactored from monolithic files into smaller, focused modules:
Extracted from minus.py:
- `src/console.py` - Console blanking functions (`blank_console`, `restore_console`)
- `src/drm.py` - DRM probing (`probe_drm_output`)
- `src/v4l2.py` - V4L2 probing (`probe_v4l2_device`)
- `src/config.py` - Configuration dataclass (`MinusConfig`)
- `src/capture.py` - Snapshot capture (`UstreamerCapture`)
- `src/screenshots.py` - Screenshot management (`ScreenshotManager`)
- `src/skip_detection.py` - Skip button detection (`check_skip_opportunity`)
Extracted from ad_blocker.py:
- `src/vocabulary.py` - Spanish vocabulary list (`SPANISH_VOCABULARY`)
Benefits:
- Easier to test individual components
- Better code organization and discoverability
- Reduced file sizes (minus.py ~1700 lines, ad_blocker.py ~950 lines)
- Clear separation of concerns
Previous problem: Adding a textoverlay element to the GStreamer video path caused pipeline stalls every ~12 seconds due to NV12 format incompatibility and 4K→1080p resolution mismatch.
Solution implemented: Text overlay is now rendered directly in ustreamer's MPP encoder via the blocking mode API. This:
- Composites directly on NV12 frames in the encoder
- Has minimal CPU impact (~0.5ms per frame)
- Works at any resolution without GStreamer pipeline changes
- Supports pixelated background, live preview window, and text overlays
- Uses FreeType for proper TrueType font rendering
Issue: Long-running sessions (several hours) could accumulate memory due to RKNN inference output buffers not being explicitly released.
Solution implemented:
- RKNN inference outputs are now explicitly copied and dereferenced in src/ocr.py
- Periodic `gc.collect()` runs every 100 OCR frames and every 50 VLM frames
- Health monitor triggers emergency cleanup at 90% memory usage
- Frame buffers (`prev_frame`, `vlm_prev_frame`) are cleared during memory critical events
ThreadPoolExecutor fix (Jan 2026):
- CRITICAL: The OCR worker was creating a new `ThreadPoolExecutor` on every iteration, causing massive file descriptor and memory leaks (~12GB after 12 hours)
- Fixed by creating a single `ocr_executor` before the loop and reusing it
- Symptom: "Too many open files" errors, display goes blank, memory exhaustion
Memory monitoring:
- Health monitor checks memory every 5 seconds
- Warning logged at 80% usage
- Critical cleanup triggered at 90% usage
Status: Fire TV auto-setup is ENABLED with notification overlays working via ustreamer API.
Startup timing:
- Fire TV setup starts 5 seconds after service start (runs in parallel with VLM loading)
- Total time from start to connection: ~13 seconds (5s delay + ~8s scan/connect)
Bug fixed: Auth retry interval was 3 seconds, causing multiple auth dialogs on the TV before user could respond. Fixed to 35 seconds (longer than AUTH_TIMEOUT of 30s) in fire_tv_setup.py.
Symptom: Frame jumps every 2-3 seconds due to constant GStreamer audio pipeline restarts.
Root Cause: When HDMI signal was restored, resume_watchdog() tried to create a new audio pipeline without:
- Checking if the existing pipeline was already working
- Cleaning up the old pipeline first
This caused the new pipeline to fail with "device in use" because the old pipeline still held the ALSA device. The watchdog then repeatedly tried to restart every 3 seconds.
Why band-aid fixes don't work: Initially tried limiting restart attempts, but this just disabled audio after 5 restarts instead of fixing the underlying issue. The correct approach was to find WHY restarts were happening.
Solution implemented:
- Added `_is_alsa_device_running()` helper that checks `/proc/asound/cardX/pcmYp/sub0/status` to verify the ALSA device is actually running with our PID
- This is more reliable than GStreamer state queries when PipeWire/WirePlumber is involved
- Modified watchdog loop to skip restarts when ALSA confirms audio is flowing
- Modified `resume_watchdog()` to check if the pipeline is already PLAYING before restarting
- Added proper cleanup of the old pipeline before creating a new one
Key insight: The /proc/asound status showed the device was RUNNING with minus as owner, proving audio WAS working. GStreamer state queries were unreliable due to PipeWire interference, but the kernel-level ALSA status was authoritative.
Symptom: After a brief HDMI signal loss (even 8 seconds), the video pipeline stalls every ~12 seconds with mpp_buffer: check buffer found NULL pointer from mpp_dec_advanced_thread. Restarting the GStreamer pipeline alone doesn't help - MPP stays stuck.
Root Cause: The RK3588 MPP JPEG decoder holds resources that don't get properly freed when the GStreamer pipeline is destroyed. After the HDMI source briefly drops and recovers, the decoder enters a corrupt state that persists across pipeline restarts.
Solution implemented:
- After 3+ consecutive pipeline failures, the system now kills ustreamer (`pkill -9 ustreamer`) to force-release MPP resources
- The health monitor detects ustreamer is down and restarts it plus the video pipeline with clean MPP state
- This auto-recovers from stuck MPP decoder without manual service restart
Symptom: No audio after TV wakes up from standby. Audio pipeline starts on wrong HDMI output (e.g., hw:0,0 instead of hw:1,0).
Root Cause: When the display retry loop detected a DRM output change (TV connected to a different HDMI port than at boot), it updated the config but not the audio object's playback device, so audio started on the old device.
Solution implemented:
- Display retry loop now checks if `drm_info['audio_device'] != self.audio.playback_device`
- If changed, stops the audio pipeline and updates the playback device before restarting
- Ensures audio always matches the active HDMI output
Symptom: Netflix ads showing "Ad 10", "Ad 5" (countdown timer format) were not detected by OCR.
Root Cause: Existing OCR patterns only matched "Ad X of Y" format. Netflix uses standalone "Ad NN" where NN is seconds remaining.
Solution: Added regex pattern `^ad\s*\d+$` to match the countdown format.
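The pattern can be exercised in isolation (the helper name is illustrative; the real check lives in the OCR keyword logic):

```python
import re

# Standalone "Ad NN" countdown (Netflix-style), matched against a whole
# OCR text element. Deliberately anchored so "Ad 2 of 3" stays with the
# existing "Ad X of Y" pattern.
AD_COUNTDOWN = re.compile(r"^ad\s*\d+$", re.IGNORECASE)

def is_ad_countdown(text: str) -> bool:
    return bool(AD_COUNTDOWN.match(text.strip()))
```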
Symptom: After successfully skipping an ad, the blocking overlay stayed for 2-3+ seconds waiting for OCR to detect the ad was gone.
Solution: After a successful skip command (auto or manual via web UI), blocking is now removed after a 1.5s delay instead of waiting for 3 OCR cycles. The delay allows the skip animation to complete, then force-unblocks by resetting all detection state. Skip command is device-agnostic — routes to Fire TV (skip_ad()), Roku (send_command('select')), or Google TV based on the configured device type.
Symptom: After running for 12+ hours with no HDMI signal, the web server becomes unresponsive. Logs show [Errno 24] Too many open files errors. The service cannot open new files or sockets.
Root Cause: When the no-signal or loading GStreamer pipelines failed to start, the cleanup code did not remove the bus signal watch before destroying the pipeline. Each failed attempt leaked a file descriptor from bus.add_signal_watch(). With retries every 10 seconds, the 1024 FD limit was reached in ~3 hours.
Solution: Added proper bus cleanup in all pipeline failure paths:
# Before destroying failed pipeline:
if self.bus:
    self.bus.remove_signal_watch()
    self.bus = None

Fixed in src/ad_blocker.py: `start_no_signal_mode()` and `start_loading_mode()` failure paths and exception handlers.
Symptom: After TV/display sleeps for several hours and wakes up, there is no audio output even though the health monitor reports audio=OK and the ALSA device shows state: RUNNING.
Root Cause: The GStreamer audio pipeline runs in a separate thread. When the display sleeps, this thread can crash or die (e.g., due to ALSA device disconnection), but:
- The Python `AudioPassthrough` object retains a stale reference to the dead pipeline
- The ALSA device shows `owner_pid` pointing to the dead thread's PID
- The health check only queries the Python GStreamer state, not the actual ALSA device ownership
- Result: health reports `audio=OK` while no actual audio is flowing
Detection: Check if the ALSA playback device's owner_pid corresponds to a live process:
# Get owner PID
cat /proc/asound/card1/pcm0p/sub0/status | grep owner_pid
# owner_pid : 179247
# Check if process exists
ps -p 179247
# Returns empty = zombie audio state!

Solution: Enhanced `_check_audio_pipeline()` in src/health.py to:
- Read the ALSA device status from `/proc/asound/cardX/pcm0p/sub0/status`
- Verify the `owner_pid` corresponds to a live process (check that `/proc/{pid}` exists)
- If the owner is dead but the device shows RUNNING, trigger a full `_restart_pipeline()` (not just a queue flush)
- 10-second cooldown after any restart before zombie detection runs again (prevents restart loops)
- Skip zombie detection if a restart is already in progress
- This runs every health check cycle (5 seconds), so recovery happens automatically
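A minimal sketch of that detection, assuming the usual `state:` / `owner_pid :` layout of the /proc/asound status file (the helper name is hypothetical; the real implementation lives in src/health.py):

```python
import re
from pathlib import Path

def audio_owner_is_dead(status_path: str) -> bool:
    """Zombie check: substream reports RUNNING but the recorded
    owner_pid no longer exists under /proc."""
    try:
        status = Path(status_path).read_text()
    except OSError:
        return False  # device closed: nothing to own, not a zombie
    if "state: RUNNING" not in status:
        return False  # not running, so a missing owner is expected
    m = re.search(r"owner_pid\s*:\s*(\d+)", status)
    return bool(m) and not Path(f"/proc/{m.group(1)}").exists()
```

Note this naive version checks only `/proc/{pid}`; as described further below, `owner_pid` can actually be a thread ID, which needs an extra lookup.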
Files modified:
- src/health.py - Added `_check_alsa_zombie_state()` method with full restart and cooldown logic
Symptom: Ad blocking would flicker on/off during ads because OCR sometimes reads "Ad 0:42" (with space) and sometimes "Ad0:42" (no space) or "Ado:55" (OCR misreads '0' as 'o').
Root Cause: The OCR pattern used word boundaries (\bad\b) which required a space between "Ad" and the timestamp. When OCR dropped the space, the pattern didn't match, counting as "no ad". After 3 "no ads", blocking ended, then immediately re-triggered when a frame with space was detected.
Solution: Updated src/ocr.py to match OCR variants:
- `ad[0o]:` pattern catches "Ad0:" and "Ado:" (no space, or 'o' misread)
- `[0-9o]:\d{2}` timestamp pattern handles 'o' misread as '0'
- Both per-element and cross-element checks updated
Test cases now matched:
- `Ad 0:42` - standard format ✓
- `Ad0:42` - no space ✓
- `Ado:55` - OCR misread '0' as 'o' ✓
- `0:30 | Ad` - Hulu style ✓
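The misread-tolerant matching can be sketched with a single regex (illustrative only; the production code splits this across per-element patterns, and the Hulu-style `0:30 | Ad` case goes through a separate cross-element check):

```python
import re

# Tolerate a dropped space between "Ad" and the timestamp, and a '0'
# misread as 'o' in the minutes digit.
AD_TIMESTAMP = re.compile(r"\bad\s*[0-9o]:\d{2}\b", re.IGNORECASE)

def has_ad_timestamp(text: str) -> bool:
    return bool(AD_TIMESTAMP.search(text))
```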
Symptom: After TV restart/power cycle, the GStreamer pipeline reports "No-signal display started successfully" but the TV shows its own "HDMI 1 No Signal" message (meaning no video signal from RK3588).
Root Cause: When the TV restarts, the HDMI hotplug event is detected and the sysfs status changes from "disconnected" to "connected", but the HDMI PHY (physical layer) doesn't properly reinitialize. The DRM connector shows as connected, but no actual video signal is being transmitted.
Discovery: Physically unplugging and replugging the HDMI cable made the display work, indicating the HDMI PHY needed reinitialization that wasn't happening on TV restart.
Solution: Force HDMI PHY reinitialization via DPMS (Display Power Management Signaling) cycle:
- When TV reconnects, health monitor detects status change and waits 2s for link stabilization
- DPMS Off (value 3) sent via `modetest -M rockchip -w {connector}:DPMS:3`
- Wait 300ms
- DPMS On (value 0) sent via `modetest -M rockchip -w {connector}:DPMS:0`
- This forces the HDMI transmitter to reinitialize, equivalent to a cable replug
Implementation:
- src/health.py - Health monitor detects TV reconnection and calls `ad_blocker.restart(hdmi_reconnect=True)`
- src/ad_blocker.py - `_restart_pipeline(hdmi_reconnect=True)` does:
  - Stop existing pipeline
  - DPMS cycle via `_force_hdmi_reinit()`
  - Re-probe DRM to detect connector/plane changes
  - Restart audio pipeline (required after TV power cycle)
  - Start new video pipeline
- For no-signal mode, the DPMS cycle is done in `start_no_signal_mode()` directly
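The cycle reduces to two modetest invocations with a 300ms gap. A sketch, assuming modetest (from the libdrm test tools) is on the PATH and the connector ID came from DRM probing (function names are illustrative):

```python
import subprocess
import time

def dpms_commands(connector_id: int):
    """Build the two modetest invocations used for the DPMS cycle."""
    base = ["modetest", "-M", "rockchip", "-w"]
    return (base + [f"{connector_id}:DPMS:3"],   # DPMS Off
            base + [f"{connector_id}:DPMS:0"])   # DPMS On

def dpms_cycle(connector_id: int) -> None:
    off_cmd, on_cmd = dpms_commands(connector_id)
    subprocess.run(off_cmd, check=False)
    time.sleep(0.3)  # 300ms between off and on, per the fix above
    subprocess.run(on_cmd, check=False)
```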
Key heuristics for detecting working vs broken state:
| Heuristic | Working | Broken (needs DPMS) |
|---|---|---|
| sysfs status | connected | connected |
| sysfs dpms | On | On |
| Video output | Visible | TV shows "No Signal" |
Note: All sysfs values look identical in both states - the only difference is whether video is actually being transmitted. The DPMS cycle is applied preemptively on every TV reconnection.
Symptom: Audio cuts out every ~15 seconds with logs showing "Audio zombie state detected - GStreamer playing but ALSA owner dead" followed by constant pipeline restarts.
Root Cause: The ALSA owner_pid in /proc/asound/cardX/pcm0p/sub0/status is actually a thread ID (TID), not a process ID (PID). The zombie detection code was checking /proc/{owner_pid} which doesn't exist for threads - threads are listed under /proc/{main_pid}/task/{tid} instead.
Solution: Updated _check_alsa_zombie_state() in src/health.py to check both locations:
- First check `/proc/{owner_pid}` (works if it's a PID)
- If not found, check `/proc/{main_pid}/task/{owner_pid}` (works if it's a TID)
This prevents false zombie detection when the audio thread is actually alive and healthy.
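The two-location check is short once both paths are known (helper name illustrative):

```python
import os

def owner_alive(owner_pid: int, main_pid: int) -> bool:
    """ALSA's owner_pid may be a thread ID rather than a process ID:
    check /proc/{pid} first, then /proc/{main_pid}/task/{tid}."""
    return (os.path.exists(f"/proc/{owner_pid}")
            or os.path.exists(f"/proc/{main_pid}/task/{owner_pid}"))
```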
Symptom: Screen stuck on "Initializing..." for 20+ minutes. GStreamer pipeline in restart loop (37+ attempts). ustreamer is capturing video correctly but display pipeline fails.
Root Cause: The Fire TV notification overlay shows "Ad skipping enabled." which contains the word "ad". When OCR read this overlay text, it triggered false positive ad detection. This activated the blocking mode, which caused MPP pipeline errors (mpp_buffer: check buffer found NULL pointer).
Why overlay is visible to OCR: The notification overlay is composited at the ustreamer encoder level BEFORE the snapshot, so /snapshot/raw includes overlay text. This is by design for the preview window in blocking mode, but it means OCR sees everything on screen including our overlays.
Solution: Added our overlay messages to the OCR exclusion lists:
- `'ad skipping enabled'`, `'ad skipping'`, `'adskipping'` added to `AD_EXCLUSIONS` in both src/ocr.py and src/ocr_worker.py
Files modified:
- src/ocr.py - Added Minus overlay exclusions
- src/ocr_worker.py - Added Minus overlay exclusions
Symptom: Running autonomous mode with HDMI-TX disconnected, music videos with static album art were being paused by autonomous mode every 20 seconds, interrupting legitimate playback.
Root Cause: With display disconnected, the audio pipeline's alsasink can't open HDMI-TX, so the pipeline never receives buffers (last_buffer_age == -1). _is_audio_flowing() returned False. On music videos with static art (hamming≈0), the pause detector concluded "static frames + no audio = PAUSED" and sent play_pause, actually pausing content that was playing.
Solution: Added _is_audio_pipeline_available() in src/autonomous_mode.py. When the audio pipeline has never received a buffer or its state is stopped, treat audio as "unknown" rather than "not flowing". _is_screen_static() returns False in that case so autonomous mode does not assume paused. VLM's direct PAUSED verdict still triggers play.
Symptom: During real ads on YouTube, autonomous mode would fire down + select commands thinking it was on the home screen, navigating through the ad UI and occasionally switching to a different video.
Root Cause: HOME_SCREEN_KEYWORDS in src/autonomous_mode.py contained 'sponsored' and 'views'. "Sponsored · Visit advertiser" on YouTube pre-roll ads and "347M views" in any video's info panel both matched, triggering the home-screen action path.
Solution:
- Removed `'sponsored'` and `'views'` from `HOME_SCREEN_KEYWORDS`.
- Added `AD_ONLY_KEYWORDS` (`'visit advertiser'`, `'send to phone'`, `'skip in'`, `'skip ad'`) — if any are present, skip home-screen detection.
- Added an `ad_blocker.is_visible` guard — if blocking is active, never classify as home screen.
- In minus.py, added a secondary audio-aware guard: if the OCR match is only `'sponsored'` and HDMI-IN `audio_present=0`, suppress the block. Real video ads transmit audio; home-screen sponsored tiles usually don't. Uses a new `_hdmi_audio_present()` helper reading v4l2-ctl directly so it works even when our playback pipeline is down.
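A sketch of such an audio-presence helper, assuming v4l2-ctl prints the control in its usual `audio_present: N` form (function names are illustrative, not the exact ones in minus.py):

```python
import subprocess

def parse_audio_present(output: str) -> bool:
    """Parse `v4l2-ctl --get-ctrl audio_present` output."""
    for line in output.splitlines():
        if "audio_present" in line:
            return line.split(":")[-1].strip() == "1"
    return False

def hdmi_audio_present(device: str = "/dev/video0") -> bool:
    """Query the HDMI-RX audio_present control directly, independent of
    the playback pipeline's state."""
    try:
        proc = subprocess.run(
            ["v4l2-ctl", "-d", device, "--get-ctrl", "audio_present"],
            capture_output=True, text=True, timeout=2,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return parse_audio_present(proc.stdout)
```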
Symptom: False ad blocks triggered when OCR read "LOADING" or "reading" on screen.
Root Cause: AD_KEYWORDS_EXACT in src/ocr_worker.py contained 'ad in'. The alphanumeric-normalized form is 'adin' (4 chars), which appears as a substring in 'loading' (loading), 'reading' (reading), and similar words.
Solution: Removed 'ad in' from exact keywords. The specific patterns for "Ad N of M", "Ad N" countdown, and "ad with timestamp" (in both ocr.py and ocr_worker.py) already cover legitimate ad timestamps.
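Why that keyword was dangerous is easy to demonstrate with the normalization described above (the `normalize` helper here is a stand-in for the worker's actual normalization step):

```python
import re

def normalize(text: str) -> str:
    """Alphanumeric-only lowercasing, as used for exact-keyword matching."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

# 'ad in' normalizes to 'adin', which is a substring of ordinary words,
# so substring matching fires on them:
assert normalize("ad in") in normalize("LOADING")   # lo[adin]g
assert normalize("ad in") in normalize("reading")   # re[adin]g
```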
Symptom: After several hours of runtime, VLM inference degrades from ~0.7s to ~15–18s per query and returns descriptive responses to short-answer prompts. Each DISCARDED (>2s) entry makes the system effectively OCR-only. Not thermal — temperatures stayed around 70°C both when healthy and when slow.
Original solution (kept as defense-in-depth): Rolling latency window + auto-recovery in src/vlm_worker.py:
- `_record_latency()` / `_maybe_auto_recover()` called after each successful inference.
- If P95 over the last 10 queries exceeds 3.0s, trigger a worker restart.
- If a prior recovery happened within the last 3 minutes and we're degraded again, escalate to a deep restart with 8s NPU-release backoff.
- 60s cooldown prevents thrash.
- `get_latency_stats()` exposes samples/P50/P95/max via /api/health under `subsystems.vlm.latency`.
Axera telemetry (axcl-smi info --temp / --npu / --cmm) is wired into /api/health at subsystems.vlm.axera and exposed as Prometheus gauges minus_axera_* for alerting on temperature or memory pressure.
A deeper investigation (documented in docs/VLM_NPU_DEGRADATION.md) confirmed:
- Latency is deterministically image-dependent, not a state that drifts in over time.
- Per-token decode rate is constant (~0.23 s/tok); the slow inferences are slow because the model generates 30–60 tokens of descriptive response instead of 1–3 tokens of `Yes.`/`No.`.
- The NPU, axcl driver, and Axera firmware are all healthy throughout.
- `axcl-smi reboot` and `rmmod`+`modprobe` of the host modules do not change behavior on the same image.
Real fix: Cap max_new_tokens at the model layer (5 for detect_ad, 8 for query_image). With the cap, worst-case latency drops from ~12 s to ~1.3 s and the entire restart-cycle pathology goes away. The auto-recovery logic above stays in as defense-in-depth for any genuine NPU pathology, but in normal operation it should never fire.
Symptom: Intermittent `too many values to unpack (expected 2)` from `[AutonomousMode] VLM screen query failed`, plus a sustained worker restart cycle (~15–40 hard kills per 15 min) that the soft/hard timeout logic could not damp on its own.
Root Cause: `VLMProcess.detect_ad` (called from the detection-loop thread) and `VLMProcess.query_image` (called from the autonomous-mode thread) shared the same request/response `multiprocessing.Queue` with no request-to-response correlation and no lock around the queue or the shared state (`_consecutive_timeouts`, `_pending_response`, `_recent_latencies`). When both threads called concurrently:
- A `detect_ad` 4-tuple response could be `get()`-ed by the `query_image` caller (which expected a 2-tuple) — and vice versa — producing the unpack error.
- Concurrent mutation of `_consecutive_timeouts` and `_pending_response` produced spurious threshold trips, triggering hard kills the system did not actually need. Each hard kill cost ~25s of model reload, during which more queued requests timed out, perpetuating the cycle.
Solution:
- Added `self._call_lock = threading.Lock()` to `VLMProcess.__init__`.
- Refactored `detect_ad` and `query_image` into thin wrappers that acquire the lock, then delegate to `_detect_ad_locked` / `_query_image_locked` with the original logic.
- This serializes the two callers across the entire request → response cycle, so cross-pollinated responses cannot happen and the shared timeout state stays consistent.
Upstream's tuple-shape defensive guards (introduced in commit 7c42e80) are kept as belt-and-suspenders — they tolerate a stale leaked response if one ever does slip through. The lock prevents the leak; the guards handle it if prevention fails.
Why a lock and not separate queues / request IDs: simplest correct fix that is local to VLMProcess. Detection-loop calls are ~4 Hz and complete in ~0.7s; autonomous-mode calls are once per 2 minutes and complete in ~1.0s. The lock contention is negligible in practice. A dedicated request-ID protocol would be cleaner but invasive to both worker and callers.
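A minimal sketch of the wrapper pattern (class, queue wiring, and tuple shapes here are simplified stand-ins for the real `VLMProcess`, not its actual implementation):

```python
import queue
import threading


class VLMClient:
    """Serializes two callers over one shared request/response queue pair."""

    def __init__(self):
        self._call_lock = threading.Lock()  # serializes the full request->response cycle
        self._requests = queue.Queue()
        self._responses = queue.Queue()

    def detect_ad(self, frame_path):
        # Thin wrapper: hold the lock for the entire round trip, so a
        # query_image() response can never be consumed by this caller.
        with self._call_lock:
            return self._detect_ad_locked(frame_path)

    def query_image(self, frame_path, prompt):
        with self._call_lock:
            return self._query_image_locked(frame_path, prompt)

    def _detect_ad_locked(self, frame_path):
        self._requests.put(("detect_ad", frame_path))
        return self._responses.get(timeout=1.5)  # 4-tuple in the real code

    def _query_image_locked(self, frame_path, prompt):
        self._requests.put(("query", frame_path, prompt))
        return self._responses.get(timeout=5.0)  # 2-tuple in the real code
```

Because the lock spans both the `put` and the matching `get`, a response pushed for one caller cannot be dequeued by the other, which is exactly the cross-pollination the fix removes.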
Files modified:
- `src/vlm_worker.py` — `_call_lock`, `_detect_ad_locked`, `_query_image_locked`
Symptom: Every 45 minutes of uptime, the AudioPassthrough watchdog ran its periodic sync-queue flush; `Sync queue flushed` was always followed ~12s later by `Pipeline issue detected: not in PLAYING state (paused)` and a full `Restarting pipeline (attempt N)`. Cumulative effect: ~32 spurious audio restarts per day, each a brief dropout. The feature that was supposed to prevent restarts was causing them.
Root cause (investigated in 4 failed fix iterations): the flush itself is unrecoverable without a full pipeline rebuild on this pipeline configuration.
- A `flush-start` event puts the sync queue and downstream into flushing mode. `flush-stop` should resume streaming, but `syncqueue` has `min-threshold-time=300ms`, which blocks downstream reads until the queue has refilled past the threshold.
- While the queue is blocking, `alsasink` — having no data to consume — closes its PCM device. The ALSA `state` transitions out of `RUNNING` and `hw_ptr` goes to 0.
- `set_state(PLAYING)` on the pipeline cannot bring `alsasink` back up because the upstream queue is still blocked. The pipeline gets stuck in `PAUSED` for 10+ seconds until the watchdog gives up and restarts the whole pipeline.
Attempts that did not work:
- Same-iteration `continue` after flush (commit `3b4e0d0`) — subsequent iterations still trip on the lingering `PAUSED`.
- 10-second post-flush "grace window" — flush recovery takes longer than that.
- Explicit `pipeline.set_state(PLAYING)` with a bounded 2s wait — `get_state` returns `PAUSED` regardless.
- Temporarily zeroing `syncqueue.min-threshold-time` across the flush plus a 400ms refill sleep — `alsasink` had already dropped the PCM by then.
Solution: flip `self._sync_reset_enabled = False` in `AudioPassthrough.__init__` (see src/audio.py:151). The periodic flush never runs, so it can never cascade. Drift isn't a real concern in this pipeline (`provide-clock=false` on `alsasrc`, `sync=false` on `alsasink`), and 48+ hours of runtime without a working flush showed no observable A/V desync.
To find the commit that made this change: `git log --all --oneline --grep='disable periodic A/V sync flush'` (the commit subject is stable across amends).
Kept as a side-benefit of the investigation: rewrote `_is_alsa_device_running()` to sample `hw_ptr` across a 50ms window instead of comparing ALSA's `owner_pid` to the main process PID. The old check compared an ALSA-reported thread TID (often a stale one) against the main PID, so it could never return `True` under normal operation. The watchdog's "GStreamer reports `PAUSED` but ALSA is flowing — skip restart" rescue path has always been broken; now it works.
If drift becomes a real problem in the future (easy revert):
1. Write a flush mechanism that does not let `alsasink` close its PCM device — either by pausing→flushing→playing the whole pipeline in one atomic block, or by replacing `syncqueue` with an element that doesn't block on `min-threshold-time`.
2. Only after (1) works, flip `_sync_reset_enabled` back to `True` in `src/audio.py`.
3. Re-run the soak test (`_sync_interval = 2.5 * 60` + 5-min monitor for ~45 min) and confirm `audio.restart_count` stays at 0.
Do NOT simply flip `_sync_reset_enabled` back to `True` without (1). The bug will return.
Files modified:
- `src/audio.py` — `_sync_reset_enabled = False` + explanatory block comment; `_is_alsa_device_running()` rewrite
Symptom: After pausing on an ad on Netflix and unpausing, the blocking overlay stayed up for ~20 seconds, applied against frames where the show was clearly playing again. Other variants: VLM verdicts persistently lagging actual screen content by one frame; rare reports of "Ad 1:30 left" claims long after a real ad had ended.
Root cause: `VLMProcess._detect_ad_locked` and `_query_image_locked` (src/vlm_worker.py) shared the same MP request/response queues with no per-request correlation. When the VLM hit a soft timeout (1.5s, ~15% of inferences in normal load), the request stayed in flight and `_pending_response` was set to `True`. On the next call:
- The drain attempt was a single `get(timeout=0.1)`. If the worker had not yet pushed its response (still mid-inference), the drain timed out.
- The code then fell through and `put`-ed a NEW request anyway.
- Now two requests were in flight. The worker finished the first → pushed result A → the caller's `get(SOFT_TIMEOUT)` received result A as the answer for request B.
- The queue was now permanently off-by-one. Every subsequent `get()` returned the prior frame's verdict.
- After a pause-on-ad (where the queue accumulated several "ad" verdicts during the pause), the entire backlog was delivered against post-unpause "show is playing" frames → 10–20 seconds of phantom blocking.
The shared `/dev/shm/minus_vlm_frame_<pid>.jpg` path made it worse — the file was always the most recently written frame, so even the worker's view of "what was frame N" could be stale.
Solution:
- Drain ALL stale responses at function entry using a `get_nowait()` loop (was a single `get(timeout=0.1)`).
- If a request is genuinely still in flight after draining, do NOT queue another. Return `"PENDING"` (or `"KILLED"` after `RESTART_THRESHOLD` consecutive pendings). The caller treats this exactly like the existing `"TIMEOUT"` skip path — `is_ad=False`, `confidence=0.0`, no-op on the sliding window.
This guarantees only one request is ever in flight, which incidentally also fixes the file-content race because the worker dequeues and reads the file in tight succession.
Files modified:
- `src/vlm_worker.py` — `_detect_ad_locked` and `_query_image_locked` rewritten with multi-drain + don't-double-queue
Symptom: TV stays frozen on a stale frame for hours. The web app shows live content. `subsystems.video.status` reads `error` / `reason: no_pipeline`. `fps_capture` is healthy (~42 fps) but `fps_display` is essentially 0. Service uptime can be many hours; restart is the only recovery.
Root cause (two coupled defects in `Minus._on_hdmi_restored()` at minus.py:634):
- `ad_blocker.start()`'s return value was ignored. When HDMI input recovers but HDMI-OUT (the TV) is still disconnected, kmssink can't open the DRM plane and `start()` returns `False`. The recovery handler proceeded as if all was well and logged `[Recovery] HDMI recovery complete`.
- `self.display_connected` was left stuck at `True`. The display retry loop (`_start_display_retry_loop`) is the only thing that can recreate a dead pipeline post-startup, but it gates on `not self.display_connected`. Since recovery never set the flag to `False` on failure, the retry loop never ran. The pipeline stayed dead until the next service restart.
Observed once today across a 17-hour run: at 08:19:59 HDMI input recovered after a 550-second loss while the TV was off. Recovery declared success. The `Attempting to reconnect display pipeline` log line appeared 0 times in the entire 17-hour run. The display sat frozen on its last decoded frame until a manual restart at 12:46.
Solution: check `start()`'s return value; on failure, set `display_connected=False`, populate `display_error`, and call `_start_display_retry_loop()`. The retry loop already exists and works correctly — it just needs to be armed. Audio remains paused/muted on the failure path; it'll be resumed by the normal start path inside the retry loop when the pipeline finally comes up.
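A simplified sketch of the fixed handler under these assumptions (the attribute and helper names follow the prose above; the real code in `minus.py` does more):

```python
def on_hdmi_restored(self):
    """Sketch of the fixed recovery handler (simplified from minus.py)."""
    started = self.ad_blocker.start()  # return value no longer ignored
    if started:
        self.display_connected = True
        self.log("[Recovery] HDMI recovery complete")
        return
    # kmssink could not open the DRM plane (TV off / HDMI-OUT absent):
    # flip the gate and arm the only mechanism that can rebuild the pipeline.
    self.display_connected = False
    self.display_error = "display pipeline failed to start after HDMI recovery"
    self._start_display_retry_loop()  # retries until the TV comes back
```

The whole fix is the failure branch: flipping `display_connected` to `False` is what un-gates the existing retry loop.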
Why the failure mode is sticky without this fix: there is no other code path that ever flips `display_connected` from `True` back to `False` post-startup. Initial startup (minus.py:3000) is the only place. The retry loop never fires because its gate (`not self.display_connected`) stays `False`.
The NO SIGNAL behavior is unchanged: the HDMI-LOST path still calls start_no_signal_mode() and the health monitor's "Continuous NO SIGNAL mode enforcement" loop still re-triggers it whenever HDMI input is absent. So the desired "TV shows NO SIGNAL when input is gone" behavior is preserved end-to-end.
Files modified:
- `minus.py` — `_on_hdmi_restored()`: capture return value, branch on success/failure, arm retry loop on failure
Symptom: User pauses on a real ad on Netflix, ad ends offscreen during the pause, user unpauses to actual show content — and Minus shows the blocking overlay for ~5 more seconds on the show content. Reproduced via the block-latency harness (round6): with the OLD parameters, 3/3 scenarios observed phantom re-blocks at ~0.9s after unpause.
Root cause: three coupled defects in the static-suppression / cooldown machinery, each individually plausible but combining badly:
- `OCR_STOP_THRESHOLD = 4` meant blocking took 4 OCR cycles × 0.5s = 2s to clear once the ad ended — already over the 1.5s responsiveness target the user wanted.
- `scene_change_threshold = 0.01` misclassified ~26% of natural low-motion frames in real video content as "static" (measured against BBB's actual inter-frame mean-abs-diff distribution: p5=0.002, p50=0.017, max=0.31). Static suppression therefore flapped on/off mid-content during slow scenes.
- `dynamic_cooldown = 0.5s` was too short for the post-pause AD overlay to actually transition off-screen. The cooldown completed → state was cleared → the very next OCR cycle re-detected the still-lingering AD text → blocking re-fired immediately.
The user only saw symptom 3 in the worst form (the phantom re-block), but symptoms 1 and 2 amplified its visibility — symptom 2 was also responsible for the related "blocking flips off mid-content" issue earlier in the same investigation.
Solution: three coordinated tuning changes, locked in via tests/block_latency_harness.py measurements (rounds 1, 4, 6, 7):
| Parameter | Old | New | Effect |
|---|---|---|---|
| `OCR_STOP_THRESHOLD` (minus.py) | 4 | 2 | recover 2.0s → 1.0s |
| `scene_change_threshold` (config.py) | 0.01 | 0.001 | only truly-frozen frames (~1.7% of BBB) register as static; natural low-motion content (~98%) keeps flowing |
| `dynamic_cooldown` (config.py) | 0.5s | 1.5s | post-pause AD overlay actually finishes transitioning off-screen before state is cleared |
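For reference, the static-vs-motion classification these thresholds gate can be sketched as a per-pixel mean absolute inter-frame difference, assuming (as the measured BBB distribution above suggests) that `scene_change_threshold` compares against mean-abs-diff on normalized grayscale frames; this is a sketch of the metric, not the actual `config.py` code:

```python
import numpy as np


def scene_change_score(prev: np.ndarray, curr: np.ndarray) -> float:
    """Mean absolute inter-frame difference over [0, 1] grayscale frames."""
    return float(np.mean(np.abs(curr.astype(np.float32) - prev.astype(np.float32))))


rng = np.random.default_rng(0)
frame = rng.random((720, 1280), dtype=np.float32)
frozen = scene_change_score(frame, frame)                           # identical frames
slow = scene_change_score(frame, np.clip(frame + 0.005, 0.0, 1.0))  # subtle motion

# The old threshold (0.01) calls the subtle motion "static"; 0.001 does not.
assert frozen < 0.001 and 0.001 < slow < 0.01
```

This is why 0.001 works: only genuinely frozen frames score near zero, while even very slow natural motion lands comfortably above the threshold.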
Verification: round6 of the harness re-runs the user's scenario 3× per parameter set:
- OLD params: 3/3 phantom re-blocks, max 0.90s after unpause
- NEW params: 0/3 phantom re-blocks ✓
Final scenario performance with locked-in params:
- detect: mean 0.59s, max 0.66s, 9/9 clean across all round-1 ad shapes
- recover: mean 0.97s, max 1.15s, all under 1.5s goal
- 0 false-positive blocking events across 15s of clean content (round 7)
- 0 mid-block flaps across a 30s sustained ad (round 7)
Defense-in-depth: tests/test_block_decision_engine.py adds 11 lightweight unit tests for the DecisionEngine state machine (cooldown clearing, OCR stop threshold, VLM-only fast-stop, the user-bug regression itself with both OLD and NEW params asserted). Runs as part of the standard test suite.
Files modified:
- `minus.py` — `OCR_STOP_THRESHOLD = 2` + comment with link to harness
- `src/config.py` — `scene_change_threshold = 0.001` + measurement-derived comment, `dynamic_cooldown = 1.5` (already changed in the cooldown-fix commit earlier this session)
- `tests/block_latency_harness.py` — new ~700-line headless harness (BBB source, OCR/VLM workers, decision-engine mirror, 7 rounds of scenarios)
- `tests/test_block_decision_engine.py` — new 11 unit tests
Symptom: During real ad breaks, blocking flapped on/off every 5–15 seconds even though OCR was reading the ad timer cleanly every frame. Logs showed sequences like:
```
00:29:00 [BLOCKING OCR] - Ad 1:11      ← match (boundary)
00:29:01 [BLOCKING OCR] - RATED TV-MA  ← no_ad #1
00:29:03 OCR #62188     - Ad1:09       ← no_ad #2 — silently!
00:29:03 OCR: ad no longer detected (after 2 no-ads)
00:29:03 AD BLOCKING ENDED after 3.1s
00:29:08 - Ad 1:02 → AD BLOCKING STARTED again
```
OCR's text output was literally the running ad timer, but `check_ad_keywords` was returning `ad_detected=False`, so the no-ad counter incremented and tripped `OCR_STOP_THRESHOLD=2`.
Root cause: there are two `check_ad_keywords` implementations — `src/ocr.py:515` on the PaddleOCR class, and `src/ocr_worker.py:310` on `OCRProcess`. Production wires `self.ocr = OCRProcess()` in `minus.py:563`, so `OCRProcess.check_ad_keywords` is what actually runs. The two have drifted: `ocr.py` was updated months ago to handle the OCR-drops-the-space variant (`"Ad1:09"`) and looser separator/digit misreads, but `ocr_worker.py` was never updated.
The drifted worker pattern was:

```python
# src/ocr_worker.py (pre-fix) — ONLY matches when there's a word boundary after "ad"
if re.search(r'\bad\b', text_lower) and re.search(r'[0-9o]:[0-9o]{2}', text_lower):
    matched.append(('ad with timestamp', text))
```

`\bad\b` requires a non-word char after `d`. "Ad1:09" puts a digit (a word char) right after `d`, so the boundary doesn't exist and the pattern fails. The timestamp side was also stricter: `[0-9o]` only (no `l`/`I`/`i`), and `:` only (no `;`/`.`).
So every frame OCR'd as `Ad1:09` was silently a no-ad, and a streaming service that briefly replaces the timer with a rating card (`RATED TV-MA`) at ad-to-ad transitions was enough to chain two consecutive no-ads and trip the unblock — even though the same ad break was still running.
Fix: `src/ocr_worker.py:404` and the cross-element check at `src/ocr_worker.py:418` now mirror `src/ocr.py:595` exactly:

```python
has_ad = (re.search(r'\bad\b', text_lower)
          or re.search(r'ad[0-9oOlIi][:;.]', text_lower))
has_timestamp = re.search(r'[0-9oOlIi][:;.][0-9oOlIi][0-9oOlIi]', text_lower)
if has_ad and has_timestamp:
    matched.append(('ad with timestamp', text))
```

Verified against actual log samples: `Ad1:09`, `Ad 1:11`, `Ad0:30`, `Ado:30`, `Adl:l0`, `Ad1:02`, `Ad0:55` all match; bare `Ad`, `RATED TV-MA`, `loading`, `reading` correctly do not.
Both files now carry a `Mirrors src/ocr.py:NNN — keep in sync` comment to make the next drift visible at the patch site.
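The mirrored patterns can be exercised standalone against the log samples above (the wrapper function here is hypothetical; production code runs these patterns inline inside `check_ad_keywords`):

```python
import re


def check_ad_with_timestamp(text: str) -> bool:
    """Standalone wrapper around the shared 'ad with timestamp' patterns."""
    text_lower = text.lower()
    has_ad = (re.search(r'\bad\b', text_lower)
              or re.search(r'ad[0-9oOlIi][:;.]', text_lower))
    has_timestamp = re.search(r'[0-9oOlIi][:;.][0-9oOlIi][0-9oOlIi]', text_lower)
    return bool(has_ad and has_timestamp)


# The log samples from the investigation:
assert all(map(check_ad_with_timestamp,
               ["Ad1:09", "Ad 1:11", "Ad0:30", "Ado:30", "Adl:l0", "Ad1:02", "Ad0:55"]))
assert not any(map(check_ad_with_timestamp, ["Ad", "RATED TV-MA", "loading", "reading"]))
```

Note why `loading` and `reading` stay negative even though both contain `ad` followed by a class character: the `ad[0-9oOlIi]` alternative also requires a `[:;.]` separator right after, which prose never supplies.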
Why this isn't just a tighter mirror: the duplication exists at all because `OCRProcess` runs `check_ad_keywords` locally in the main process (it's just string matching, no NPU work) instead of in the worker subprocess where PaddleOCR lives. Deleting the duplicate would require either (a) importing PaddleOCR from `ocr.py` into `ocr_worker.py` and calling its method, or (b) sending `ocr_results` back into the worker for the keyword check. (a) is the right fix and a small refactor — open task for next session. Until then, the mirror comments are the guardrail.
Files modified:
- `src/ocr_worker.py` — per-element + cross-element keyword patterns updated; mirror comments added
- `CLAUDE.md` — OCR Timestamp Pattern Handling section now calls out the dual-source requirement
The blocking overlay grew a third debug element: a top-right `(Ad) 0:30 left` snippet showing the OCR text that triggered the block, with the matched keyword wrapped in parens. The existing Debug Dashboard settings toggle was unified into a single Debug toggle that gates three things together: the `[ BLOCKING // ... ]` header (top), the bottom-left stats dashboard, and this new top-right OCR snippet.
Persistence: the toggle is a system setting (`debug_overlay`, default `True`) in `~/.minus_system_settings.json`. It is pushed into `ad_blocker.set_debug_overlay_enabled()` at startup so an off setting survives a service restart.
Recursion concern (resolved by existing architecture): the natural worry is that putting the OCR trigger text back on screen would make OCR keep seeing "Ad" forever. That cannot happen because OCR consumes `/snapshot/raw` (src/capture.py:134), which the patched ustreamer serves from `us_blocking_store_raw_frame()` before the blocking composite runs (ustreamer-garagehq/src/ustreamer/http/server.c:1026). The new top-right text — and every other element on the blocking overlay — is therefore invisible to OCR. Do not change OCR to read `/snapshot` (the composited path) without first stripping the debug texts; otherwise the displayed snippet becomes self-triggering. The Minus Overlay Text Triggering False Positive Ad Detection fix in this same Known Issues list is the cautionary tale — the notification overlay (`/overlay`, distinct from `/blocking`) DOES composite before the snapshot and required keyword exclusions to suppress recursion.
ustreamer C-side change: added a third text region. Files in ustreamer-garagehq:
- `src/libs/blocking.h` — `text_ocr` field on `us_blocking_config_s`, `US_BLOCKING_TEXT_OCR_SIZE = 256`, declaration of `us_blocking_set_text_ocr()`
- `src/libs/blocking.c` — setter; clear/snapshot/composite all extended; render block draws at `text_x = dst_width - text_w - 30, text_y = 30` using the same IBM Plex Mono Regular face as `text_stats`. Reuses the existing `_ft_mutex` since FreeType is not thread-safe across the 4 MPP workers.
- `src/ustreamer/http/server.c` — `text_ocr` URL param parsing in `_http_callback_blocking_set`. `text_stats_scale` is reused for the OCR text size (no separate scale param needed).
Python wiring:
- `src/ad_blocker.py` — `_ocr_trigger_text` instance, `_format_ocr_trigger(raw, source)` builds the `(Ad) 0:30 left` snippet (paren-wraps the matched keyword inside the OCR text snippet, ASCII-collapsed, ≤50 chars), `_render_ocr_text()` returns empty when debug is off so the C side draws nothing. `show(source, ocr_trigger_text="")` accepts the trigger payload from minus; it only overwrites the stored snippet when given a non-empty value (or when transitioning to vlm-only) so the top-right does not flicker as OCR text comes and goes during a block. `set_debug_overlay_enabled()` re-renders `text_vocab` (to add/strip the header) and pushes `text_ocr` in the right direction without waiting for the next rotation.
- `minus.py` — stashes `last_matched_keywords` in the OCR loop; helper `_first_match_for_overlay()` returns `(keyword, snippet_text)` for the most recent match. `_load_system_settings` adds the `debug_overlay: True` default; `set_debug_overlay_enabled(enabled)` persists and propagates. Cleared in the block-end branch so the next block starts fresh.
- `src/webui.py` — `/api/debug-overlay/{enable,disable}` route through `minus.set_debug_overlay_enabled()` for persistence. The `POST /api/test/trigger-block` endpoint injects a synthetic `("Ad", "Ad 0:30 left")` snippet when `source` is `ocr`/`both` so the top-right slot can be exercised without real ads.
- `src/templates/index.html` — toggle relabeled "Debug Dashboard" → "Debug" with a tooltip listing what it controls.
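A sketch of what the snippet builder might look like (a hypothetical standalone version of `_format_ocr_trigger`; the ASCII-collapse and truncation rules here are inferred from the description above, not copied from the real code):

```python
import re


def format_ocr_trigger(raw: str, keyword: str, max_len: int = 50) -> str:
    """Build the overlay snippet, e.g. '(Ad) 0:30 left', from raw OCR text."""
    text = re.sub(r'[^\x20-\x7e]', ' ', raw)  # collapse to printable ASCII
    text = re.sub(r'\s+', ' ', text).strip()  # normalize whitespace
    m = re.search(re.escape(keyword), text, re.IGNORECASE)
    if m:  # paren-wrap the first occurrence of the matched keyword
        text = text[:m.start()] + '(' + m.group(0) + ')' + text[m.end():]
    return text[:max_len]  # keep it overlay-sized (<=50 chars)
```

Example: `format_ocr_trigger("Ad 0:30 left", "Ad")` yields `(Ad) 0:30 left`, which is the shape the top-right debug slot renders.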
Files modified:
- `ustreamer-garagehq/src/libs/blocking.{h,c}`, `src/ustreamer/http/server.c` — new `text_ocr` API + top-right render
- `minus.py`, `src/ad_blocker.py`, `src/webui.py`, `src/templates/index.html`