HDMI passthrough with real-time ML-based ad detection and blocking using dual NPUs:
- PaddleOCR on RK3588 NPU (~400ms per frame, 1.0s timeout)
- FastVLM-1.5B on Axera LLM 8850 NPU (~0.9s per frame, 1.5s soft timeout / 5s hard timeout)
- Spanish vocabulary practice during ad blocks!
| Document | Description |
|---|---|
| docs/FEATURES.md | Complete feature list and capabilities |
| docs/ARCHITECTURE.md | System architecture and data flow |
| docs/AESTHETICS.md | Visual design guide for UI/overlays |
| MARISOL.md | AI agent context guide |
| docs/DEBUG_GLITCHES.md | Video glitch debugging notes |
| docs/FPS_DEBUGGING.md | FPS tracking and optimization |
| docs/AUDIO.md | Audio passthrough documentation |
| docs/VLM_NPU_DEGRADATION.md | Investigation of "NPU degradation" — root cause is per-image output-length variance; fix is max_new_tokens cap |
| docs/IR_TRANSMITTER.md | IR transmitter for the REI 8K HDMI switch (PWM3 on pin 38) — wiring, NEC codes, API, troubleshooting |
| docs/IR_RECEIVER.md | IR receiver eval on pin 3 (gpiochip4 11) — bench-tested decode of NEC remotes, gotchas, sketch for a future IRReceiver module |
| docs/STATUS_LEDS.md | WS2812B status strip on SPI0 MOSI (pin 19) — wiring, state catalogue, API, encoding rationale |
See docs/AESTHETICS.md for the complete visual design guide including:
- Color palette (black background, matrix green, danger red, purple accents)
- Typography (VT323 for display, IBM Plex Mono for body, DejaVu for TV overlays)
- Component styling and animations
- TV overlay layout specifications
┌──────────────┐ ┌────────────────────┐ ┌─────────────────────┐
│ HDMI-RX │────▶│ ustreamer │────▶│ GStreamer Pipeline │
│ /dev/video0 │ │ (MJPEG encoding) │ │ (queue + kmssink) │
│ 4K@30fps │ │ │ │ │
│ │ │ :9090/stream │ │ │
│ │ │ :9090/snapshot │ │ │
└──────────────┘ └────────┬───────────┘ └─────────────────────┘
│
▼ HTTP snapshot (~150ms, non-blocking)
┌───────────────┴───────────────┐
│ │
┌────────┴────────┐ ┌──────────┴──────────┐
│ OCR Worker │ │ VLM Worker │
│ ┌───────────┐ │ │ ┌───────────────┐ │
│ │ PaddleOCR │ │ │ │ FastVLM-1.5B │ │
│ │ RK3588 NPU│ │ │ │ Axera LLM 8850│ │
│ │ ~400ms │ │ │ │ ~0.9s │ │
│ └───────────┘ │ │ └───────────────┘ │
└────────┬────────┘ └──────────┬──────────┘
│ │
└───────────────┬───────────────┘
│
┌────────┴────────┐
│ Blocking Mode │
│ (ustreamer API) │
└─────────────────┘
Key Architecture Points:
- Simple GStreamer pipeline with `queue max-size-buffers=3 leaky=downstream`
- All blocking overlay rendering done in ustreamer's MPP encoder at 60fps
- No X11 required - uses DRM/KMS directly via kmssink
- Auto-detects HDMI output, resolution, and DRM plane at startup
- Works with both 4K and 1080p displays (uses display's preferred resolution)
- Both ML workers run concurrently on separate NPUs
- Display runs independently at 30fps without any stutter
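As a minimal sketch of the concurrent-worker point above (illustrative only; the real dispatch lives in minus.py and uses process-based workers, not bare threads), one snapshot fans out to both detectors so wall time is the max of the two, not the sum:

```python
import threading

def run_detectors(snapshot, ocr_fn, vlm_fn):
    """Fan one snapshot out to both detectors concurrently.

    ocr_fn/vlm_fn stand in for the real OCR/VLM worker calls;
    each runs on its own NPU, so they don't contend for hardware.
    """
    results = {}

    def call(name, fn):
        results[name] = fn(snapshot)

    threads = [
        threading.Thread(target=call, args=("ocr", ocr_fn)),
        threading.Thread(target=call, args=("vlm", vlm_fn)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```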
| File | Purpose |
|---|---|
| `minus.py` | Main entry point - orchestrates everything |
| `minus.spec` | PyInstaller spec for building executable |
| `src/ad_blocker.py` | GStreamer video pipeline, blocking API client |
| `src/audio.py` | GStreamer audio passthrough with mute control |
| `src/ocr.py` | PaddleOCR on RKNN NPU, keyword detection |
| `src/ocr_worker.py` | Process-based OCR with hard timeout, warmup, and keepalive |
| `src/vlm.py` | FastVLM-1.5B on Axera NPU (ad detection + custom queries) |
| `src/vlm_worker.py` | Process-based VLM with hard timeout, warmup, and keepalive |
| `src/autonomous_mode.py` | Autonomous mode - VLM-guided YouTube playback |
| `src/health.py` | Unified health monitor for all subsystems |
| `src/webui.py` | Flask web UI for remote monitoring/control |
| `src/fire_tv.py` | Fire TV ADB remote control for ad skipping |
| `src/roku.py` | Roku ECP remote control |
| `src/ir_transmitter.py` | NEC IR transmitter over PWM3 (REI 8K HDMI switch). Thread-safe, 1.5 s cooldown |
| `src/status_leds.py` | Raw WS2812B SPI driver. 8 LEDs, 10% brightness cap, Adafruit-canonical 8-bit-per-WS-bit encoding at 6.4 MHz |
| `src/status_led_controller.py` | State machine + animation thread on top of status_leds.py. States: off/initializing/idle/blocking/no_signal/autonomous/error |
| `src/device_config.py` | Streaming device type configuration and persistence |
| `src/fire_tv_setup.py` | Fire TV auto-setup flow with overlay notifications |
| `src/wifi_manager.py` | WiFi captive portal and AP mode management |
| `src/overlay.py` | Notification overlay via ustreamer API |
| `src/vocabulary.py` | Spanish vocabulary — original `SPANISH_VOCABULARY` (~550 entries, 4-tuples) plus `SPANISH_VOCABULARY_EXTENDED` (~200 entries, 5-tuples with two example sentences). `VOCABULARY_COMBINED` is the unified list the ad overlay iterates. |
| `src/console.py` | Console blanking/restore functions |
| `src/drm.py` | DRM output probing, adaptive bandwidth fallback |
| `src/v4l2.py` | V4L2 device probing (format, resolution) |
| `src/config.py` | MinusConfig dataclass |
| `src/capture.py` | UstreamerCapture class for snapshot capture |
| `src/screenshots.py` | ScreenshotManager with dHash dedup + blank rejection |
| `src/skip_detection.py` | Skip button detection (regex patterns) |
| `test_fire_tv.py` | Fire TV controller test and interactive remote |
| `ir_transmit.py` | Standalone CLI for the IR transmitter (`sudo python3 ir_transmit.py <button>`) |
| `tests/test_modules.py` | Comprehensive test suite (300+ tests) |
| `tests/test_autonomous_mode.py` | Autonomous mode unit tests |
| `tests/test_review_ui.py` | Playwright UI tests for screenshot review |
| `tests/test_ir_transmitter.py` | Unit tests for IR transmitter (mocked sysfs, 20 tests) |
| `tests/test_ir_ui.py` | Playwright UI tests for IR remote panel |
| `tests/test_status_led_controller.py` | Unit tests for status-LED state machine (mocked hardware, 31 tests) |
| `tests/test_status_leds_ui.py` | Playwright UI tests for status-LED toggle + state palette |
| `tests/test_status_led_states.py` | Hardware walk: every controller state across all 8 LEDs, 5 s each |
| `test_status_leds.py` | Hardware walk/flash test for the WS2812B strip (R/G/B/W) |
| `tests/test_ocr_ad_detection.py` | OCR ad pattern detection tests (143+ cases) |
| `src/templates/index.html` | Web UI single-page app |
| `src/static/style.css` | Web UI dark theme styles |
| `install.sh` | Install as systemd service |
| `uninstall.sh` | Remove systemd service |
| `stop.sh` | Graceful shutdown script |
| `minus.service` | systemd service file |
| `screenshots/ads/` | OCR-detected ads (for training) |
| `screenshots/non_ads/` | User paused = false positives (for training) |
| `screenshots/vlm_spastic/` | VLM uncertainty cases (for analysis) |
| `screenshots/static/` | Static screen suppression (still frames) |
python3 minus.py

Command-line options:
--device /dev/video1 # Custom capture device
--ocr-timeout 1.5 # OCR timeout in seconds (default: 1.5)
--max-screenshots 100 # Keep N recent screenshots (default: 50, 0=unlimited)
--check-signal # Just check HDMI signal and exit
--connector-id 231 # DRM connector ID (auto-detected if not specified)
--plane-id 192 # DRM plane ID (auto-detected if not specified)
--webui-port 80        # Web UI port (default: 80)

Auto-detection at startup:
- Connected HDMI output - Works with either HDMI-A-1 (connector 215) or HDMI-A-2 (connector 231)
- Preferred resolution - Reads EDID to get the display's preferred mode (e.g., 4K@60Hz or 1080p@60Hz)
- NV12-capable overlay plane - Finds a suitable DRM plane that supports NV12 format for video output
- Audio output device - Matches ALSA device to the connected HDMI output (hw:0,0 for HDMI-A-1, hw:1,0 for HDMI-A-2)
This allows Minus to work with different displays without manual configuration.
Adaptive HDMI Bandwidth Fallback:
4K@60Hz RGB/YCbCr 4:4:4 requires 18 Gbps HDMI bandwidth. Some cables, adapters, or display paths can't handle this, resulting in "No Signal" on the TV even though the kernel reports success.
Minus includes adaptive bandwidth detection via src/drm.py:
| Function | Purpose |
|---|---|
| `get_color_format(connector_id)` | Read current color format (RGB, YCbCr 4:4:4, 4:2:2, 4:2:0) |
| `set_color_format(connector_id, format)` | Set color format with retry logic |
| `check_hdmi_i2c_errors(threshold, window)` | Detect signal problems via dmesg |
Detection heuristic: When HDMI signal fails at high bandwidth, the dwhdmi driver floods dmesg with `i2c read err!` messages. This is more reliable than kernel connector status (which shows "connected" even when signal fails).
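A sketch of what such a dmesg heuristic looks like (the threshold/window defaults and timestamp parsing here are assumptions; the real logic is `check_hdmi_i2c_errors()` in src/drm.py):

```python
import re

I2C_ERR_RE = re.compile(r"i2c read err!")

def hdmi_i2c_errors_exceeded(dmesg_lines, threshold=10, window=30.0):
    """Count `i2c read err!` lines whose kernel timestamp falls within
    the last `window` seconds of log time; flag a flood."""
    stamped = []
    for line in dmesg_lines:
        m = re.match(r"\[\s*([\d.]+)\]", line)   # kernel "[ 123.456]" prefix
        if m and I2C_ERR_RE.search(line):
            stamped.append(float(m.group(1)))
    if not stamped:
        return False
    newest = max(stamped)
    recent = [t for t in stamped if newest - t <= window]
    return len(recent) >= threshold
```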
Color format values:
- `COLOR_FORMAT_RGB` (0) - Full bandwidth
- `COLOR_FORMAT_YCBCR444` (1) - Full bandwidth
- `COLOR_FORMAT_YCBCR422` (2) - Reduced bandwidth
- `COLOR_FORMAT_YCBCR420` (3) - Half bandwidth (9 Gbps) - use for problematic cables
Manual fallback:
# Stop minus first (it holds DRM master lock)
sudo systemctl stop minus
# Set YCbCr 4:2:0 for half bandwidth
sudo modetest -M rockchip -w 215:color_format:3
# Restart minus
sudo systemctl start minus

Environment variables:
# Paths (override defaults for different installations)
MINUS_USTREAMER_PATH=/path/to/ustreamer # Default: /home/radxa/ustreamer-patched
MINUS_VLM_MODEL_DIR=/path/to/vlm/models # Default: /home/radxa/axera_models/FastVLM-1.5B
MINUS_OCR_MODEL_DIR=/path/to/ocr/models # Default: /home/radxa/rknn-llm/.../paddleocr
# Timing thresholds
MINUS_ANIMATION_START=0.3 # Blocking animation duration (seconds)
MINUS_ANIMATION_END=0.25 # Unblocking animation duration (seconds)
MINUS_FRAME_STALE_THRESHOLD=5.0 # Health check frame freshness (seconds)
MINUS_DYNAMIC_COOLDOWN=0.5 # Wait after screen becomes dynamic (seconds)
MINUS_SCENE_CHANGE_THRESHOLD=0.01 # Frame difference threshold for scene change
MINUS_VLM_ALONE_THRESHOLD=5       # Consecutive VLM detections needed to trigger alone

| Metric | Value |
|---|---|
| Display (video) | 30fps (GStreamer kmssink, MJPEG → NV12 → DRM plane) |
| Display (blocking) | 60fps (ustreamer MPP blocking mode with FreeType) |
| Preview window | 60fps (hardware-scaled in MPP encoder) |
| Blocking composite | ~0.5ms per frame overhead |
| Audio mute/unmute | INSTANT (volume element mute property) |
| ustreamer MJPEG stream | ~60fps (MPP hardware encoding at 4K) |
| OCR latency | 100-200ms capture + 250-400ms inference |
| VLM latency | ~0.9-1.1s per frame (FastVLM-1.5B, process-based with soft/hard timeout) |
| VLM model load | ~30s (includes 4 warmup inferences + keepalive thread) |
| Snapshot capture | ~150ms (4K JPEG download) |
| OCR image size | 960x540 (downscaled from 4K for speed) |
| ustreamer quality | 80% JPEG (MPP encoder) |
| Animation start | 0.3s (fast blocking response) |
| Animation end | 0.25s (fast unblocking) |
FPS Tracking:
- GStreamer identity element with pad probe counts frames
- FPS logged every 60 seconds via health monitor
- Warning logged if FPS drops below 25
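The counting logic behind these three bullets can be sketched in plain Python (a class name like `FpsTracker` is hypothetical; the real counter is a GStreamer identity pad probe feeding the health monitor):

```python
import time

class FpsTracker:
    """Counts frames, as a GStreamer identity pad probe would, and
    reports average FPS over each reporting interval."""

    def __init__(self, warn_below=25.0):
        self.warn_below = warn_below
        self.frames = 0
        self.window_start = time.monotonic()

    def on_frame(self):
        """Called once per buffer by the pad probe."""
        self.frames += 1

    def report(self, now=None):
        """Called periodically (every 60s) by the health monitor.
        Returns (fps, should_warn) and resets the window."""
        now = time.monotonic() if now is None else now
        elapsed = max(now - self.window_start, 1e-9)
        fps = self.frames / elapsed
        self.frames, self.window_start = 0, now
        return fps, fps < self.warn_below
```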
We use a patched version of ustreamer from garagehq/ustreamer that adds:
- NV12/NV16/NV24 format support for RK3588 HDMI-RX devices
- MPP hardware JPEG encoding using RK3588 VPU (~60fps at 4K!)
- Blocking mode system with FreeType TrueType rendering for ad blocking overlays
- Extended timeouts for RK3588 HDMI-RX driver compatibility
- Multi-worker MPP support (4 parallel encoders optimal)
- Cache sync fix for DMA-related visual artifacts
- Thread-safe FreeType mutex for multi-worker encoding
Why patched ustreamer? The stock PiKVM ustreamer doesn't support NV12 format or RK3588 hardware encoding. Our fork adds NV12→JPEG encoding via Rockchip MPP (Media Process Platform) that achieves ~60fps on 4K input with minimal CPU usage.
Dynamic Format Detection: Minus automatically probes the V4L2 device to detect its current format and resolution. Supported formats:
- NV12 - RK3588 HDMI-RX native (uses MPP hardware encoder directly)
- NV24 - Some devices like Roku (converted to NV12 for MPP, ~60fps)
- BGR24/BGR3 - Google TV and similar devices (converted to NV12 for MPP, ~42fps at 4K)
- YUYV/UYVY - Webcam-style devices
- MJPEG - Pre-compressed JPEG sources
Format conversions (NV24→NV12, BGR24→NV12) are done in software in the MPP encoder before hardware JPEG encoding.
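For intuition, the NV24→NV12 chroma-layout change can be sketched in numpy (assumption: 2x2 blocks are averaged; the C implementation may instead drop samples, which changes values but not the layout):

```python
import numpy as np

def nv24_to_nv12_chroma(uv):
    """Downsample an NV24 interleaved UV plane (H, W, 2) to NV12's
    2x2-subsampled layout (H//2, W//2, 2) by averaging each 2x2 block.
    H and W are assumed even, as they are for video frame sizes."""
    h, w, _ = uv.shape
    # Group rows and columns into 2x2 blocks, then average each block.
    blocks = uv.reshape(h // 2, 2, w // 2, 2, 2).astype(np.float32)
    return blocks.mean(axis=(1, 3)).round().astype(np.uint8)
```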
Performance comparison (4K HDMI input):
| Mode | ustreamer FPS | CPU Usage | Notes |
|---|---|---|---|
| CPU encoding | ~4 fps | ~100% | CPU can't keep up with 4K JPEG encoding |
| MPP hardware | ~60 fps | ~5% | --encoder=mpp-jpeg (default) |
ustreamer command (used by Minus):
/home/radxa/ustreamer-patched \
--device=/dev/video0 \
--format=NV12 \
--resolution=3840x2160 \
--persistent \
--port=9090 \
--host=0.0.0.0 \
--encoder=mpp-jpeg \
--encode-scale=passthrough \
--quality=80 \
--workers=4 \
--buffers=5

Installation:
# Clone and build with MPP support
git clone https://github.com/garagehq/ustreamer.git /home/radxa/ustreamer-garagehq
cd /home/radxa/ustreamer-garagehq
make WITH_MPP=1
cp ustreamer /home/radxa/ustreamer-patched
# Minus uses /home/radxa/ustreamer-patched automatically

Key changes in garagehq/ustreamer:
- `src/ustreamer/encoders/mpp/encoder.c` - MPP hardware JPEG encoder with cache sync, blocking composite, NV24→NV12 and BGR24→NV12 format conversion
- `src/libs/capture.c` - NV12/NV16/NV24/BGR24 format support, extended timeouts
- `src/libs/blocking.c` - FreeType text rendering, NV12 compositing, thread-safe mutex
- `src/ustreamer/http/server.c` - Blocking API endpoints (`/blocking`, `/blocking/set`, `/blocking/background`)
- `src/ustreamer/encoder.c` - MPP encoder integration, multi-worker support
- `src/ustreamer/options.c` - `--encoder=mpp-jpeg` CLI option
Hardware:
- Capture: `hw:4,0` (rockchip,hdmiin) - HDMI-RX audio input
- Playback: `hw:0,0` (rockchip-hdmi0) - HDMI-TX0 output
- Format: 48kHz, stereo, S16LE
GStreamer Pipeline:
alsasrc (HDMI) ──┐
├──► audiomixer ──► volume ──► alsasink
audiotestsrc ────┘
(silent keepalive)
The `audiotestsrc wave=silence` element provides a silent keepalive that prevents pipeline stalls when the HDMI source has no audio (between songs, during video silence, etc.).
Mute Control:
- `ad_blocker.show()` calls `audio.mute()` - instant mute during ads
- `ad_blocker.hide()` calls `audio.unmute()` - restore audio after ads
- Uses GStreamer `volume` element's `mute` property (no pipeline restart)
Why separate pipeline?
- Audio runs independently from video - simpler debugging
- If audio fails, video continues unaffected
- No sync issues for live passthrough
Error Recovery:
- GStreamer bus monitors for pipeline errors and EOS
- Buffer probe tracks audio flow (detects stalls)
- Watchdog thread checks every 3s, restarts if no buffer for 6s
- Exponential backoff for restarts (1s → 2s → 4s → ... → 60s max)
- No maximum restart limit - always tries to recover
- Backoff resets after 5 seconds of sustained audio flow
- Mute state is preserved across restarts
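The backoff/reset behavior above can be sketched with hypothetical names (the real logic lives in src/audio.py's watchdog):

```python
class RestartBackoff:
    """Exponential restart backoff: starts at 1s, doubles to a 60s cap,
    and resets to 1s after sustained healthy audio flow."""

    def __init__(self, base=1.0, cap=60.0, reset_after=5.0):
        self.base, self.cap, self.reset_after = base, cap, reset_after
        self.delay = base
        self.healthy_since = None

    def next_delay(self):
        """Delay to wait before the next restart attempt."""
        d = self.delay
        self.delay = min(self.delay * 2, self.cap)
        self.healthy_since = None          # restart interrupts healthy flow
        return d

    def on_audio_flow(self, now):
        """Call on each buffer; resets backoff after sustained flow."""
        if self.healthy_since is None:
            self.healthy_since = now
        elif now - self.healthy_since >= self.reset_after:
            self.delay = self.base
```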
Testing:
# Test passthrough manually
gst-launch-1.0 alsasrc device=hw:4,0 ! \
"audio/x-raw,rate=48000,channels=2,format=S16LE" ! \
audioconvert ! audioresample ! \
alsasink device=hw:0,0 sync=false
# Check if HDMI source has audio
v4l2-ctl -d /dev/video0 --get-ctrl audio_present

OCR (Primary - Authoritative):
- Triggers blocking immediately on 1 detection
- Stops blocking after 2 consecutive no-ads (`OCR_STOP_THRESHOLD=2`, was 4 — tuned via `tests/block_latency_harness.py`)
- Authoritative for stopping when OCR triggered the block
- Tracks `last_ocr_ad_time` for VLM context
- Handles common OCR misreads in ad timestamps (see below)
VLM (Secondary - Anti-Waffle Protected):
- Uses sliding window of last 45 seconds of VLM decisions (`vlm_history_window`)
- Only triggers blocking alone if 90%+ of recent decisions are "ad" (`vlm_start_agreement`)
- Hysteresis: needs 100% agreement to START (capped at 95% via `vlm_start_threshold_cap` so a few stragglers can't block forever)
- Minimum 4 decisions in window before VLM can act (`vlm_min_decisions`)
- 8-second cooldown after state changes prevents rapid flip-flopping (`vlm_min_state_duration`)
- Sliding window only for starting - stopping uses simple consecutive count (`VLM_STOP_THRESHOLD=2`)
Sliding Window Parameters:
| Parameter | Value | Purpose |
|---|---|---|
| `vlm_history_window` | 45s | How far back to look at VLM decisions |
| `vlm_min_decisions` | 4 | Minimum decisions needed before acting |
| `vlm_start_agreement` | 90% | Agreement threshold to start blocking |
| `vlm_hysteresis_boost` | 10% | Extra agreement needed to change state |
| `vlm_start_threshold_cap` | 95% | Maximum effective start threshold (so hysteresis can't make it unreachable) |
| `vlm_min_state_duration` | 8s | Cooldown after VLM state change |
| `VLM_STOP_THRESHOLD` | 2 | Consecutive no-ad votes for fast-stop path |
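A minimal sketch of the start gate these parameters drive (hysteresis, the 95% cap, and the 8s cooldown are omitted for brevity):

```python
import time

def vlm_may_start_blocking(decisions, now=None, window=45.0,
                           min_decisions=4, agreement=0.90):
    """Sliding-window gate for VLM-alone blocking.

    decisions: list of (timestamp, is_ad) tuples, newest last.
    Returns True only when enough recent decisions exist AND the
    agreement fraction meets the threshold.
    """
    now = time.monotonic() if now is None else now
    recent = [is_ad for ts, is_ad in decisions if now - ts <= window]
    if len(recent) < min_decisions:
        return False
    return sum(recent) / len(recent) >= agreement
```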
Transition Frame Detection:
When blocking is active, black/solid-color frames are detected as transitions between ads and held in blocking state to prevent premature unblocking and re-blocking flicker. The `_is_transition_frame()` method analyzes:
- Mean brightness < 30 with low std deviation → black screen
- Low std deviation across frame → solid color
- 95%+ of pixels within 20 values of median → uniform/static
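The three heuristics can be sketched on a flat grayscale pixel list (thresholds other than the documented 30 / 95% / 20 are assumptions):

```python
import statistics

def is_transition_frame(pixels, dark=30, flat_std=10,
                        uniform_frac=0.95, uniform_band=20):
    """Transition-frame heuristics over a flat list of 0-255 grayscale
    values. flat_std=10 is an assumed stand-in for "low std deviation"."""
    mean = statistics.fmean(pixels)
    std = statistics.pstdev(pixels)
    if mean < dark and std < flat_std:
        return True                      # black screen
    if std < flat_std:
        return True                      # solid color
    med = statistics.median(pixels)
    near = sum(abs(p - med) <= uniform_band for p in pixels)
    return near / len(pixels) >= uniform_frac   # uniform/static
```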
Starting Blocking:
- OCR detects ad → blocking starts immediately (unless home screen detected)
- VLM detects ad (no OCR) → needs 90%+ agreement in sliding window (4+ decisions)
- VLM with recent OCR → trusted, triggers blocking
- Home screen detection suppresses both OCR and VLM blocking on streaming interfaces
Stopping Blocking:
- If OCR triggered (`source=ocr` or both): OCR says stop (2 no-ads) → ends immediately (~2-3s)
- If VLM triggered alone (source=vlm): VLM says stop (2 no-ads) → ends (~4s after ad ends)
- VLM history cleared on stop → prevents immediate re-trigger
- VLM stop uses simple consecutive count, NOT sliding window (for responsiveness)
Why This Design:
- VLM sliding window prevents erratic false-positive blocking when acting alone
- OCR is authoritative for stopping OCR-triggered blocks (fast unblock)
- VLM-triggered blocks require VLM to confirm ad ended (since OCR never saw it)
- Clearing VLM history on stop prevents "waffle memory" from causing re-triggers
- VLM stopping uses simple consecutive count (not sliding window) for responsiveness
Anti-flicker:
- Minimum blocking duration starts at 3.0s (`MIN_BLOCKING_DURATION_BASE`) and falls off by `MIN_BLOCKING_DURATION_STEP` (0.5s) on each consecutive ad: 3.0 → 2.5 → 2.0 → 1.5 → 1.0s. Floor is 1.0s for OCR-only, 1.5s for OCR+VLM both agreeing. Counter resets after `MIN_DURATION_RESET_GAP` (30s) without a block. Toggleable via Settings → Blocking Optimizations → Block-duration Falloff.
- VLM history cleared on stop prevents false re-triggers
- Transition frame detection holds blocking through black screens between ads
- After TV reconnect, ad blocking is suppressed for `HDMI_RECONNECT_GRACE_SECONDS` (90s) so the user can navigate without overlays jumping in. The health monitor calls `Minus.notify_hdmi_reconnect()` when it sees the HDMI-TX link return. Toggleable via Settings → Blocking Optimizations → HDMI Reconnect Grace.
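The falloff rule reduces to a one-liner (whether the first ad in a run counts as 0 or 1 consecutive ads is an assumption here):

```python
def min_blocking_duration(consecutive_ads, both_agree=False,
                          base=3.0, step=0.5):
    """Block-duration falloff: 3.0s shrinking by 0.5s per consecutive ad,
    floored at 1.0s for OCR-only, 1.5s when OCR and VLM both agree."""
    floor = 1.5 if both_agree else 1.0
    return max(base - step * consecutive_ads, floor)
```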
Static Screen Suppression:
- Prevents blocking on paused video screens (Netflix/YouTube show ads when paused)
- After 2.5s of static screen (`STATIC_TIME_THRESHOLD`), blocking is suppressed
- When video resumes, 0.5s cooldown (`DYNAMIC_COOLDOWN`) before re-enabling blocking
- Detection state (OCR/VLM) cleared on cooldown complete to prevent false positives
- Static ad screenshots saved to `screenshots/static/` for analysis
OCR Timestamp Pattern Handling: OCR frequently misreads characters in ad timestamps. The detection handles these common confusions:
| Intended | OCR Misreads | Example |
|---|---|---|
| `0` (zero) | `o`, `O` | "Ad0:30" → "Ado:30", "AdO:30" |
| `1` (one) | `l`, `L`, `I`, `i` | "Ad1:30" → "Adl:30", "AdI:30" |
| `:` (colon) | `;`, `.` | "Ad0:30" → "Ad0;30", "Ad0.30" |
Combined misreads are also handled (e.g., "Adl;lo" for "Ad1:10"). The timestamp pattern matches:
- Standard: `Ad 0:30`, `Ad0:30`, `Ad1:45`
- Zero misreads: `Ado:30`, `Ad0:3o`, `Ado:oo`, `Ado:o5` (zeros misread on both sides of the colon)
- One misreads: `Adl:30`, `Ad1:l5`, `Adl:lo`
- Separator misreads: `Ad0;30`, `Ad0.30`, `Ado;3o`
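A compact illustrative reconstruction of such a pattern (the canonical regex at src/ocr.py:595 is more elaborate; this only demonstrates the digit and separator confusion classes):

```python
import re

# Digit with common OCR confusions: o/O for zero, l/L/I/i for one.
D = r"[0-9oOlLIi]"
# Colon and its misreads.
SEP = r"[:;.]"
AD_TIMESTAMP_RE = re.compile(rf"\bAd\s?{D}{SEP}{D}{D}\b", re.IGNORECASE)

def looks_like_ad_timestamp(text):
    """True when text contains something like 'Ad 0:30' or a misread of it."""
    return AD_TIMESTAMP_RE.search(text) is not None
```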
The pattern lives in two places that must stay in sync: `src/ocr.py:595` (`PaddleOCR` class) and `src/ocr_worker.py:404` (`OCRProcess`, which is what production actually calls — `self.ocr = OCRProcess()` in `minus.py:563`). Each side carries a "Mirrors src/ocr.py:NNN — keep in sync" / vice-versa comment. The deeper fix is to delete the duplicate in `ocr_worker.py` and have it call `PaddleOCR.check_ad_keywords` directly; until then, any change to one file's pattern must be mirrored to the other. See OCR Worker Keyword-Pattern Drift under Known Issues for the past failure mode.
Ad-keyword policy:
- Bare `Ad`/`Ads` at a word boundary triggers blocking. Past false positives from words like `Loading`, `reading`, `Adobe` are handled via the word-boundary regex (`\bad\b`/`\bads\b`) and the `AD_EXCLUSIONS` list — bare `Ad` inside a longer word will not match. `Visit advertiser` (YouTube pre-roll CTA) is treated as an exact ad keyword.
Fuzzy "Skip Intro" exclusion:
Streaming UIs render a Skip Intro button that OCR sometimes reads as `Sk1p Intro`, `Skip 1ntro`, `Sk1p 1ntro`, `Sk1p1ntro`, etc. (`i` ↔ `1` ↔ `l` ↔ `I` swaps). A compiled regex `s[kK][i1lI]p\s*[i1lI]ntro` (in `src/ocr.py` as `SKIP_INTRO_FUZZY_RE` and mirrored in `src/ocr_worker.py`) covers all permutations. It's applied as part of the exclusion gate at the top of the per-text and cross-element matching paths, before either exact-keyword or word-boundary detection runs — important because `skip in` (inside `AD_KEYWORDS_EXACT`) is a substring of `skip intro` and would otherwise match first.
`Skip Ad` is not excluded — it still triggers ad detection (via the `skip ad` exact keyword) and is independently recognized as a skip button by `src/skip_detection.py`, so Minus will press it to dismiss the ad.
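The documented fuzzy pattern, compiled and exercised (assumption: the exclusion gate runs on lowercased OCR text, since the pattern as written only allows a lowercase leading `s`):

```python
import re

# Pattern verbatim from the text; the exact flags in the source may differ.
SKIP_INTRO_FUZZY_RE = re.compile(r"s[kK][i1lI]p\s*[i1lI]ntro")

def is_skip_intro(text):
    """True when (lowercased) OCR text contains a fuzzy 'skip intro'."""
    return SKIP_INTRO_FUZZY_RE.search(text) is not None
```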
When ads are detected, the screen shows a full blocking overlay rendered at 60fps via ustreamer's native MPP blocking mode:
- Pixelated Background: Blurred/pixelated version of the screen from ~6 seconds before the ad
- Header (debug only): `[ BLOCKING // OCR ]`, `[ BLOCKING // VLM ]`, or `[ BLOCKING // OCR+VLM ]`
- Spanish vocabulary: Random intermediate-level word with translation
- Example sentence: Shows the word in context
- Rotation: New vocabulary every 11-15 seconds
- Ad Preview Window: Live preview of the blocked ad in bottom-right corner (60fps!)
- Debug stats (debug only): bottom-left dashboard with uptime, blocks, time saved, ad countdown bar, audio level
- OCR trigger snippet (debug only): top-right `(Ad) 0:30 left` style — the OCR text that fired the block, with the matched keyword wrapped in parens. Empty for VLM-only blocks. Capped at 50 chars.
Multi-color Text Per Line:
- Purple - Spanish word (IBM Plex Mono Bold font)
- White - Header and translation (DejaVu Sans Bold font)
- Gray - Pronunciation and example sentence (DejaVu Sans Bold font)
Font Configuration:
- `FONT_PATH_VOCAB_PRIMARY` = DejaVu Sans Bold (vocabulary text, centered)
- `FONT_PATH_WORD_PRIMARY` = IBM Plex Mono Bold (Spanish word, purple)
- `FONT_PATH_STATS_PRIMARY` = IBM Plex Mono Regular (debug stats, monospace)
Rendering Pipeline: All overlay rendering is done inside ustreamer's MPP encoder, NOT GStreamer:
- `ad_blocker.py` captures pre-ad frame and creates pixelated NV12 background
- Background uploaded via `POST /blocking/background` (async, non-blocking)
- Text and preview configured via `GET /blocking/set`
- FreeType renders TrueType fonts directly to NV12 planes at encoder resolution
- Composite runs at 60fps with ~0.5ms overhead per frame
Pixelated Background: Instead of a plain black background, the blocking overlay shows a heavily pixelated (20x downscale) and darkened (60% brightness) version of what was on screen before the ad appeared. This provides visual context while clearly indicating blocking is active.
Implementation (src/ad_blocker.py):
- Rolling 6-second snapshot buffer (3 frames at 2-second intervals)
- Uses oldest frame when blocking starts (ensures pre-ad content)
- OpenCV pixelation: downscale by 20x, upscale with INTER_NEAREST
- Converted to NV12 and uploaded via the `/blocking/background` POST API
- Upload runs in a background thread for non-blocking operation
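The pixelation step can be approximated without OpenCV (the real code uses `cv2.resize` with `INTER_NEAREST`; numpy striding plus repeat is roughly equivalent):

```python
import numpy as np

def pixelate(frame, factor=20, brightness=0.6):
    """Pixelate-and-darken sketch of the blocking background:
    keep every `factor`-th pixel, repeat it back up to full size,
    then scale brightness down to 60%."""
    small = frame[::factor, ::factor]
    big = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    big = big[:frame.shape[0], :frame.shape[1]]   # crop to original size
    return (big.astype(np.float32) * brightness).astype(np.uint8)
```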
Preview Window: Unlike the old GStreamer approach (limited to ~4fps), the ustreamer blocking mode provides:
- Full 60fps live preview of the blocked ad
- Hardware-accelerated scaling in the MPP encoder
- Automatic resolution handling (works at 1080p, 2K, 4K)
Web UI Toggles: Ad Preview Window and Debug toggleable via Settings (both default ON). The unified Debug toggle controls all three on-screen debug elements together — header, bottom-left stats dashboard, and top-right OCR trigger snippet — and is persisted to `~/.minus_system_settings.json` (`debug_overlay`) so "off" survives a service restart.
Recursion safety for the OCR snippet: OCR consumes `/snapshot/raw` (`src/capture.py:134`), which the patched ustreamer serves from `us_blocking_store_raw_frame()` before the blocking composite is applied. The new top-right text — and every other element on the blocking overlay — is therefore invisible to OCR, so the displayed `(Ad) 0:30 left` cannot loop back into detection. Don't break this: if you ever route OCR through `/snapshot` (the composited path), all of these debug texts will become self-triggering.
120+ intermediate-level words and phrases including:
- Common verbs: aprovechar, lograr, desarrollar, destacar, enfrentar...
- Reflexive verbs: comprometerse, enterarse, arrepentirse, darse cuenta...
- Adjectives: disponible, imprescindible, agotado, capaz, dispuesto...
- Nouns: desarrollo, comportamiento, conocimiento, ambiente, herramienta...
- Expressions: sin embargo, a pesar de, de repente, hoy en dia, cada vez mas...
- False friends: embarazada, exito, sensible, libreria, asistir...
- Subjunctive triggers: es importante que, espero que, dudo que, ojala...
- Time expressions: hace poco, dentro de poco, a la larga, de antemano...
Log File:
- Location:
/tmp/minus.log - Max 5MB per log file
- Keeps 3 backup files (minus.log.1, .2, .3)
Screenshot Truncation:
- Keeps only last 50 screenshots by default
- Configurable via
--max-screenshots
FastVLM-1.5B on Axera LLM 8850 NPU:
- Smarter than 0.5B with fewer false positives on streaming interfaces
- ~0.7s inference time for ad detection (process-based with 1.5s hard timeout)
- ~1.0s for custom queries (structured prompt)
- ~25s model load time (includes 2 warmup inferences)
- Uses Python axengine + transformers tokenizer
- Home screen detection provides additional safety net
Process-based architecture (src/vlm_worker.py):
- VLM runs in a separate process for hard timeout capability
- Uses 'spawn' multiprocessing method to avoid "can only join a child process" errors from axengine
- Soft/Hard timeout strategy to avoid unnecessary restarts:
- Soft timeout (1.5s): Returns immediately with "TIMEOUT", but worker keeps running
- Hard timeout (5.0s): Only kills worker if inference is truly stuck
- Restart threshold: 3 consecutive soft timeouts trigger a hard kill
- Late responses are drained on next request and counters reset
- 4 warmup inferences at startup with varied content (noise, gradients, edges, mixed)
- Keepalive thread runs dummy inference every 20s during idle to prevent NPU cold-start
- Worker process loads model once (~27s), processes requests via Queue
Two inference modes:
- `detect_ad(image_path)` → `(is_ad, response_text, elapsed, confidence)` — ad/not-ad classification. Internally hard-caps the model at `max_new_tokens=5`.
- `query_image(image_path, prompt, max_new_tokens=8)` → `(response_text, elapsed)` — custom prompt for any question about the image (used by Autonomous Mode for screen state classification). The `max_new_tokens` default of 8 fits the autonomous-mode multi-choice prompt (PLAYING / PAUSED / DIALOG / MENU / SCREENSAVER); raise it explicitly for open-ended prompts knowing latency rises ~0.23 s per allowed token.
Both modes share the same model. Concurrent callers (detection loop calling `detect_ad`, autonomous mode calling `query_image`) are serialized by `VLMProcess._call_lock` so they cannot cross responses on the shared queue or race on the timeout / latency state. See VLMProcess Cross-Thread Race under Known Issues for the full rationale.
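A stripped-down sketch of that serialization pattern (class and method names are illustrative, not the actual VLMProcess API):

```python
import queue
import threading

class SerializedWorkerClient:
    """One lock ensures a request and its response can't interleave
    with a concurrent caller on the shared request/response queues."""

    def __init__(self, request_q, response_q):
        self._req, self._resp = request_q, response_q
        self._call_lock = threading.Lock()

    def call(self, payload, timeout=1.5):
        with self._call_lock:              # one in-flight request at a time
            self._req.put(payload)
            try:
                return self._resp.get(timeout=timeout)
            except queue.Empty:
                return "TIMEOUT"           # soft timeout: worker keeps running
```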
The max_new_tokens cap is the load-bearing reason VLM never enters a sustained "restart cycle" anymore. Without it, certain images (visually busy / ambiguous) make the model emit a 30–60 token descriptive paragraph instead of "Yes." / "No.", taking 10–15 s and tripping every downstream timeout. See docs/VLM_NPU_DEGRADATION.md for the investigation that ruled out NPU/firmware/driver causes and isolated the fix.
/home/radxa/axera_models/FastVLM-1.5B/
├── fastvlm_ax650_context_1k_prefill_640_int4/ # LLM decoder models
│ ├── image_encoder_512x512.axmodel # Vision encoder
│ ├── llava_qwen2_p128_l*.axmodel # 28 decoder layers
│ └── model.embed_tokens.weight.npy # Embeddings (float32)
├── fastvlm_tokenizer/ # Tokenizer files
└── utils/ # LlavaConfig and InferManager
Why FastVLM-1.5B instead of 0.5B?
| Aspect | FastVLM-0.5B | FastVLM-1.5B |
|---|---|---|
| Inference Time | 0.7s | 0.9s |
| False Positive Rate | ~88% on home screens | ~36% on home screens |
| Intelligence | Basic | Much smarter |
| Parameters | 0.5B | 1.5B |
The Axera NPU can drift into a degraded state (observed: ~15–18s inference with descriptive responses instead of the structured short answer) that outlasts simple worker restarts. This is not thermal — temps are similar (~70°C) when healthy and when slow. Most likely accumulated NPU memory or axengine context state.
VLMProcess keeps a rolling window of the last 10 successful inference latencies. After each success it computes P95 and triggers recovery if P95 > 3.0s:
| Step | Trigger | Action |
|---|---|---|
| 1 | P95 > 3.0s, no recovery in last 60s | restart() — kill worker + 2s NPU-release + start |
| 2 | Still degraded within 180s of step 1 | Deep restart — kill + 8s release + start |
60s cooldown prevents thrashing. The latency window and recoveries surface on `/api/health` at `subsystems.vlm.latency`; NPU telemetry is also exported as the Prometheus gauges `minus_axera_temperature_celsius`, `minus_axera_npu_usage_percent`, and `minus_axera_cmm_used_kib`.
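The trigger condition can be sketched as follows (nearest-rank P95; the real implementation may interpolate, and the cooldown/escalation steps are handled elsewhere):

```python
import math

def p95(latencies):
    """Nearest-rank P95 over a list of inference latencies (seconds)."""
    s = sorted(latencies)
    rank = math.ceil(0.95 * len(s))
    return s[rank - 1]

def needs_recovery(latencies, threshold=3.0, window=10):
    """True when the rolling window is full and its P95 exceeds 3.0s."""
    recent = latencies[-window:]
    return len(recent) == window and p95(recent) > threshold
```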
Query axcl directly for live telemetry:
axcl-smi info --temp # milli-°C, divide by 1000
axcl-smi info --npu # utilization %
axcl-smi info --cmm    # CMM memory used / total

# System packages
sudo apt install -y imagemagick ffmpeg curl v4l-utils
# GStreamer and plugins for video pipeline
sudo apt install -y \
gstreamer1.0-tools \
gstreamer1.0-plugins-base \
gstreamer1.0-plugins-good \
gstreamer1.0-plugins-bad \
gstreamer1.0-rockchip1 \
gir1.2-gst-plugins-base-1.0 \
libgstreamer1.0-dev
# Build ustreamer with MPP hardware encoding and FreeType fonts
sudo apt install -y librockchip-mpp-dev libfreetype-dev libjpeg-dev libevent-dev
git clone https://github.com/garagehq/ustreamer.git /home/radxa/ustreamer-garagehq
cd /home/radxa/ustreamer-garagehq && make WITH_MPP=1
cp ustreamer /home/radxa/ustreamer-patched
# Fonts for blocking overlay
sudo apt install -y fonts-dejavu-core fonts-ibm-plex
# Python dependencies
pip3 install --break-system-packages \
pyclipper shapely numpy opencv-python \
pexpect PyGObject flask requests androidtv \
  rknnlite  # RKNN NPU runtime for OCR (may need Rockchip's pip repo)

Note: The `rknnlite` package is provided by Rockchip and may need to be installed from their SDK or a custom repository. On the Radxa board with NPU support, it may already be pre-installed.
Axera NPU (for VLM): The FastVLM-1.5B model runs on the Axera LLM 8850 NPU. Required Python packages:
pip3 install --break-system-packages axengine transformers ml_dtypes

The `axengine` package requires the Axera AXCL runtime to be installed - see the Axera documentation.
ustreamer fails to start:
fuser -k /dev/video0 # Kill processes using device
pkill -9 ustreamer     # Kill orphaned ustreamer

VLM not loading:
- Check Axera card: `axcl_smi`
- Verify model files exist in `/home/radxa/axera_models/FastVLM-1.5B/`
- Ensure Python dependencies: `pip3 show axengine transformers ml_dtypes`
OCR not detecting:
- Test snapshot: `curl http://localhost:9090/snapshot -o test.jpg`
- Check HDMI: `v4l2-ctl -d /dev/video0 --query-dv-timings`
Display issues:
- Check DRM plane: `modetest -M rockchip -p | grep -A5 "plane\[72\]"`
- Verify connector: `modetest -M rockchip -c | grep HDMI`
NEVER REVERT TO GSTREAMER TEXTOVERLAY FOR BLOCKING OVERLAYS.
The blocking overlay system uses ustreamer's native MPP blocking mode (/blocking/* API), NOT GStreamer's input-selector or textoverlay. This is a one-way migration - we only move forward.
Current Architecture:
- Simple GStreamer pipeline with `queue max-size-buffers=3 leaky=downstream` for smooth video
- All blocking compositing (background, preview, text) done in ustreamer's MPP encoder at 60fps
- Control via HTTP API: `/blocking/set`, `/blocking/background`
- FreeType TrueType font rendering:
  - IBM Plex Mono Bold for Spanish word (purple, centered)
  - DejaVu Sans Bold for vocabulary text (white/gray, centered)
  - IBM Plex Mono Regular for stats dashboard (bottom-left, monospace)
  - Per-line multi-color text matching web UI aesthetic (see AESTHETICS.md)
- Thread-safe with mutex protection for 4 parallel MPP encoder workers
Resolution Flexibility: The blocking system automatically handles resolution mismatches:
- API calls may specify 4K dimensions (3840x2160)
- With `--encode-scale passthrough`, the encoder uses the source resolution directly
- Preview dimensions are scaled proportionally to fit
- Positions are clamped to valid ranges
- All coordinates aligned to even values for NV12
Thread Safety:
FreeType is NOT thread-safe. With 4 parallel MPP encoder workers, a pthread_mutex_t _ft_mutex serializes all FreeType calls in the composite function to prevent crashes. Without this, concurrent FT_Set_Pixel_Sizes/FT_Load_Glyph calls corrupt FreeType's internal state.
Why NOT GStreamer textoverlay:
- Caused pipeline stalls every ~12 seconds
- NV12 format incompatibility issues
- 4K→1080p resolution mismatch problems
- gdkpixbufoverlay limited to ~4fps for preview updates
- Complex input-selector switching logic
Key files:
- `ustreamer-garagehq/src/libs/blocking.c` - NV12 compositing with FreeType, mutex protection
- `ustreamer-garagehq/src/libs/blocking.h` - Blocking mode API
- `src/ad_blocker.py` - Python client using blocking API
Notification Overlay (for Fire TV setup messages, etc.):
- `GET /overlay` - Get current overlay configuration
- `GET /overlay/set?params` - Set overlay configuration
| Parameter | Description |
|---|---|
| `text` | Text to display (URL-encoded, supports newlines) |
| `enabled` | `true` or `1` to enable overlay |
| `position` | 0=top-left, 1=top-right, 2=bottom-left, 3=bottom-right, 4=center |
| `scale` | Text scale factor (1-10) |
| `color_y`, `color_u`, `color_v` | Text color in YUV |
| `bg_enabled` | Enable background box |
| `bg_alpha` | Background transparency (0-255) |
| `clear` | Clear overlay |
Example:

curl "http://localhost:9090/overlay/set?text=LIVE&position=1&scale=3&enabled=true"
curl "http://localhost:9090/overlay/set?clear=true"

Blocking Mode (for ad blocking overlays):
Blocking Mode Endpoints:
- `GET /blocking` - Get current config (enabled, preview, colors, etc.)
- `GET /blocking/set?enabled=true&text_vocab=...&text_ocr=...&preview_enabled=true&preview_grayscale=true&word_y=140&word_u=175&word_v=145` - Configure. Includes `preview_grayscale` to desaturate the corner preview, `word_y`/`word_u`/`word_v` for cycling the Spanish word color per rotation, and `text_ocr` for the top-right OCR-trigger snippet (renders in IBM Plex Mono Regular at the same scale as `text_stats`; empty string clears it).
- `POST /blocking/background` - Upload pixelated NV12 background (width × height × 1.5 bytes)
Multi-color text auto-detection: Lines starting with [ → white (header), ( → gray (pronunciation), = → white (translation), " → gray (example), other → purple (Spanish word)
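That first-character mapping can be sketched in Python (illustrative only; the real logic lives in ustreamer's C compositor):

```python
def line_color(line: str) -> str:
    """Pick the render color for one overlay line from its first character."""
    first = line.lstrip()[:1]
    if first == "[":
        return "white"    # header
    if first == "(":
        return "gray"     # pronunciation
    if first == "=":
        return "white"    # translation
    if first == '"':
        return "gray"     # example sentence
    return "purple"       # Spanish word (default)
```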
The overlay system includes a priority mechanism to handle multiple overlays gracefully:
Persistent Overlays:
- Setup instructions (Roku Limited Mode, Fire TV ADB Enable) are "persistent"
- Registered with duration > 60 seconds
- Have a background monitor thread that checks every 5 seconds
- Auto-restore if overwritten by short notifications (VLM status, etc.)
Short Overlays:
- Status notifications (VLM Ready, Connected, etc.) are short-lived (5-10s)
- Can temporarily interrupt persistent overlays
- After they expire, the persistent overlay is automatically restored
State Changes:
- Successful device connection calls `_clear_persistent()` to dismiss setup instructions
- This prevents stale setup overlays from reappearing after connection

Implementation:
- Module-level singleton state in `src/overlay.py` (`_overlay_state` dict)
- Monitor thread spawned by `_set_persistent()` polls the ustreamer overlay API
- Compares current overlay text to expected text, restores if different
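The restore loop can be sketched as follows (a simplified stand-in for the monitor thread; `get_current` and `set_overlay` represent the ustreamer HTTP calls, names are illustrative):

```python
import threading


class PersistentOverlay:
    """Poll the overlay API and re-assert the persistent text if a
    short-lived notification overwrote it."""

    def __init__(self, expected_text, get_current, set_overlay, interval=5.0):
        self.expected_text = expected_text
        self.get_current = get_current      # e.g. GET /overlay wrapper
        self.set_overlay = set_overlay      # e.g. GET /overlay/set wrapper
        self.interval = interval
        self._stop = threading.Event()

    def check_once(self) -> bool:
        """Restore the persistent overlay if needed; return True if restored."""
        if self.get_current() != self.expected_text:
            self.set_overlay(self.expected_text)
            return True
        return False

    def run(self):
        # Background monitor: check every `interval` seconds until stopped.
        while not self._stop.wait(self.interval):
            self.check_once()

    def stop(self):
        self._stop.set()
```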
The health monitor (src/health.py) runs in a background thread and checks:
| Subsystem | Check | Recovery |
|---|---|---|
| HDMI signal | v4l2-ctl --query-dv-timings | Show "NO SIGNAL" overlay, mute audio |
| No HDMI at startup | check_hdmi_signal() | Show bouncing "NO SIGNAL" screensaver |
| ustreamer | HTTP HEAD to :9090/snapshot | Restart ustreamer + video pipeline |
| Video pipeline | Buffer flow + FPS monitoring | Restart pipeline with exponential backoff |
| Output FPS | GStreamer pad probe | Log warning if < 25fps |
| VLM | Consecutive timeouts < 5 | Degrade to OCR-only, retry VLM after 30s |
| Memory | Usage < 90% | Force GC, clear frame buffers |
| Disk | Free > 500MB | Log warning |
HDMI Disconnect/Reconnect Recovery:
- Detects HDMI signal loss via ustreamer's `/state` API (`captured_fps` field)
- Signal considered lost if FPS is 0 for more than 5 seconds (handles source going to sleep)
- Shows "NO SIGNAL" overlay and mutes audio immediately
- On signal restoration: restarts ustreamer → restarts video pipeline → restores display
- Full recovery typically completes in ~7 seconds
Display Output Resilience (HDMI-TX Disconnected):
- Service continues running even if HDMI-TX display output is disconnected
- ustreamer runs independently for web preview and ML detection
- Web UI shows "DISPLAY DISCONNECTED" overlay with grayscale video feed underneath
- Display retry loop attempts reconnection every 7 seconds (only display pipeline, not ustreamer)
- OCR/VLM ad detection continues working without display output
- API exposes `display_connected` and `display_error` fields in `/api/status`
Video Pipeline Watchdog:
- Buffer watchdog detects stalls (10 seconds without buffer)
- Monitors GStreamer pipeline state (must be PLAYING)
- Handles HTTP connection errors from souphttpsrc
- Handles unexpected EOS (end-of-stream) events
- Exponential backoff for restarts (1s → 2s → 4s → ... → 30s max)
- Backoff resets after 10 seconds of sustained buffer flow
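The backoff schedule works out to a simple capped doubling (a minimal sketch of the delay computation):

```python
def restart_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Exponential backoff for pipeline restarts: 1s, 2s, 4s, ... capped at 30s.
    `attempt` is 0-based and resets after 10s of sustained buffer flow."""
    return min(base * (2 ** attempt), cap)


# First seven restart delays: 1, 2, 4, 8, 16, 30, 30 seconds
delays = [restart_delay(n) for n in range(7)]
```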
Startup grace period:
- 30-second grace period before ustreamer health checks begin
- Prevents false positives during VLM model loading
Graceful degradation (startup):
- OCR initialization: 3 retries with 2s delay, continues without OCR if all fail
- VLM model loading: 3 retries with 5s delay, continues without VLM if all fail
- Both failures are non-fatal — the system runs with whatever subsystems loaded
- `ocr_ready` and `ocr_disabled` fields in `/api/status` (matching existing `vlm_ready`/`vlm_disabled`)
- OCR status badge in web UI header: OCR: Ready / Disabled / Failed
Graceful degradation (runtime):
- If VLM fails 5+ times consecutively, switches to OCR-only mode
- VLM restart is attempted after 30 seconds in background
- OCR continues working independently
Scene skip cap:
- OCR: Force run after 30 consecutive skips
- VLM: Force run after 10 consecutive skips
- Prevents missing ads that appear without scene change
Periodic logging:
- FPS logged every 60 seconds
- Full status logged every 5 minutes (uptime, fps, hdmi, video, audio, vlm, mem, disk)
Minus includes a lightweight Flask-based web UI for remote monitoring and control, accessible via Tailscale from desktop or mobile devices.
Features:
- Live video feed - MJPEG stream proxied from ustreamer (CORS bypass)
- Status display - Blocking state, FPS, HDMI info, uptime
- Pause controls - 1/2/5/10 minute presets to pause ad blocking
- Detection history - Recent OCR/VLM detections with timestamps
- Settings - Toggle preview window and debug dashboard
- Log viewer - Collapsible log output for debugging
Key API Routes:
- `GET /`, `/api/status`, `/api/detections`, `/api/logs`
- `POST /api/pause/N`, `/api/resume`
- `GET/POST /api/preview/*`, `/api/debug-overlay/*` (the debug-overlay route is the unified Debug toggle: header + bottom-left stats + top-right OCR snippet, persisted to `~/.minus_system_settings.json` as `debug_overlay`)
- `POST /api/test/trigger-block`, `/api/test/stop-block`
- `GET /stream`, `/snapshot` - Proxy to ustreamer
- `GET /api/health` - Health check for uptime monitors
- `POST /api/video/restart` - Force restart video pipeline
- `GET/POST /api/video/color` - Get/set color settings (saturation, brightness, contrast, hue)
- `POST /api/ocr/test` - Run OCR on current frame (no screenshot save)
- `POST /api/vlm/test` - Run VLM on current frame (no screenshot save)
- `GET /api/vlm/status` - Get VLM status (disabled, model_loaded, etc.)
- `POST /api/vlm/disable` - Disable VLM and unload model from NPU
- `POST /api/vlm/enable` - Re-enable VLM and load model
- `POST /api/blocking/skip` - Trigger Fire TV skip button
- `POST /api/audio/sync-reset` - Reset A/V sync (~300ms dropout)
- `GET /api/autonomous` - Autonomous mode status
- `POST /api/autonomous/enable/disable/toggle/start` - Control autonomous mode
- `POST /api/autonomous/schedule` - Set schedule (start_hour, end_hour, always_on)
- `GET /api/autonomous/logs` - Autonomous mode log entries
- `GET /api/screenshots/review/<category>` - Unreviewed screenshots for swipe classification
- `POST /api/screenshots/approve` - Mark screenshot as correctly labeled
- `POST /api/screenshots/classify` - Move screenshot between categories
- `POST /api/screenshots/undo` - Undo last review action
- `GET /api/ir/status` - IR transmitter status (`enabled`, `available`, `initialized`, `codes`)
- `POST /api/ir/enable/disable` - Toggle the IR remote feature (gates the UI and `/command`)
- `POST /api/ir/command` - Send a captured button. Body: `{"button": "power"|"input_1"|"input_2"|"input_3"|"next"|"auto"}`. `403` when disabled, `429` with `retry_after` inside the 1.5 s cooldown. See `docs/IR_TRANSMITTER.md`.
- `GET /api/leds/status` - Status LEDs status (`available`, `enabled`, `running`, `state`, `states`, `last_error`, `gated`)
- `POST /api/leds/enable/disable` - Toggle the WS2812B status strip; persists; starts/stops the animation thread
- `POST /api/leds/state` - Switch animation state. Body: `{"state": "<name>"}`. `403` when disabled, `400` for unknown state. States: off / initializing / idle / blocking / paused / no_signal / autonomous / wifi_setup / error. See `docs/STATUS_LEDS.md`.
- `GET /api/leds/require_display` - Display-gate status (`leds_require_display`, live `display_connected`)
- `POST /api/leds/require_display` - Body `{"enabled": true|false}` — when on (default), the strip stays dark while the HDMI-TX display is disconnected or powered off.
Test API Endpoints: For development and testing ad blocking without waiting for real ads:
# Trigger blocking for 20 seconds (max 60)
curl -X POST -H "Content-Type: application/json" \
-d '{"duration": 20, "source": "ocr"}' \
http://localhost:80/api/test/trigger-block
# Stop blocking immediately
curl -X POST http://localhost:80/api/test/stop-blockParameters for trigger-block:
duration: seconds to block (default: 10, max: 60)source: detection source - 'ocr', 'vlm', 'both', or 'default'kind: optional forced replacement kind - 'vocab', 'fact', or 'photos'
Test mode prevents the detection loop from canceling the blocking, allowing full testing of pixelated background, animations, and audio muting. When source is ocr or both, the endpoint also injects a synthetic (Ad) 0:30 left snippet into the top-right OCR-trigger slot so you can exercise that rendering path without waiting for real OCR.
Access URLs:
- Local: `http://localhost:80`
- Tailscale: `http://<tailscale-hostname>:80`
- Direct stream: `http://<hostname>:9090/stream`
Security:
- No authentication (relies on Tailscale network security)
- Read-mostly API with minimal attack surface
- Binds to 0.0.0.0 for remote access
Minus automatically collects training data for future VLM improvements, organized by type:
Screenshot directories:
- `screenshots/ads/` - OCR-detected ads
- `screenshots/non_ads/` - User paused = false positives
- `screenshots/vlm_spastic/` - VLM uncertainty cases (detected 2-5x then changed)
- `screenshots/static/` - Static screen suppression
Screenshot Quality Filtering (all categories):
Every save goes through _should_save() which applies three layers of filtering:
| Layer | What it catches | Threshold |
|---|---|---|
| Rate limiting | Rapid-fire saves | 5s minimum between saves per category |
| Blank rejection | Black/solid-color frames | Mean brightness < 15 or std dev < 10 |
| dHash dedup | Near-duplicate frames | Hamming distance < 10 bits (~85% similar) |
dHash (Difference Hash):
- Resize frame to 9x8 grayscale, compare adjacent pixels → 64-bit hash
- Two frames of the same ad with slightly different timestamps: hamming distance ~1-5
- A genuinely different scene: hamming distance ~20-30
- Keeps last 200 hashes per category for rolling dedup window
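A pure-Python sketch of the dHash + Hamming-distance check (production resizes the frame to a 9x8 grayscale grid with OpenCV first; here the grid is passed in directly):

```python
def dhash(pixels) -> int:
    """64-bit difference hash over an 8-row x 9-column grayscale grid.
    Each bit is 1 when a pixel is brighter than its right-hand neighbour."""
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):  # 8 comparisons per row
            bits = (bits << 1) | (left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")


# Two near-identical synthetic 9x8 frames differ in only a few bits
frame = [[(x * 7 + y * 3) % 256 for x in range(9)] for y in range(8)]
tweak = [row[:] for row in frame]
tweak[0][0] += 40   # small local change, e.g. a timestamp digit
```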
Screenshot Review System (Tinder-style):
The web UI includes a swipe-based review system for classifying screenshots:
- Each screenshot tab (Ads, Non-Ads, VLM Spastic, Static) has a 👀 review button
- Opens a full-screen modal with a 3-card visual stack
- Swipe right (or arrow key) = approve / classify as ad
- Swipe left (or arrow key) = reclassify / classify as not ad
- Undo (Ctrl+Z or button) reverses the last action
- Progress tracked in `/home/radxa/.minus_reviewed_screenshots.json` — shows oldest unreviewed first
| Category | Swipe Right | Swipe Left |
|---|---|---|
| Ads | Approved (correct) | Move to Non-Ads |
| Non-Ads | Approved (correct) | Move to Ads |
| VLM Spastic | Move to Ads | Move to Non-Ads |
| Static | Move to Ads | Move to Non-Ads |
Review API:
- `GET /api/screenshots/review/<category>` - Unreviewed items, oldest first
- `POST /api/screenshots/approve` - Mark as correctly labeled
- `POST /api/screenshots/classify` - Move between categories
- `POST /api/screenshots/undo` - Undo last action
Autonomous Mode keeps YouTube playing on streaming devices during scheduled hours so Minus can collect ad detection training data unattended. Device-agnostic design supports Fire TV, Roku, and Google TV. Uses VLM to understand screen state and take intelligent actions.
How it works:
- Schedule — Configurable start/end hours (e.g., 22:00–06:00), or 24/7 mode
- OCR-based screen detection — Before VLM, checks OCR text for login/home screen keywords (VLM often misclassifies these static screens as "PLAYING")
- VLM-guided keepalive — Every 2 minutes, captures a frame and asks VLM to classify the screen state
- Roku ECP active app check — Before VLM, queries Roku's `/query/active-app` API to detect if YouTube exited or screensaver activated (more reliable than VLM for Roku)
- Frame-change + audio verification — After VLM says "PLAYING", verifies with dHash frame comparison + audio flow check to catch paused videos VLM misclassifies
- Smart actions — Based on combined signals, takes the minimum necessary action:
| Signal | Action | Command |
|---|---|---|
| OCR: login screen keywords | Select account | down + select |
| OCR: home screen keywords + static | Select video | down + select |
| VLM: PLAYING + frames changing | None | Video is fine |
| VLM: PLAYING + static + no audio | Play | play_pause (paused video VLM missed) |
| VLM: PLAYING + static + audio flowing | None | Music stream with static image (lo-fi) |
| VLM: PAUSED | Play | play_pause key |
| VLM: DIALOG | Dismiss | select + play_pause |
| VLM: MENU | Select video | down + select |
| VLM: SCREENSAVER | Wake + launch | wakeup + launch YouTube |
| Roku: screensaver overlay | Dismiss | select (wake from screensaver) |
| Roku: not on YouTube | Relaunch | launch_app('youtube') |
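The VLM rows of the table above can be sketched as one decision function (illustrative only; command strings follow the table, not the actual controller API):

```python
def keepalive_action(vlm_state: str, frames_static: bool, audio_flowing: bool):
    """Map a VLM screen classification plus verification signals to the
    minimum necessary remote action, or None when the video is fine."""
    if vlm_state == "PLAYING":
        if not frames_static:
            return None                 # frames changing: video is fine
        if audio_flowing:
            return None                 # static image + audio = lo-fi music stream
        return "play_pause"             # paused video the VLM missed
    return {
        "PAUSED": "play_pause",
        "DIALOG": "select+play_pause",
        "MENU": "down+select",
        "SCREENSAVER": "wakeup+launch_youtube",
    }.get(vlm_state)
```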
Device-agnostic design:
- `set_device_controller(controller, device_type)` accepts any controller
- Device type auto-detected from controller class name
- YouTube launch uses device-specific methods (Roku ECP `launch_app`, Android ADB intent)
- Skip command routes through active device controller

Roku-specific features:
- Active app check via ECP `/query/active-app` — definitively knows if YouTube is running
- Screensaver detection — checks for `<screensaver>` element in active-app response (Roku City screensaver overlays YouTube without closing it)
- YouTube app ID: 837
OCR-based screen detection: VLM often misclassifies static YouTube screens (login, home) as "PLAYING". OCR keywords provide more reliable detection:
| Screen | Keywords | Action |
|---|---|---|
| Login/account selection | watch as guest, watchas guest, add a kid account, kid account, choose account, switch account | `down` + `select` to choose account |
| Home/browse | new to you, newtoyou, trending, subscriptions, library, views, year ago, month ago | `down` + `select` to pick a video |
Login screen detection runs before VLM query. Home screen detection runs when VLM says "PLAYING" but frames are static.
Frame-change verification (pause detection):
- dHash (difference hash) compares two frames 3 seconds apart
- Hamming distance < 3 = truly static (paused or stuck)
- Audio flow check via ad_blocker's audio module (`0 <= last_buffer_age < 3s`) or ALSA `/proc/asound` status
- Note: `buffer_age = -1` means no audio ever received (not flowing), fixed to prevent false "audio flowing" detection
- Static frames + audio flowing = music stream (not paused) — prevents false play_pause
- Static frames + no audio = truly paused — sends play_pause after 2 consecutive checks
VLM Screen Query Prompt:
Look at this TV screen and classify it into exactly one category.
Answer with ONLY one of these words:
PLAYING, PAUSED, DIALOG, MENU, SCREENSAVER
This structured prompt returns in ~1.0s (vs 5-22s with descriptive prompts).
Settings persistence: `/home/radxa/.minus_autonomous_mode.json`

{"enabled": true, "start_hour": 22, "end_hour": 6, "always_on": false}

System settings: `/home/radxa/.minus_system_settings.json`

{"vlm_preload": true}

VLM preload loads the model at startup before HDMI signal arrives (configurable in Settings tab).
API endpoints:
- `GET /api/autonomous` - Current status (active, schedule, stats, device_type, device_connected)
- `POST /api/autonomous/enable/disable/toggle`
- `POST /api/autonomous/start` - Start immediately (manual override)
- `POST /api/autonomous/schedule` - Set hours and always_on flag
- `GET /api/autonomous/logs` - Recent log entries
- `GET/POST /api/settings/vlm-preload` - VLM preload toggle
- `GET/POST /api/settings/optimization` - Toggle block-duration falloff, HDMI reconnect grace, and greyscale preview. POST body: `{"key": "block_falloff"|"hdmi_reconnect_grace"|"greyscale_preview", "enabled": true|false}`. Persisted to `~/.minus_system_settings.json`. Setting `greyscale_preview` here propagates to the running ad_blocker immediately via `/blocking/set?preview_grayscale=...` so the current block updates on the fly.
- `GET/POST /api/settings/replacement-modes` - Which content kinds the blocking overlay rolls into. POST body: `{"modes": ["vocab","fact","haiku","photos"]}`. Server enforces at least one text kind (vocab/fact/haiku) remains enabled. Persisted with the rest of system settings.
- `GET/POST /api/media/photos` - List all uploaded photos (GET) or upload a new one (POST multipart with `file` field). Server re-encodes to JPEG (max 1920px long edge, quality 85) under `~/.minus_media/photos/`. Count cap 200, size cap 200 MB (oldest evicted on add).
- `GET/DELETE /api/media/photos/<id>` - Download JPEG bytes inline (GET) or remove by id (DELETE). Id is sanitized to hex to prevent path traversal.
Web UI: Toggle button, schedule time selectors, 24/7 checkbox (auto-enables mode), stats display in Settings tab, VLM preload toggle.
24h stability test results (Apr 10-11, 2026):
- Memory: stable at ~1.65GB RSS, no leak (tested 21+ hours continuous)
- FD count: stable at ~35, no leak
- Autonomous actions: 10+ DIALOG dismissals, 6+ screensaver auto-dismissals, all successful
- Ads blocked: 15+ ad breaks (OCR+VLM), all legitimate
- Audio-aware static detection: prevented 100+ false play_pause commands on lo-fi streams
- Audio restarts: 3 total (isolated, all self-recovered)
- Zero errors throughout
Minus includes a WiFi captive portal system for easy network configuration when no WiFi is connected.
How it works:
- If WiFi disconnects for 30+ seconds, Minus creates a "Minus" hotspot AP
- Users connect to the hotspot and get redirected to a setup page
- Setup page shows available networks with signal strength
- User selects network and enters password
- Minus connects and stops the AP automatically
Hotspot Configuration:
- SSID: `Minus`
- Password: `minus123`
- IP: `10.42.0.1`
- Band: 2.4GHz (802.11 b/g)
Captive Portal Detection: The portal supports automatic detection on mobile devices:
- `GET /generate_204` - Android captive portal check
- `GET /hotspot-detect.html` - Apple captive portal check
- `GET /connecttest.txt` - Windows captive portal check
API Endpoints:
- `GET /api/wifi/status` - Current connection status, AP mode state
- `GET /api/wifi/scan` - Scan for available networks
- `POST /api/wifi/connect` - Connect to a network (ssid, password)
- `POST /api/wifi/disconnect` - Disconnect from current network
- `POST /api/wifi/ap/start` - Start AP mode manually
- `POST /api/wifi/ap/stop` - Stop AP mode
- `GET /wifi-setup` - Captive portal setup page
Settings Tab Integration: The Settings tab in the web UI shows:
- Current WiFi status (SSID, IP, signal strength)
- Disconnect button for current network
- Manual AP mode start/stop buttons
Files:
- `src/wifi_manager.py` - WiFi/AP management module
- `src/templates/wifi_setup.html` - Captive portal page
- `tests/test_wifi_portal.py` - Playwright tests (30 tests)
Note: The Radxa's internal WiFi antenna has limited range. For better AP coverage in production, consider using a USB WiFi adapter with external antenna.
An IR LED wired to Rock Pi 5B header pin 38 (GPIO3_B2 / Linux GPIO 106, muxed to PWM3_IR_M1) lets Minus control a REI 8K 3-port HDMI switch. The target use case is autonomous mode rotating between streaming devices (Roku / Fire TV / Google TV) on a schedule so training data covers multiple home-screen layouts.
Hardware setup (one-time): enable the rk3588-pwm3-m1 overlay, reboot. After reboot a new /sys/class/pwm/pwmchipN appears whose device symlink points to fd8b0030.pwm. See docs/IR_TRANSMITTER.md for overlay install steps and wiring.
Protocol: NEC at 38 kHz carrier. Captured codes (all address 0x80, via Flipper Zero): input_1=0x07, input_2=0x1B, input_3=0x08, power=0x05, next=0x1F (cycles 1→2→3→1), auto=0x09.
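For reference, the NEC framing those codes ride on can be sketched like this (an illustrative encoder, not the actual `IRTransmitter` internals):

```python
def nec_frame(address: int, command: int) -> int:
    """Assemble the 32-bit NEC data frame: address, inverted address,
    command, inverted command. Each byte is transmitted LSB-first."""
    return (address
            | ((address ^ 0xFF) << 8)
            | (command << 16)
            | ((command ^ 0xFF) << 24))


def nec_pulses(address: int, command: int, unit_us: int = 562):
    """Expand a frame into (mark_us, space_us) pairs: 9 ms / 4.5 ms leader,
    then per bit a ~562 µs mark followed by a ~562 µs (0) or ~1687 µs (1)
    space, closed with a final stop mark."""
    pairs = [(9000, 4500)]                      # leader mark / space
    frame = nec_frame(address, command)
    for i in range(32):                         # LSB-first over the frame
        bit = (frame >> i) & 1
        pairs.append((unit_us, 3 * unit_us if bit else unit_us))
    pairs.append((unit_us, 0))                  # trailing stop mark
    return pairs
```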
API: /api/ir/status | enable | disable | command. See the Web UI Key API Routes section above. Server enforces a 1.5 s cooldown between successful sends (IRCooldownError → HTTP 429 with retry_after).
UI: toggle + 6-button remote (Input 1/2/3, Power, Next, Auto) inside the Autonomous Mode section of the Settings tab. Panel hidden until toggled on. Buttons auto-disable during cooldown and a status line shows sent power or cooldown — wait 0.74s.
Standalone CLI: sudo python3 ir_transmit.py <button> sends one button; --list prints all valid names. Uses the same IRTransmitter class as the webui so there is one source of truth.
Key gotchas (the ones that burned us once already):
- The Radxa pinout labels GPIO3_B2 with the RK3588 pin-function `PWM3_IR_M1`, not PWM14. Only the `rk3588-pwm3-m1` overlay wires pin 38.
- On a fresh PWM export, `polarity` defaults to `inversed` on this chip. That flips mark/space at the LED. `IRTransmitter.initialize()` sets `polarity=normal` while the PWM is disabled, before enabling.
- Writing to `duty_cycle` returns `EINVAL` while `period` is still 0. Always set `period` before `duty_cycle` on a fresh export.
Files:
- `src/ir_transmitter.py` — `IRTransmitter` class, NEC encoder, cooldown, PWM sysfs wiring
- `ir_transmit.py` — standalone CLI shim
- `minus.py` — instantiates `self.ir_transmitter`, persists `ir_enabled` in `~/.minus_system_settings.json`
- `src/webui.py` — `/api/ir/*` endpoints, cooldown → 429
- `src/templates/index.html` — toggle + remote panel in Autonomous Mode section
- `tests/test_ir_transmitter.py` — 20 unit tests (mocked sysfs)
- `tests/test_ir_ui.py` — Playwright UI tests (live service)
- `docs/IR_TRANSMITTER.md` — full hardware, protocol, API, and troubleshooting docs
Future work: hook minus.ir_transmitter.send("next") into autonomous mode's scheduler on a 12 h or 24 h cadence. The boilerplate (flag, endpoints, UI, cooldown) is in place so the autonomous-mode change is a single call site.
A 3-pin IR receiver (TSOP38238 / VS1838B class) was evaluated on header pin 3 (GPIO4_B3 / gpiochip4 line 11, Linux GPIO 139). Decoded the REI remote's NEC frames cleanly — 0x80 / 0x07,1B,08,1F plus REPEAT codes — using gpiomon + a Python decoder in test_ir_receiver.py. No production code yet, just exploratory.
Why pin 3 instead of pin 38 (alongside the transmitter): the rk3588-pwm3-m1 overlay parks pin 38's pad-mux on PWM3 at boot. gpiomon will claim the line but the GPIO controller is electrically disconnected from the pad — gpioget reads a constant 0 and no edges fire. Pin 3 / GPIO4_B3 has no overlay claiming it, so default GPIO mux applies and it Just Works. Sanity check: gpioget gpiochip4 11 returns 1 with the receiver powered and idle.
Two gotchas burned dev time, captured here so we don't re-discover:
- `gpiomon -B both` is invalid in libgpiod 1.6 — `-B` is bias, not edge. Default already monitors both edges; pass nothing.
- After a falling edge the line is LOW (a MARK), not a SPACE. Get the polarity backwards and every frame appears to start with a `~4500/~600 µs` "leader" because the real 9 ms leader mark gets filtered by the empty-buffer guard.
Status: test script only. Decoder is a copy-able starting point if/when we want a real IRReceiver module — see docs/IR_RECEIVER.md for the full sketch including threading model, suggested API surface, and integration ideas (closed-loop transmitter verification, external hardware trigger, remote learning, post-send confirmation for autonomous-mode scheduling).
Files:
- `test_ir_receiver.py` — standalone bench-test script (gpiomon subprocess + NEC decoder + `--raw` mode for non-NEC remotes)
- `docs/IR_RECEIVER.md` — findings, gotchas, future-module sketch
8× WS2812B addressable strip on header pin 19 (GPIO1_B2 muxed as SPI0_MOSI_M2). All 8 LEDs are user-addressable.
Why SPI MOSI: WS2812B's 800 kHz protocol needs sub-µs timing. Userspace GPIO can't deliver that on Linux; the SPI controller can. We clock SPI at 6.4 MHz and encode each WS bit as one full SPI byte (0b11110000 = WS-1, 0b11000000 = WS-0) — the canonical Adafruit NeoPixel_SPI pattern. The frame is wrapped with 80 µs zero-byte resets on both sides and sent via writebytes2(bytes). We initially tried 3-SPI-bits-per-WS-bit at 2.4 MHz; the spi-rockchip driver's PIO mode inserts inter-byte gaps when its FIFO refills, and the tighter scheme didn't have enough skew tolerance — visible symptom was "solid green decoded as cycling red/blue/white". rpi_ws281x and friends depend on Broadcom PWM+DMA hardware and don't work on RK3588.
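The 1-SPI-byte-per-WS-bit scheme can be sketched as follows (illustrative; the real driver is `src/status_leds.py`, and `reset_bytes=64` gives 64 × 8 bits / 6.4 MHz = 80 µs of reset):

```python
WS_ONE, WS_ZERO = 0xF0, 0xC0   # one full SPI byte per WS2812B bit at 6.4 MHz


def encode_strip(pixels, reset_bytes=64) -> bytes:
    """Encode (g, r, b) pixels — WS2812B expects GRB order — into an SPI
    byte stream: each colour bit becomes 0b11110000 (WS-1) or 0b11000000
    (WS-0), framed by zero-byte resets on both sides."""
    out = bytearray(reset_bytes)                # leading 80 µs reset
    for g, r, b in pixels:
        for byte in (g, r, b):
            for bit in range(7, -1, -1):        # MSB-first per colour byte
                out.append(WS_ONE if (byte >> bit) & 1 else WS_ZERO)
    out.extend(bytes(reset_bytes))              # trailing 80 µs reset
    return bytes(out)
```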
Hardware: bare-wire data line direct from header pin 19 to the strip — no level shifter, no inline resistor, no bulk cap needed for reliable operation on this board (verified: removed both the Adafruit-recommended 470 Ω series resistor and the 1000 µF V+/GND electrolytic, decoding stayed clean across all 8 LEDs). Keep the data wire ≤ 10 cm. (We previously shipped a "sacrificial first pixel" workaround that exposed only 7 LEDs; the encoding switch made it unnecessary and it has been removed.)
Brightness cap (load-bearing): BRIGHTNESS = 0.10 is applied inside set_pixel() before storage; every other setter funnels through it. Caps peak draw at ~48 mA across all 8 LEDs — small enough to keep current swings from corrupting the data line on the marginal 3.3V signalling. Don't bypass it from the application layer; if you need more brightness, add external 5 V power to the strip first.
Controller (src/status_led_controller.py): StatusLEDController runs a 200 ms (5 fps) animation thread. Each renderer self-paces in seconds via the shared _to_ticks() helper, so per-animation cadence is preserved if the global tick rate is changed. State transitions are atomic and thread-safe; the lifecycle holds _thread populated until join() returns so a racing start() can't open a second SPI handle.
State catalogue:
| State | Visual | Trigger |
|---|---|---|
| `off` | dark | feature disabled |
| `initializing` | white pulse 1% → 10% → 1% (1 step/500 ms; 14 s/breath) | `Minus.run()` start, HDMI restoration |
| `idle` | solid green | `ad_blocker.start()`, `ad_blocker.hide()`, recovery complete |
| `blocking` | bouncing red Cylon eye + 2-pixel tail (~200 ms/step) | `ad_blocker.show(...)` |
| `paused` | slow yellow breathing (3 s) | `Minus.pause_blocking(...)` |
| `no_signal` | slow amber breathing (4 s) | `_on_hdmi_lost`, `start_no_signal_mode` |
| `autonomous` | slow blue breathing (4 s) | autonomous-mode active callback |
| `wifi_setup` | cyan alternating sweep (~250 ms/swap) | WiFi AP-mode started |
| `error` | fast red blink (2 Hz) | manual / subsystem failure |
Persistence: the on/off toggle is in ~/.minus_status_leds.json. State itself is runtime-only and gets re-asserted by the next event.
Display gating: by default the strip stays dark while the HDMI-TX display is disconnected or powered off — keeps a dark room dark when the TV is off. State machine still ticks; only the wire output is suppressed, so animations resume seamlessly within ~200 ms of the display coming back. Implemented as an optional drive_predicate on the controller that Minus wires to health_monitor._check_hdmi_output_connected(). The leds_require_display flag (default True) toggles the gate from the WebUI; persisted in ~/.minus_system_settings.json.
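The gate itself is tiny; a sketch of the idea (`write` stands in for the SPI transfer, `drive_predicate` for the health-monitor hook — names illustrative):

```python
class GatedStrip:
    """Display gate sketch: the animation thread always renders, but the
    wire only sees the frame when drive_predicate() is true; otherwise a
    dark frame is written so the strip goes out immediately and the real
    animation resumes on the next tick."""

    def __init__(self, write, drive_predicate, n_leds=8):
        self.write = write                     # e.g. SPI writebytes2 wrapper
        self.drive_predicate = drive_predicate
        self.dark = bytes(n_leds * 3)          # all-off GRB frame
        self.tick = 0

    def render_tick(self, frame: bytes):
        self.tick += 1                         # state machine keeps advancing
        self.write(frame if self.drive_predicate() else self.dark)
```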
Hardware setup (one-time): enable rk3588-spi0-m2-cs0-spidev overlay, install python3-spidev, add user to spi group, reboot. ./install.sh does all of that idempotently.
API endpoints: see the Web UI section above.
Files:
- `src/status_leds.py` — raw SPI driver, brightness cap, encoding
- `src/status_led_controller.py` — state machine + animation thread + persistence
- `minus.py` — instantiates `self.status_leds`, wires `_set_led_state` helper, hooks `_on_hdmi_lost`/`_on_hdmi_restored`
- `src/ad_blocker.py` — calls `_set_led_state` from `show()`/`hide()`/`start()`/`start_no_signal_mode()`
- `src/webui.py` — `/api/leds/*` endpoints
- `src/templates/index.html` — toggle + state palette in Autonomous Mode section
- `test_status_leds.py` — hardware walk/flash test
- `tests/test_status_led_controller.py` — 26 unit tests (mocked hardware)
- `tests/test_status_leds_ui.py` — Playwright UI tests (live service)
- `docs/STATUS_LEDS.md` — full docs
Future work: per-LED subsystem indicators (OCR / VLM / audio / HDMI / wifi / autonomous each get one LED), one-shot detection-event flashes, automatic autonomous state on autonomous-mode entry/exit.
Minus supports multiple streaming device types with device-specific remote control:
Supported Devices:
| Device | Protocol | Status |
|---|---|---|
| Fire TV | ADB over WiFi | Full support |
| Roku | ECP (External Control Protocol) | Full support |
| Google TV / Android TV | ADB over WiFi | Full support |
| Apple TV | MRP/AirPlay | Coming soon |
| Generic | None | Ad blocking only |
Web UI Setup: The Remote tab provides a device selector where users can:
- Select their streaming device type
- Follow device-specific setup instructions
- Scan for devices on the network (Fire TV, Roku, Google TV)
- Manually enter device IP address
- Connect and control their device
Device Configuration Persistence:
- Configuration stored in `~/.minus_device_config.json`
- Persists device type, IP address, and setup state
- Survives service restarts
- Survives service restarts
API Endpoints:
- `GET /api/device/config` - Get current configuration
- `GET /api/device/types` - List available device types
- `POST /api/device/select` - Select a device type
- `POST /api/device/ip` - Set device IP address
- `POST /api/device/setup-complete` - Mark setup complete
- `POST /api/device/reset` - Reset configuration
Roku API Endpoints:
- `GET /api/roku/status` - Connection status and device info
- `GET /api/roku/discover` - Scan network via SSDP multicast
- `POST /api/roku/connect` - Connect to Roku by IP
- `POST /api/roku/command` - Send remote command
- `POST /api/roku/launch/<app>` - Launch app (youtube, netflix, etc.)
Roku Features:
- Discovery via SSDP multicast
- ECP commands over HTTP to port 8060
- Control mode detection (Limited vs Full)
- Supports all navigation, media, and volume controls
- App launching: YouTube, Netflix, Prime, Disney+, Hulu, Plex, HBO, Peacock
Fire TV API Endpoints:
- `GET /api/firetv/status` - Connection status
- `GET /api/firetv/scan` - Scan network for Fire TV devices
- `POST /api/firetv/connect` - Connect to Fire TV by IP
- `POST /api/firetv/command` - Send remote command
Google TV / Android TV API Endpoints:
- `GET /api/googletv/status` - Connection status
- `GET /api/googletv/scan` - Scan network for devices (port 5555)
- `POST /api/googletv/connect` - Connect by IP:PORT (Wireless debugging uses a dynamic port)
- `POST /api/googletv/command` - Send remote command (includes `assistant` for Google Assistant)
Google TV Setup Notes:
- Uses "Wireless debugging" (not USB debugging) for network ADB
- Settings > System > Developer options > Wireless debugging
- Shows IP:PORT on TV screen when enabled (e.g., 192.168.1.100:37421)
- Enter the full IP:PORT in web UI Remote tab
- First connection requires approving the ADB dialog on TV
Minus can control Fire TV devices over WiFi via ADB for ad skipping and playback control.
Auto-setup: Fire TV is automatically discovered and connected 5 seconds after Minus starts. First-time connection requires approving the ADB authorization dialog on the TV screen (OCR detects when it appears). ADB keys are saved for future connections.
Features:
- Auto-discovery of Fire TV devices on local network
- Verification that discovered device is actually a Fire TV
- ADB key generation and persistent storage for pairing
- Auto-reconnect on connection drops
- Full remote control: play, pause, select, back, d-pad, etc.
- Async-compatible interface
Requirements:
- Fire TV must have ADB debugging enabled
- First connection requires approving RSA key on TV screen
- Both devices must be on the same WiFi network
Enabling ADB on Fire TV: Settings > My Fire TV > Developer Options > ADB Debugging ON (enable Dev Options first via About > click device name 7x)
Testing: python3 test_fire_tv.py [--setup|--interactive|--scan|IP]
Commands: Navigation (up/down/left/right/select/back/home), Media (play/pause), Volume, Power
Usage: quick_connect() → skip_ad() / go_back() → disconnect()
Setup States: idle → scanning → waiting_adb_enable → waiting_auth → connected
Minus can control Google TV and Android TV devices over WiFi via ADB's Wireless debugging feature.
Setup Flow:
- Select "Google TV / Android TV" in the Remote tab
- On-screen overlay guides you through enabling Wireless debugging
- Enter the IP:PORT shown on your TV's Wireless debugging screen
- Approve the connection dialog on your TV
Key Differences from Fire TV:
- Uses "Wireless debugging" instead of "ADB debugging" (USB debugging)
- Dynamic port (not fixed 5555) - must enter IP:PORT format
- Found in Developer options after enabling developer mode
Enabling Wireless Debugging:
- Settings > System > About > click Build number 7 times
- Go back to System > Developer options
- Turn ON "Wireless debugging"
- Note the IP address and port shown on screen
Commands: Same as Fire TV plus assistant for Google Assistant button
Setup States: Same as Fire TV: idle → scanning → waiting_adb_enable → waiting_auth → connected
Color correction is done via GStreamer's videobalance element in the pipeline.
Why not ustreamer/V4L2?
The HDMI-RX device doesn't support V4L2 image controls (saturation, contrast, brightness).
Only read-only controls are available: audio_sampling_rate, audio_present, power_present.
Default settings (in src/ad_blocker.py):
videobalance saturation=1.25 brightness=0.0 contrast=1.0 hue=0.0
Web UI Controls: Color settings can be adjusted in real-time via the Settings tab in the web UI:
- Saturation: 0.5-1.5 slider (default 1.25, higher = more vivid)
- Brightness: -0.5 to 0.5 slider (default 0.0)
- Contrast: 0.5-1.5 slider (default 1.0)
- Hue: -0.5 to 0.5 slider (default 0.0)
API Endpoints:
# Get current color settings
curl http://localhost/api/video/color
# Set color settings (any combination)
curl -X POST -H "Content-Type: application/json" \
-d '{"saturation": 1.3, "brightness": 0.1}' \
  http://localhost/api/video/color

GStreamer ranges (for advanced use):
- `saturation`: 0.0-2.0 (default 1.0)
- `contrast`: 0.0-2.0 (default 1.0)
- `brightness`: -1.0 to 1.0 (default 0.0)
- `hue`: -1.0 to 1.0 (default 0.0)
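Since the web UI exposes narrower ranges than videobalance accepts, the POST handler has to validate incoming values. A hypothetical validation helper (name and structure illustrative, not the project's actual code) might clamp to the full videobalance property ranges:

```python
# Full ranges of GStreamer videobalance properties.
VIDEOBALANCE_RANGES = {
    "saturation": (0.0, 2.0),
    "contrast": (0.0, 2.0),
    "brightness": (-1.0, 1.0),
    "hue": (-1.0, 1.0),
}

def clamp_color_settings(settings: dict) -> dict:
    """Keep only known properties, clamped to valid videobalance ranges."""
    out = {}
    for name, value in settings.items():
        if name not in VIDEOBALANCE_RANGES:
            continue  # silently drop unknown keys
        lo, hi = VIDEOBALANCE_RANGES[name]
        out[name] = min(max(float(value), lo), hi)
    return out
```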
# Install
sudo ./install.sh
# View logs
journalctl -u minus -f
# Stop
sudo systemctl stop minus
./stop.sh # Alternative with optional X11 restart
# Uninstall
sudo ./uninstall.sh

The service:
- Starts on boot (`multi-user.target`)
- Conflicts with display managers (gdm, lightdm, sddm)
- Restarts on crash (5 attempts per 5 minutes)
- Runs as root for DRM/device access
Finding the Root Cause is ESSENTIAL:
- Do NOT implement band-aid fixes that mask symptoms without understanding the cause
- Investigate WHY something is failing, not just WHAT is failing
- Example: If audio restarts constantly, don't just limit restart attempts - find out WHY it's restarting
- Use logs, the /proc filesystem, API responses, and system state to trace the actual problem
- A fix that doesn't address the root cause will likely cause other issues or recur
Test Fixes BEFORE Pushing:
- After implementing a fix, TEST it immediately by observing actual behavior
- Focus testing specifically on the ORIGINAL PROBLEM - verify the symptom is gone
- Do NOT push fixes without verification - iterate until the fix demonstrably works
- Run prolonged tests (30-60 seconds minimum) to catch intermittent issues
- Watch for the specific symptom that was reported (e.g., "frame jumps every 2-3 seconds")
Testing Methodology:
- Understand the symptom clearly (what exactly is failing and how often)
- Identify potential causes through log analysis and code review
- Implement a fix targeting the root cause
- Restart the service and observe behavior
- Check logs for the specific error patterns that were occurring
- Run a prolonged test (60 seconds) watching for the original symptom
- Only commit/push after confirming the symptom is resolved
Verification Techniques:
- Check logs: `sudo journalctl -u minus --since "60 seconds ago" | grep -E "error|restart|fail"`
- Check FPS: `curl -s http://localhost/api/status | jq .fps`
- Check ALSA status: `cat /proc/asound/card*/pcm*/sub*/status`
- Check pipeline state: API responses, GStreamer state queries
- Record video samples for visual issues:
ffmpeg -i http://localhost:9090/stream -t 10 test.mp4
Common Pitfalls to Avoid:
- Limiting retry attempts instead of fixing why retries are needed
- Assuming a fix works without observing the system under the original conditions
- Pushing multiple untested changes at once (makes debugging harder)
- Not checking if the "fix" introduced new problems
Git commits:
- Do NOT add "Co-Authored-By" lines to commits
- Do NOT add "Generated with Claude Code" lines to commits
- Keep commit messages clean and professional - just the message, no AI attribution
- Do NOT create v2, v3, v4 files - update existing files directly

Key implementation details:
- VLM uses Python axengine for inference (not pexpect/C++ binary)
- Both NPUs run in parallel without resource contention
- No X11 required - pure DRM/KMS display
- Color correction via GStreamer videobalance (not V4L2 controls)
- Health monitor runs every 5 seconds in background thread
- VLM frame files use PID-based naming to avoid permission conflicts
- Snapshots scaled to 960x540 before OCR (model uses 960x960 anyway, smaller = faster)
- ustreamer quality set to 80% for balance of quality and CPU load
- FPS tracked via GStreamer identity element with pad probe
- Startup cleanup removes stale frame files and kills orphaned processes
- Background upload is async to prevent blocking main thread
- Animation times optimized: 0.3s start, 0.25s end for fast response
- DYNAMIC_COOLDOWN reduced to 0.5s for faster ad detection
pip3 install pyinstaller
pyinstaller minus.spec
# Output: dist/minus

Note: Models are external and must be present at runtime.
The project includes a comprehensive test suite for all extracted modules.
Running Tests:
python3 tests/test_modules.py # 300+ unit tests
python3 tests/test_autonomous_mode.py # Autonomous mode tests
python3 tests/test_recent_features.py # Recent feature tests
python3 tests/test_block_decision_engine.py # Blocking state-machine regressions
python3 tests/test_review_ui.py # Playwright UI tests (requires chromium)
python3 tests/test_ir_transmitter.py # IR transmitter unit tests (mocked sysfs)
python3 tests/test_ir_ui.py # Playwright UI tests for IR remote panel
python3 tests/test_status_led_controller.py # Status LED state-machine tests (mocked hardware)
python3 tests/test_status_leds_ui.py       # Playwright UI tests for status-LED panel

Block-latency test harness (tests/block_latency_harness.py):
Headless rig for tuning the blocking decision engine. Plays Big Buck Bunny
in a Python loop, lets the test orchestrator inject "AD"-style overlay
text on/off at controlled timestamps, and measures detect / recover
latency end-to-end through the production OCR + VLM workers + a faithful
mirror of minus.py's blocking decision logic. No HDMI, no ustreamer,
no DRM, no audio.
# Place a video file at /home/radxa/test_assets/bbb.mp4 first.
python3 tests/block_latency_harness.py round1 # 9 detect/recover combos
python3 tests/block_latency_harness.py round4 # realistic ad-break shapes
python3 tests/block_latency_harness.py round5 # VLM state machine (injected verdicts)
python3 tests/block_latency_harness.py round6 # user-bug pause-on-ad regression
python3 tests/block_latency_harness.py round7   # production-shaped, OCR + VLM corroborated

use_real_vlm=False mode uses injected VLM verdicts so the engine's
sliding-window state machine can be driven deterministically without the
~30s real-VLM model load. Override PARAMS from a small wrapper script
to A/B-test tuning candidates; the in-rig defaults mirror the locked-in
production values.
Test Coverage:
| Module | Test Class | Tests |
|---|---|---|
| src/vocabulary.py | TestVocabulary | Format validation, content checks, common words |
| src/config.py | TestConfig | Dataclass defaults, custom values |
| src/skip_detection.py | TestSkipDetection | Pattern matching, countdown parsing, edge cases |
| src/screenshots.py | TestScreenshots | Deduplication, file saving, truncation |
| src/console.py | TestConsole | Console blanking/restore commands |
| src/capture.py | TestCapture | Snapshot capture, cleanup |
| src/drm.py | TestDRM | DRM probing, fallback values |
| src/v4l2.py | TestV4L2 | V4L2 format detection, error handling |
| src/overlay.py | TestOverlay | NotificationOverlay, positions, show/hide |
| src/health.py | TestHealth | HealthMonitor, HealthStatus, HDMI detection |
| src/fire_tv.py | TestFireTV | Controller, key codes, device detection |
| src/vlm.py | TestVLM | VLMManager, response parsing, 4-tuple returns |
| src/ocr.py | TestOCR | Keywords, exclusions, terminal detection |
| src/webui.py | TestWebUI, TestWebUIExtended | Flask routes, all API endpoints |
| src/ad_blocker.py | TestAdBlocker, TestAdBlockerExtended | Blocking modes, color controls, animations |
| src/audio.py | TestAudio, TestAudioExtended | A/V sync, pipeline controls, mute/unmute |
| src/fire_tv.py | TestFireTV, TestFireTVExtended | Connection, commands, device discovery |
| src/vlm.py | TestVLM, TestVLMExtended | Response parsing, confidence detection |
| src/ocr.py | TestOCR, TestOCRExtended | Keywords, exclusions, terminal detection |
| src/skip_detection.py | TestSkipDetection, TestSkipDetectionExtended | Pattern matching, countdown parsing |
| src/screenshots.py | TestScreenshots, TestScreenshotsExtended | Deduplication, categories, truncation |
| src/config.py | TestConfig, TestConfigValidation | Defaults, custom values |
| src/health.py | TestHealth, TestHealthExtended | Monitoring, callbacks, status |
| src/overlay.py | TestOverlay, TestOverlayExtended | Positions, show/hide, text formatting |
| src/drm.py | TestDRM, TestDRMExtended | DRM probing, fallback values |
| src/v4l2.py | TestV4L2, TestV4L2Extended | Format detection, error handling |
| src/console.py | TestConsole, TestConsoleExtended | Console blanking/restore |
| src/capture.py | TestCapture, TestCaptureExtended | Snapshot capture, cleanup |
| Integration | TestIntegration | Cross-module tests |
| Memory | TestMemoryLeaks | Resource cleanup, executor reuse |
| Blocking | TestBlockingModeIntegration | State transitions, API format |
| Error Handling | TestErrorHandling | Missing subsystems, graceful failures |
| Concurrency | TestConcurrency | Thread safety, locks |
| Vocabulary | TestVocabulary, TestVocabularyContent | Format, content, duplicates |
| API Responses | TestAPIResponseFormats | Consistent response structure |
| src/vlm.py | TestVLMQueryImage | Custom prompt queries, error paths |
| src/ocr.py | TestOCRResilience | NPU failure handling, graceful degradation |
| src/screenshots.py | TestScreenshotDedup | dHash, blank rejection, rate limiting, per-category |
| Memory | TestMemoryManagement | Hash buffer caps, resource cleanup |
| HDCP | TestHDCPHandling | Encrypted frame handling, blank frame rejection |
| src/autonomous_mode.py | TestAutonomousMode | Schedule, VLM actions, state, persistence (separate file) |
| Review UI | TestReviewModal* | Playwright: desktop/mobile swipe, modal, API (separate file) |
Test Design:
- Tests are self-contained with temporary directories
- Mock subprocess calls to avoid system dependencies
- Fallback to manual test runner if pytest not installed
- All 300+ tests should pass on a clean system
- Playwright tests require chromium: `python3 -m playwright install chromium`
The codebase has been refactored from monolithic files into smaller, focused modules:
Extracted from minus.py:
- `src/console.py` - Console blanking functions (`blank_console`, `restore_console`)
- `src/drm.py` - DRM probing (`probe_drm_output`)
- `src/v4l2.py` - V4L2 probing (`probe_v4l2_device`)
- `src/config.py` - Configuration dataclass (`MinusConfig`)
- `src/capture.py` - Snapshot capture (`UstreamerCapture`)
- `src/screenshots.py` - Screenshot management (`ScreenshotManager`)
- `src/skip_detection.py` - Skip button detection (`check_skip_opportunity`)
Extracted from ad_blocker.py:
- `src/vocabulary.py` - Spanish vocabulary list (`SPANISH_VOCABULARY`)
Benefits:
- Easier to test individual components
- Better code organization and discoverability
- Reduced file sizes (minus.py ~1700 lines, ad_blocker.py ~950 lines)
- Clear separation of concerns
Previous problem: Adding a textoverlay element to the GStreamer video path caused pipeline stalls every ~12 seconds due to NV12 format incompatibility and 4K→1080p resolution mismatch.
Solution implemented: Text overlay is now rendered directly in ustreamer's MPP encoder via the blocking mode API. This:
- Composites directly on NV12 frames in the encoder
- Has minimal CPU impact (~0.5ms per frame)
- Works at any resolution without GStreamer pipeline changes
- Supports pixelated background, live preview window, and text overlays
- Uses FreeType for proper TrueType font rendering
Issue: Long-running sessions (several hours) could accumulate memory due to RKNN inference output buffers not being explicitly released.
Solution implemented:
- RKNN inference outputs are now explicitly copied and dereferenced in src/ocr.py
- Periodic `gc.collect()` runs every 100 OCR frames and every 50 VLM frames
- Health monitor triggers emergency cleanup at 90% memory usage
- Frame buffers (`prev_frame`, `vlm_prev_frame`) are cleared during memory critical events
ThreadPoolExecutor fix (Jan 2026):
- CRITICAL: The OCR worker was creating a new `ThreadPoolExecutor` on every iteration, causing massive file descriptor and memory leaks (~12GB after 12 hours)
- Fixed by creating a single `ocr_executor` before the loop and reusing it
- Symptom: "Too many open files" errors, display goes blank, memory exhaustion
Memory monitoring:
- Health monitor checks memory every 5 seconds
- Warning logged at 80% usage
- Critical cleanup triggered at 90% usage
Status: Fire TV auto-setup is ENABLED with notification overlays working via ustreamer API.
Startup timing:
- Fire TV setup starts 5 seconds after service start (runs in parallel with VLM loading)
- Total time from start to connection: ~13 seconds (5s delay + ~8s scan/connect)
Bug fixed: Auth retry interval was 3 seconds, causing multiple auth dialogs on the TV before user could respond. Fixed to 35 seconds (longer than AUTH_TIMEOUT of 30s) in fire_tv_setup.py.
Symptom: Frame jumps every 2-3 seconds due to constant GStreamer audio pipeline restarts.
Root Cause: When HDMI signal was restored, resume_watchdog() tried to create a new audio pipeline without:
- Checking if the existing pipeline was already working
- Cleaning up the old pipeline first
This caused the new pipeline to fail with "device in use" because the old pipeline still held the ALSA device. The watchdog then repeatedly tried to restart every 3 seconds.
Why band-aid fixes don't work: Initially tried limiting restart attempts, but this just disabled audio after 5 restarts instead of fixing the underlying issue. The correct approach was to find WHY restarts were happening.
Solution implemented:
- Added `_is_alsa_device_running()` helper that checks `/proc/asound/cardX/pcmYp/sub0/status` to verify the ALSA device is actually running with our PID
- This is more reliable than GStreamer state queries when PipeWire/WirePlumber is involved
- Modified watchdog loop to skip restarts when ALSA confirms audio is flowing
- Modified `resume_watchdog()` to check if the pipeline is already PLAYING before restarting
- Added proper cleanup of the old pipeline before creating a new one
Key insight: The /proc/asound status showed the device was RUNNING with minus as owner, proving audio WAS working. GStreamer state queries were unreliable due to PipeWire interference, but the kernel-level ALSA status was authoritative.
Symptom: After a brief HDMI signal loss (even 8 seconds), the video pipeline stalls every ~12 seconds with mpp_buffer: check buffer found NULL pointer from mpp_dec_advanced_thread. Restarting the GStreamer pipeline alone doesn't help - MPP stays stuck.
Root Cause: The RK3588 MPP JPEG decoder holds resources that don't get properly freed when the GStreamer pipeline is destroyed. After the HDMI source briefly drops and recovers, the decoder enters a corrupt state that persists across pipeline restarts.
Solution implemented:
- After 3+ consecutive pipeline failures, the system now kills ustreamer (`pkill -9 ustreamer`) to force-release MPP resources
- The health monitor detects ustreamer is down and restarts it plus the video pipeline with clean MPP state
- This auto-recovers from stuck MPP decoder without manual service restart
Symptom: No audio after TV wakes up from standby. Audio pipeline starts on wrong HDMI output (e.g., hw:0,0 instead of hw:1,0).
Root Cause: When the display retry loop detected a DRM output change (TV connected to a different HDMI port than at boot), it updated the config but not the audio object's playback device, so audio started on the old device.
Solution implemented:
- Display retry loop now checks if `drm_info['audio_device'] != self.audio.playback_device`
- If changed, stops the audio pipeline and updates the playback device before restarting
- Ensures audio always matches the active HDMI output
Symptom: Netflix ads showing "Ad 10", "Ad 5" (countdown timer format) were not detected by OCR.
Root Cause: Existing OCR patterns only matched "Ad X of Y" format. Netflix uses standalone "Ad NN" where NN is seconds remaining.
Solution: Added regex pattern `^ad\s*\d+$` to match the countdown format.
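The pattern can be exercised in isolation (the helper name is illustrative; the real check lives in the OCR keyword logic):

```python
import re

# Standalone "Ad NN" countdown (Netflix-style), matched against a whole
# OCR text element. Deliberately anchored so "Ad 2 of 3" stays with the
# existing "Ad X of Y" pattern.
AD_COUNTDOWN = re.compile(r"^ad\s*\d+$", re.IGNORECASE)

def is_ad_countdown(text: str) -> bool:
    return bool(AD_COUNTDOWN.match(text.strip()))
```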
Symptom: After successfully skipping an ad, the blocking overlay stayed for 2-3+ seconds waiting for OCR to detect the ad was gone.
Solution: After a successful skip command (auto or manual via web UI), blocking is now removed after a 1.5s delay instead of waiting for 3 OCR cycles. The delay allows the skip animation to complete, then force-unblocks by resetting all detection state. Skip command is device-agnostic — routes to Fire TV (skip_ad()), Roku (send_command('select')), or Google TV based on the configured device type.
Symptom: After running for 12+ hours with no HDMI signal, the web server becomes unresponsive. Logs show [Errno 24] Too many open files errors. The service cannot open new files or sockets.
Root Cause: When the no-signal or loading GStreamer pipelines failed to start, the cleanup code did not remove the bus signal watch before destroying the pipeline. Each failed attempt leaked a file descriptor from bus.add_signal_watch(). With retries every 10 seconds, the 1024 FD limit was reached in ~3 hours.
Solution: Added proper bus cleanup in all pipeline failure paths:
# Before destroying failed pipeline:
if self.bus:
    self.bus.remove_signal_watch()
    self.bus = None

Fixed in src/ad_blocker.py: `start_no_signal_mode()` and `start_loading_mode()` failure paths and exception handlers.
Symptom: After TV/display sleeps for several hours and wakes up, there is no audio output even though the health monitor reports audio=OK and the ALSA device shows state: RUNNING.
Root Cause: The GStreamer audio pipeline runs in a separate thread. When the display sleeps, this thread can crash or die (e.g., due to ALSA device disconnection), but:
- The Python `AudioPassthrough` object retains a stale reference to the dead pipeline
- The ALSA device shows `owner_pid` pointing to the dead thread's PID
- The health check only queries the Python GStreamer state, not the actual ALSA device ownership
- Result: health reports `audio=OK` while no actual audio is flowing
Detection: Check if the ALSA playback device's owner_pid corresponds to a live process:
# Get owner PID
cat /proc/asound/card1/pcm0p/sub0/status | grep owner_pid
# owner_pid : 179247
# Check if process exists
ps -p 179247
# Returns empty = zombie audio state!

Solution: Enhanced `_check_audio_pipeline()` in src/health.py to:
- Read the ALSA device status from `/proc/asound/cardX/pcm0p/sub0/status`
- Verify the `owner_pid` corresponds to a live process (check that `/proc/{pid}` exists)
- If the owner is dead but the device shows RUNNING, trigger a full `_restart_pipeline()` (not just a queue flush)
- 10-second cooldown after any restart before zombie detection runs again (prevents restart loops)
- Skip zombie detection if a restart is already in progress
- This runs every health check cycle (5 seconds), so recovery happens automatically
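A minimal sketch of that detection, assuming the usual `state:` / `owner_pid :` layout of the /proc/asound status file (the helper name is hypothetical; the real implementation lives in src/health.py):

```python
import re
from pathlib import Path

def audio_owner_is_dead(status_path: str) -> bool:
    """Zombie check: substream reports RUNNING but the recorded
    owner_pid no longer exists under /proc."""
    try:
        status = Path(status_path).read_text()
    except OSError:
        return False  # device closed: nothing to own, not a zombie
    if "state: RUNNING" not in status:
        return False  # not running, so a missing owner is expected
    m = re.search(r"owner_pid\s*:\s*(\d+)", status)
    return bool(m) and not Path(f"/proc/{m.group(1)}").exists()
```

Note this naive version checks only `/proc/{pid}`; as described further below, `owner_pid` can actually be a thread ID, which needs an extra lookup.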
Files modified:
- src/health.py - Added `_check_alsa_zombie_state()` method with full restart and cooldown logic
Symptom: Ad blocking would flicker on/off during ads because OCR sometimes reads "Ad 0:42" (with space) and sometimes "Ad0:42" (no space) or "Ado:55" (OCR misreads '0' as 'o').
Root Cause: The OCR pattern used word boundaries (\bad\b) which required a space between "Ad" and the timestamp. When OCR dropped the space, the pattern didn't match, counting as "no ad". After 3 "no ads", blocking ended, then immediately re-triggered when a frame with space was detected.
Solution: Updated src/ocr.py to match OCR variants:
- `ad[0o]:` pattern catches "Ad0:" and "Ado:" (no space, or 'o' misread)
- `[0-9o]:\d{2}` timestamp pattern handles 'o' misread as '0'
- Both per-element and cross-element checks updated
Test cases now matched:
- `Ad 0:42` - standard format ✓
- `Ad0:42` - no space ✓
- `Ado:55` - OCR misread '0' as 'o' ✓
- `0:30 | Ad` - Hulu style ✓
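The misread-tolerant matching can be sketched with a single regex (illustrative only; the production code splits this across per-element patterns, and the Hulu-style `0:30 | Ad` case goes through a separate cross-element check):

```python
import re

# Tolerate a dropped space between "Ad" and the timestamp, and a '0'
# misread as 'o' in the minutes digit.
AD_TIMESTAMP = re.compile(r"\bad\s*[0-9o]:\d{2}\b", re.IGNORECASE)

def has_ad_timestamp(text: str) -> bool:
    return bool(AD_TIMESTAMP.search(text))
```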
Symptom: After TV restart/power cycle, the GStreamer pipeline reports "No-signal display started successfully" but the TV shows its own "HDMI 1 No Signal" message (meaning no video signal from RK3588).
Root Cause: When the TV restarts, the HDMI hotplug event is detected and the sysfs status changes from "disconnected" to "connected", but the HDMI PHY (physical layer) doesn't properly reinitialize. The DRM connector shows as connected, but no actual video signal is being transmitted.
Discovery: Physically unplugging and replugging the HDMI cable made the display work, indicating the HDMI PHY needed reinitialization that wasn't happening on TV restart.
Solution: Force HDMI PHY reinitialization via DPMS (Display Power Management Signaling) cycle:
- When TV reconnects, health monitor detects status change and waits 2s for link stabilization
- DPMS Off (value 3) sent via `modetest -M rockchip -w {connector}:DPMS:3`
- Wait 300ms
- DPMS On (value 0) sent via `modetest -M rockchip -w {connector}:DPMS:0`
- This forces the HDMI transmitter to reinitialize, equivalent to a cable replug
Implementation:
- src/health.py - Health monitor detects TV reconnection and calls `ad_blocker.restart(hdmi_reconnect=True)`
- src/ad_blocker.py - `_restart_pipeline(hdmi_reconnect=True)` does:
  - Stop existing pipeline
  - DPMS cycle via `_force_hdmi_reinit()`
  - Re-probe DRM to detect connector/plane changes
  - Restart audio pipeline (required after TV power cycle)
  - Start new video pipeline
- For no-signal mode, the DPMS cycle is done in `start_no_signal_mode()` directly
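The cycle reduces to two modetest invocations with a 300ms gap. A sketch, assuming modetest (from the libdrm test tools) is on the PATH and the connector ID came from DRM probing (function names are illustrative):

```python
import subprocess
import time

def dpms_commands(connector_id: int):
    """Build the two modetest invocations used for the DPMS cycle."""
    base = ["modetest", "-M", "rockchip", "-w"]
    return (base + [f"{connector_id}:DPMS:3"],   # DPMS Off
            base + [f"{connector_id}:DPMS:0"])   # DPMS On

def dpms_cycle(connector_id: int) -> None:
    off_cmd, on_cmd = dpms_commands(connector_id)
    subprocess.run(off_cmd, check=False)
    time.sleep(0.3)  # 300ms between off and on, per the fix above
    subprocess.run(on_cmd, check=False)
```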
Key heuristics for detecting working vs broken state:
| Heuristic | Working | Broken (needs DPMS) |
|---|---|---|
| sysfs status | connected | connected |
| sysfs dpms | On | On |
| Video output | Visible | TV shows "No Signal" |
Note: All sysfs values look identical in both states - the only difference is whether video is actually being transmitted. The DPMS cycle is applied preemptively on every TV reconnection.
Symptom: Audio cuts out every ~15 seconds with logs showing "Audio zombie state detected - GStreamer playing but ALSA owner dead" followed by constant pipeline restarts.
Root Cause: The ALSA owner_pid in /proc/asound/cardX/pcm0p/sub0/status is actually a thread ID (TID), not a process ID (PID). The zombie detection code was checking /proc/{owner_pid} which doesn't exist for threads - threads are listed under /proc/{main_pid}/task/{tid} instead.
Solution: Updated _check_alsa_zombie_state() in src/health.py to check both locations:
- First check `/proc/{owner_pid}` (works if it's a PID)
- If not found, check `/proc/{main_pid}/task/{owner_pid}` (works if it's a TID)
This prevents false zombie detection when the audio thread is actually alive and healthy.
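The two-location check is short once both paths are known (helper name illustrative):

```python
import os

def owner_alive(owner_pid: int, main_pid: int) -> bool:
    """ALSA's owner_pid may be a thread ID rather than a process ID:
    check /proc/{pid} first, then /proc/{main_pid}/task/{tid}."""
    return (os.path.exists(f"/proc/{owner_pid}")
            or os.path.exists(f"/proc/{main_pid}/task/{owner_pid}"))
```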
Symptom: Screen stuck on "Initializing..." for 20+ minutes. GStreamer pipeline in restart loop (37+ attempts). ustreamer is capturing video correctly but display pipeline fails.
Root Cause: The Fire TV notification overlay shows "Ad skipping enabled." which contains the word "ad". When OCR read this overlay text, it triggered false positive ad detection. This activated the blocking mode, which caused MPP pipeline errors (mpp_buffer: check buffer found NULL pointer).
Why overlay is visible to OCR: The notification overlay is composited at the ustreamer encoder level BEFORE the snapshot, so /snapshot/raw includes overlay text. This is by design for the preview window in blocking mode, but it means OCR sees everything on screen including our overlays.
Solution: Added our overlay messages to the OCR exclusion lists:
- `'ad skipping enabled'`, `'ad skipping'`, `'adskipping'` added to `AD_EXCLUSIONS` in both src/ocr.py and src/ocr_worker.py
Files modified:
- src/ocr.py - Added Minus overlay exclusions
- src/ocr_worker.py - Added Minus overlay exclusions
Symptom: Running autonomous mode with HDMI-TX disconnected, music videos with static album art were being paused by autonomous mode every 20 seconds, interrupting legitimate playback.
Root Cause: With display disconnected, the audio pipeline's alsasink can't open HDMI-TX, so the pipeline never receives buffers (last_buffer_age == -1). _is_audio_flowing() returned False. On music videos with static art (hamming≈0), the pause detector concluded "static frames + no audio = PAUSED" and sent play_pause, actually pausing content that was playing.
Solution: Added _is_audio_pipeline_available() in src/autonomous_mode.py. When the audio pipeline has never received a buffer or its state is stopped, treat audio as "unknown" rather than "not flowing". _is_screen_static() returns False in that case so autonomous mode does not assume paused. VLM's direct PAUSED verdict still triggers play.
Symptom: During real ads on YouTube, autonomous mode would fire down + select commands thinking it was on the home screen, navigating through the ad UI and occasionally switching to a different video.
Root Cause: HOME_SCREEN_KEYWORDS in src/autonomous_mode.py contained 'sponsored' and 'views'. "Sponsored · Visit advertiser" on YouTube pre-roll ads and "347M views" in any video's info panel both matched, triggering the home-screen action path.
Solution:
- Removed `'sponsored'` and `'views'` from `HOME_SCREEN_KEYWORDS`.
- Added `AD_ONLY_KEYWORDS` (`'visit advertiser'`, `'send to phone'`, `'skip in'`, `'skip ad'`) — if any are present, skip home-screen detection.
- Added an `ad_blocker.is_visible` guard — if blocking is active, never classify as home screen.
- In minus.py, added a secondary audio-aware guard: if the OCR match is only `'sponsored'` and HDMI-IN `audio_present=0`, suppress the block. Real video ads transmit audio; home-screen sponsored tiles usually don't. Uses a new `_hdmi_audio_present()` helper reading v4l2-ctl directly so it works even when our playback pipeline is down.
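A sketch of such an audio-presence helper, assuming v4l2-ctl prints the control in its usual `audio_present: N` form (function names are illustrative, not the exact ones in minus.py):

```python
import subprocess

def parse_audio_present(output: str) -> bool:
    """Parse `v4l2-ctl --get-ctrl audio_present` output."""
    for line in output.splitlines():
        if "audio_present" in line:
            return line.split(":")[-1].strip() == "1"
    return False

def hdmi_audio_present(device: str = "/dev/video0") -> bool:
    """Query the HDMI-RX audio_present control directly, independent of
    the playback pipeline's state."""
    try:
        proc = subprocess.run(
            ["v4l2-ctl", "-d", device, "--get-ctrl", "audio_present"],
            capture_output=True, text=True, timeout=2,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return parse_audio_present(proc.stdout)
```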
Symptom: False ad blocks triggered when OCR read "LOADING" or "reading" on screen.
Root Cause: AD_KEYWORDS_EXACT in src/ocr_worker.py contained 'ad in'. The alphanumeric-normalized form is 'adin' (4 chars), which appears as a substring in 'loading' (loading), 'reading' (reading), and similar words.
Solution: Removed 'ad in' from exact keywords. The specific patterns for "Ad N of M", "Ad N" countdown, and "ad with timestamp" (in both ocr.py and ocr_worker.py) already cover legitimate ad timestamps.
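Why that keyword was dangerous is easy to demonstrate with the normalization described above (the `normalize` helper here is a stand-in for the worker's actual normalization step):

```python
import re

def normalize(text: str) -> str:
    """Alphanumeric-only lowercasing, as used for exact-keyword matching."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

# 'ad in' normalizes to 'adin', which is a substring of ordinary words,
# so substring matching fires on them:
assert normalize("ad in") in normalize("LOADING")   # lo[adin]g
assert normalize("ad in") in normalize("reading")   # re[adin]g
```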
Symptom: After several hours of runtime, VLM inference degrades from ~0.7s to ~15–18s per query and returns descriptive responses to short-answer prompts. Each DISCARDED (>2s) entry makes the system effectively OCR-only. Not thermal — temperatures stayed around 70°C both when healthy and when slow.
Original solution (kept as defense-in-depth): Rolling latency window + auto-recovery in src/vlm_worker.py:
- `_record_latency()` / `_maybe_auto_recover()` called after each successful inference.
- If P95 over the last 10 queries exceeds 3.0s, trigger a worker restart.
- If a prior recovery happened within the last 3 minutes and we're degraded again, escalate to a deep restart with 8s NPU-release backoff.
- 60s cooldown prevents thrash.
- `get_latency_stats()` exposes samples/P50/P95/max via /api/health under `subsystems.vlm.latency`.
Axera telemetry (axcl-smi info --temp / --npu / --cmm) is wired into /api/health at subsystems.vlm.axera and exposed as Prometheus gauges minus_axera_* for alerting on temperature or memory pressure.
A deeper investigation (documented in docs/VLM_NPU_DEGRADATION.md) confirmed:
- Latency is deterministically image-dependent, not a state that drifts in over time.
- Per-token decode rate is constant (~0.23 s/tok); the slow inferences are slow because the model generates 30–60 tokens of descriptive response instead of 1–3 tokens of `Yes.`/`No.`.
- The NPU, axcl driver, and Axera firmware are all healthy throughout.
- `axcl-smi reboot` and `rmmod`+`modprobe` of the host modules do not change behavior on the same image.
Real fix: Cap max_new_tokens at the model layer (5 for detect_ad, 8 for query_image). With the cap, worst-case latency drops from ~12 s to ~1.3 s and the entire restart-cycle pathology goes away. The auto-recovery logic above stays in as defense-in-depth for any genuine NPU pathology, but in normal operation it should never fire.
Symptom: Intermittent `too many values to unpack (expected 2)` from `[AutonomousMode] VLM screen query failed`, plus a sustained worker restart cycle (~15–40 hard kills per 15 min) that the soft/hard timeout logic could not damp on its own.
Root Cause: `VLMProcess.detect_ad` (called from the detection-loop thread) and `VLMProcess.query_image` (called from the autonomous-mode thread) shared the same request/response `multiprocessing.Queue` with no request-to-response correlation and no lock around the queue or the shared state (`_consecutive_timeouts`, `_pending_response`, `_recent_latencies`). When both threads called concurrently:
- A `detect_ad` 4-tuple response could be `get()`-ed by the `query_image` caller (which expected a 2-tuple) — and vice versa — producing the unpack error.
- Concurrent mutation of `_consecutive_timeouts` and `_pending_response` produced spurious threshold trips, triggering hard kills the system did not actually need. Each hard kill cost ~25s of model reload, during which more queued requests timed out, perpetuating the cycle.
Solution:
- Added `self._call_lock = threading.Lock()` to `VLMProcess.__init__`.
- Refactored `detect_ad` and `query_image` into thin wrappers that acquire the lock, then delegate to `_detect_ad_locked` / `_query_image_locked` with the original logic.
- This serializes the two callers across the entire request → response cycle, so cross-pollinated responses cannot happen and the shared timeout state stays consistent.
Upstream's tuple-shape defensive guards (introduced in commit 7c42e80) are kept as belt-and-suspenders — they tolerate a stale leaked response if one ever does slip through. The lock prevents the leak; the guards handle it if prevention fails.
Why a lock and not separate queues / request IDs: simplest correct fix that is local to VLMProcess. Detection-loop calls are ~4 Hz and complete in ~0.7s; autonomous-mode calls are once per 2 minutes and complete in ~1.0s. The lock contention is negligible in practice. A dedicated request-ID protocol would be cleaner but invasive to both worker and callers.
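A minimal sketch of the wrapper pattern (class, queue wiring, and tuple shapes here are simplified stand-ins for the real `VLMProcess`, not its actual implementation):

```python
import queue
import threading


class VLMClient:
    """Serializes two callers over one shared request/response queue pair."""

    def __init__(self):
        self._call_lock = threading.Lock()  # serializes the full request->response cycle
        self._requests = queue.Queue()
        self._responses = queue.Queue()

    def detect_ad(self, frame_path):
        # Thin wrapper: hold the lock for the entire round trip, so a
        # query_image() response can never be consumed by this caller.
        with self._call_lock:
            return self._detect_ad_locked(frame_path)

    def query_image(self, frame_path, prompt):
        with self._call_lock:
            return self._query_image_locked(frame_path, prompt)

    def _detect_ad_locked(self, frame_path):
        self._requests.put(("detect_ad", frame_path))
        return self._responses.get(timeout=1.5)  # 4-tuple in the real code

    def _query_image_locked(self, frame_path, prompt):
        self._requests.put(("query", frame_path, prompt))
        return self._responses.get(timeout=5.0)  # 2-tuple in the real code
```

Because the lock spans both the `put` and the matching `get`, a response pushed for one caller cannot be dequeued by the other, which is exactly the cross-pollination the fix removes.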
Files modified:
- `src/vlm_worker.py` — `_call_lock`, `_detect_ad_locked`, `_query_image_locked`
Symptom: Every 45 minutes of uptime, the AudioPassthrough watchdog ran its periodic sync-queue flush; `Sync queue flushed` was always followed ~12s later by `Pipeline issue detected: not in PLAYING state (paused)` and a full `Restarting pipeline (attempt N)`. Cumulative effect: ~32 spurious audio restarts per day, each a brief dropout. The feature that was supposed to prevent restarts was causing them.
Root cause (investigated in 4 failed fix iterations): the flush itself is unrecoverable without a full pipeline rebuild on this pipeline configuration.
- A `flush-start` event puts the sync queue and downstream into flushing mode. `flush-stop` should resume streaming, but `syncqueue` has `min-threshold-time=300ms`, which blocks downstream reads until the queue has refilled past the threshold.
- While the queue is blocking, `alsasink` — having no data to consume — closes its PCM device. The ALSA `state` transitions out of `RUNNING` and `hw_ptr` goes to 0.
- `set_state(PLAYING)` on the pipeline cannot bring `alsasink` back up because the upstream queue is still blocked. The pipeline gets stuck in `PAUSED` for 10+ seconds until the watchdog gives up and restarts the whole pipeline.
Attempts that did not work:
- Same-iteration `continue` after flush (commit `3b4e0d0`) — subsequent iterations still trip on the lingering `PAUSED`.
- 10-second post-flush "grace window" — flush recovery takes longer than that.
- Explicit `pipeline.set_state(PLAYING)` with a bounded 2s wait — `get_state` returns `PAUSED` regardless.
- Temporarily zeroing `syncqueue.min-threshold-time` across the flush plus a 400ms refill sleep — `alsasink` had already dropped the PCM by then.
Solution: flip `self._sync_reset_enabled = False` in `AudioPassthrough.__init__` (see src/audio.py:151). The periodic flush never runs, so it can never cascade. Drift isn't a real concern in this pipeline (`provide-clock=false` on `alsasrc`, `sync=false` on `alsasink`), and 48+ hours of runtime without a working flush showed no observable A/V desync.
To find the commit that made this change: `git log --all --oneline --grep='disable periodic A/V sync flush'` (the commit subject is stable across amends).
Kept as a side-benefit of the investigation: rewrote `_is_alsa_device_running()` to sample `hw_ptr` across a 50ms window instead of comparing ALSA's `owner_pid` to the main process PID. The old check compared an ALSA-reported thread TID (often a stale one) against the main PID, so it could never return `True` under normal operation. The watchdog's "GStreamer reports `PAUSED` but ALSA is flowing — skip restart" rescue path has always been broken; now it works.
If drift becomes a real problem in the future (easy revert):
1. Write a flush mechanism that does not let `alsasink` close its PCM device — either by pausing→flushing→playing the whole pipeline in one atomic block, or by replacing `syncqueue` with an element that doesn't block on `min-threshold-time`.
2. Only after (1) works, flip `_sync_reset_enabled` back to `True` in `src/audio.py`.
3. Re-run the soak test (`_sync_interval = 2.5 * 60` + 5-min monitor for ~45 min) and confirm `audio.restart_count` stays at 0.
Do NOT simply flip `_sync_reset_enabled` back to `True` without (1). The bug will return.
Files modified:
- `src/audio.py` — `_sync_reset_enabled = False` + explanatory block comment; `_is_alsa_device_running()` rewrite
Symptom: After pausing on an ad on Netflix and unpausing, the blocking overlay stayed up for ~20 seconds, applied against frames where the show was clearly playing again. Other variants: VLM verdicts persistently lagging actual screen content by one frame; rare reports of "Ad 1:30 left" claims long after a real ad had ended.
Root cause: `VLMProcess._detect_ad_locked` and `_query_image_locked` (src/vlm_worker.py) shared the same MP request/response queues with no per-request correlation. When the VLM hit a soft timeout (1.5s, ~15% of inferences in normal load), the request stayed in flight and `_pending_response` was set to `True`. On the next call:
- The drain attempt was a single `get(timeout=0.1)`. If the worker had not yet pushed its response (still mid-inference), the drain timed out.
- The code then fell through and `put`-ed a NEW request anyway.
- Now two requests were in flight. The worker finished the first → pushed result A → the caller's `get(SOFT_TIMEOUT)` received result A as the answer for request B.
- The queue was now permanently off-by-one. Every subsequent `get()` returned the prior frame's verdict.
- After a pause-on-ad (where the queue accumulated several "ad" verdicts during the pause), the entire backlog was delivered against post-unpause "show is playing" frames → 10–20 seconds of phantom blocking.
The shared `/dev/shm/minus_vlm_frame_<pid>.jpg` path made it worse — the file was always the most recently written frame, so even the worker's view of "what was frame N" could be stale.
Solution:
- Drain ALL stale responses at function entry using a `get_nowait()` loop (was a single `get(timeout=0.1)`).
- If a request is genuinely still in flight after draining, do NOT queue another. Return `"PENDING"` (or `"KILLED"` after `RESTART_THRESHOLD` consecutive pendings). The caller treats this exactly like the existing `"TIMEOUT"` skip path — `is_ad=False`, `confidence=0.0`, no-op on the sliding window.
This guarantees only one request is ever in flight, which incidentally also fixes the file-content race because the worker dequeues and reads the file in tight succession.
Files modified:
- `src/vlm_worker.py` — `_detect_ad_locked` and `_query_image_locked` rewritten with multi-drain + don't-double-queue
Symptom: TV stays frozen on a stale frame for hours. The web app shows live content. `subsystems.video.status` reads `error` / `reason: no_pipeline`. `fps_capture` is healthy (~42 fps) but `fps_display` is essentially 0. Service uptime can be many hours; restart is the only recovery.
Root cause (two coupled defects in `Minus._on_hdmi_restored()` at minus.py:634):
- `ad_blocker.start()`'s return value was ignored. When HDMI input recovers but HDMI-OUT (the TV) is still disconnected, kmssink can't open the DRM plane and `start()` returns `False`. The recovery handler proceeded as if all was well and logged `[Recovery] HDMI recovery complete`.
- `self.display_connected` was left stuck at `True`. The display retry loop (`_start_display_retry_loop`) is the only thing that can recreate a dead pipeline post-startup, but it gates on `not self.display_connected`. Since recovery never set the flag to `False` on failure, the retry loop never ran. The pipeline stayed dead until the next service restart.
Observed once today across a 17-hour run: at 08:19:59 HDMI input recovered after a 550-second loss while the TV was off. Recovery declared success. The `Attempting to reconnect display pipeline` log line appeared 0 times in the entire 17-hour run. The display sat frozen on its last decoded frame until a manual restart at 12:46.
Solution: check `start()`'s return value; on failure, set `display_connected=False`, populate `display_error`, and call `_start_display_retry_loop()`. The retry loop already exists and works correctly — it just needs to be armed. Audio remains paused/muted on the failure path; it'll be resumed by the normal start path inside the retry loop when the pipeline finally comes up.
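A simplified sketch of the fixed handler under these assumptions (the attribute and helper names follow the prose above; the real code in `minus.py` does more):

```python
def on_hdmi_restored(self):
    """Sketch of the fixed recovery handler (simplified from minus.py)."""
    started = self.ad_blocker.start()  # return value no longer ignored
    if started:
        self.display_connected = True
        self.log("[Recovery] HDMI recovery complete")
        return
    # kmssink could not open the DRM plane (TV off / HDMI-OUT absent):
    # flip the gate and arm the only mechanism that can rebuild the pipeline.
    self.display_connected = False
    self.display_error = "display pipeline failed to start after HDMI recovery"
    self._start_display_retry_loop()  # retries until the TV comes back
```

The whole fix is the failure branch: flipping `display_connected` to `False` is what un-gates the existing retry loop.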
Why the failure mode is sticky without this fix: there is no other code path that ever flips `display_connected` from `True` back to `False` post-startup. Initial startup (minus.py:3000) is the only place. The retry loop never fires because its gate (`not self.display_connected`) stays `False`.
The NO SIGNAL behavior is unchanged: the HDMI-LOST path still calls start_no_signal_mode() and the health monitor's "Continuous NO SIGNAL mode enforcement" loop still re-triggers it whenever HDMI input is absent. So the desired "TV shows NO SIGNAL when input is gone" behavior is preserved end-to-end.
Files modified:
- `minus.py` — `_on_hdmi_restored()`: capture return value, branch on success/failure, arm retry loop on failure
Symptom: User pauses on a real ad on Netflix, ad ends offscreen during the pause, user unpauses to actual show content — and Minus shows the blocking overlay for ~5 more seconds on the show content. Reproduced via the block-latency harness (round6): with the OLD parameters, 3/3 scenarios observed phantom re-blocks at ~0.9s after unpause.
Root cause: three coupled defects in the static-suppression / cooldown machinery, each individually plausible but combining badly:
- `OCR_STOP_THRESHOLD = 4` meant blocking took 4 OCR cycles × 0.5s = 2s to clear once the ad ended — already over the 1.5s responsiveness target the user wanted.
- `scene_change_threshold = 0.01` misclassified ~26% of natural low-motion frames in real video content as "static" (measured against BBB's actual inter-frame mean-abs-diff distribution: p5=0.002, p50=0.017, max=0.31). Static suppression therefore flapped on/off mid-content during slow scenes.
- `dynamic_cooldown = 0.5s` was too short for the post-pause AD overlay to actually transition off-screen. The cooldown completed → state was cleared → the very next OCR cycle re-detected the still-lingering AD text → blocking re-fired immediately.
The user only saw symptom 3 in the worst form (the phantom re-block), but symptoms 1 and 2 amplified its visibility — symptom 2 was also responsible for the related "blocking flips off mid-content" issue earlier in the same investigation.
Solution: three coordinated tuning changes, locked in via tests/block_latency_harness.py measurements (rounds 1, 4, 6, 7):
| Parameter | Old | New | Effect |
|---|---|---|---|
| `OCR_STOP_THRESHOLD` (minus.py) | 4 | 2 | recover 2.0s → 1.0s |
| `scene_change_threshold` (config.py) | 0.01 | 0.001 | only truly-frozen frames (~1.7% of BBB) register as static; natural low-motion content (~98%) keeps flowing |
| `dynamic_cooldown` (config.py) | 0.5s | 1.5s | post-pause AD overlay actually finishes transitioning off-screen before state is cleared |
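For reference, the static-vs-motion classification these thresholds gate can be sketched as a per-pixel mean absolute inter-frame difference, assuming (as the measured BBB distribution above suggests) that `scene_change_threshold` compares against mean-abs-diff on normalized grayscale frames; this is a sketch of the metric, not the actual `config.py` code:

```python
import numpy as np


def scene_change_score(prev: np.ndarray, curr: np.ndarray) -> float:
    """Mean absolute inter-frame difference over [0, 1] grayscale frames."""
    return float(np.mean(np.abs(curr.astype(np.float32) - prev.astype(np.float32))))


rng = np.random.default_rng(0)
frame = rng.random((720, 1280), dtype=np.float32)
frozen = scene_change_score(frame, frame)                           # identical frames
slow = scene_change_score(frame, np.clip(frame + 0.005, 0.0, 1.0))  # subtle motion

# The old threshold (0.01) calls the subtle motion "static"; 0.001 does not.
assert frozen < 0.001 and 0.001 < slow < 0.01
```

This is why 0.001 works: only genuinely frozen frames score near zero, while even very slow natural motion lands comfortably above the threshold.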
Verification: round6 of the harness re-runs the user's scenario 3× per parameter set:
- OLD params: 3/3 phantom re-blocks, max 0.90s after unpause
- NEW params: 0/3 phantom re-blocks ✓
Final scenario performance with locked-in params:
- detect: mean 0.59s, max 0.66s, 9/9 clean across all round-1 ad shapes
- recover: mean 0.97s, max 1.15s, all under 1.5s goal
- 0 false-positive blocking events across 15s of clean content (round 7)
- 0 mid-block flaps across a 30s sustained ad (round 7)
Defense-in-depth: tests/test_block_decision_engine.py adds 11 lightweight unit tests for the DecisionEngine state machine (cooldown clearing, OCR stop threshold, VLM-only fast-stop, the user-bug regression itself with both OLD and NEW params asserted). Runs as part of the standard test suite.
Files modified:
- `minus.py` — `OCR_STOP_THRESHOLD = 2` + comment with link to harness
- `src/config.py` — `scene_change_threshold = 0.001` + measurement-derived comment, `dynamic_cooldown = 1.5` (already changed in the cooldown-fix commit earlier this session)
- `tests/block_latency_harness.py` — new ~700-line headless harness (BBB source, OCR/VLM workers, decision-engine mirror, 7 rounds of scenarios)
- `tests/test_block_decision_engine.py` — new 11 unit tests
Symptom: During real ad breaks, blocking flapped on/off every 5–15 seconds even though OCR was reading the ad timer cleanly every frame. Logs showed sequences like:
```
00:29:00 [BLOCKING OCR] - Ad 1:11      ← match (boundary)
00:29:01 [BLOCKING OCR] - RATED TV-MA  ← no_ad #1
00:29:03 OCR #62188     - Ad1:09       ← no_ad #2 — silently!
00:29:03 OCR: ad no longer detected (after 2 no-ads)
00:29:03 AD BLOCKING ENDED after 3.1s
00:29:08 - Ad 1:02 → AD BLOCKING STARTED again
```
OCR's text output was literally the running ad timer, but `check_ad_keywords` was returning `ad_detected=False`, so the no-ad counter incremented and tripped `OCR_STOP_THRESHOLD=2`.
Root cause: there are two `check_ad_keywords` implementations — `src/ocr.py:515` on the PaddleOCR class, and `src/ocr_worker.py:310` on `OCRProcess`. Production wires `self.ocr = OCRProcess()` in `minus.py:563`, so `OCRProcess.check_ad_keywords` is what actually runs. The two have drifted: `ocr.py` was updated months ago to handle the OCR-drops-the-space variant (`"Ad1:09"`) and looser separator/digit misreads, but `ocr_worker.py` was never updated.
The drifted worker pattern was:

```python
# src/ocr_worker.py (pre-fix) — ONLY matches when there's a word boundary after "ad"
if re.search(r'\bad\b', text_lower) and re.search(r'[0-9o]:[0-9o]{2}', text_lower):
    matched.append(('ad with timestamp', text))
```

`\bad\b` requires a non-word char after `d`. "Ad1:09" puts a digit (a word char) right after `d`, so the boundary doesn't exist and the pattern fails. The timestamp side was also stricter: `[0-9o]` only (no `l`/`I`/`i`), and `:` only (no `;`/`.`).
So every frame OCR'd as `Ad1:09` was silently a no-ad, and a streaming service that briefly replaces the timer with a rating card (`RATED TV-MA`) at ad-to-ad transitions was enough to chain two consecutive no-ads and trip the unblock — even though the same ad break was still running.
Fix: `src/ocr_worker.py:404` and the cross-element check at `src/ocr_worker.py:418` now mirror `src/ocr.py:595` exactly:

```python
has_ad = (re.search(r'\bad\b', text_lower)
          or re.search(r'ad[0-9oOlIi][:;.]', text_lower))
has_timestamp = re.search(r'[0-9oOlIi][:;.][0-9oOlIi][0-9oOlIi]', text_lower)
if has_ad and has_timestamp:
    matched.append(('ad with timestamp', text))
```

Verified against actual log samples: `Ad1:09`, `Ad 1:11`, `Ad0:30`, `Ado:30`, `Adl:l0`, `Ad1:02`, `Ad0:55` all match; bare `Ad`, `RATED TV-MA`, `loading`, `reading` correctly do not.
Both files now carry a `Mirrors src/ocr.py:NNN — keep in sync` comment to make the next drift visible at the patch site.
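The mirrored patterns can be exercised standalone against the log samples above (the wrapper function here is hypothetical; production code runs these patterns inline inside `check_ad_keywords`):

```python
import re


def check_ad_with_timestamp(text: str) -> bool:
    """Standalone wrapper around the shared 'ad with timestamp' patterns."""
    text_lower = text.lower()
    has_ad = (re.search(r'\bad\b', text_lower)
              or re.search(r'ad[0-9oOlIi][:;.]', text_lower))
    has_timestamp = re.search(r'[0-9oOlIi][:;.][0-9oOlIi][0-9oOlIi]', text_lower)
    return bool(has_ad and has_timestamp)


# The log samples from the investigation:
assert all(map(check_ad_with_timestamp,
               ["Ad1:09", "Ad 1:11", "Ad0:30", "Ado:30", "Adl:l0", "Ad1:02", "Ad0:55"]))
assert not any(map(check_ad_with_timestamp, ["Ad", "RATED TV-MA", "loading", "reading"]))
```

Note why `loading` and `reading` stay negative even though both contain `ad` followed by a class character: the `ad[0-9oOlIi]` alternative also requires a `[:;.]` separator right after, which prose never supplies.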
Why this isn't just a tighter mirror: the duplication exists at all because `OCRProcess` runs `check_ad_keywords` locally in the main process (it's just string matching, no NPU work) instead of in the worker subprocess where PaddleOCR lives. Deleting the duplicate would require either (a) importing PaddleOCR from `ocr.py` into `ocr_worker.py` and calling its method, or (b) sending `ocr_results` back into the worker for the keyword check. (a) is the right fix and a small refactor — open task for next session. Until then, the mirror comments are the guardrail.
Files modified:
- `src/ocr_worker.py` — per-element + cross-element keyword patterns updated; mirror comments added
- `CLAUDE.md` — OCR Timestamp Pattern Handling section now calls out the dual-source requirement
The blocking overlay grew a third debug element: a top-right `(Ad) 0:30 left` snippet showing the OCR text that triggered the block, with the matched keyword wrapped in parens. The existing Debug Dashboard settings toggle was unified into a single Debug toggle that gates three things together: the `[ BLOCKING // ... ]` header (top), the bottom-left stats dashboard, and this new top-right OCR snippet.
Persistence: the toggle is a system setting (`debug_overlay`, default `True`) in `~/.minus_system_settings.json`. It is pushed into `ad_blocker.set_debug_overlay_enabled()` at startup so an off setting survives a service restart.
Recursion concern (resolved by existing architecture): the natural worry is that putting the OCR trigger text back on screen would make OCR keep seeing "Ad" forever. That cannot happen because OCR consumes `/snapshot/raw` (src/capture.py:134), which the patched ustreamer serves from `us_blocking_store_raw_frame()` before the blocking composite runs (ustreamer-garagehq/src/ustreamer/http/server.c:1026). The new top-right text — and every other element on the blocking overlay — is therefore invisible to OCR. Do not change OCR to read `/snapshot` (the composited path) without first stripping the debug texts; otherwise the displayed snippet becomes self-triggering. The Minus Overlay Text Triggering False Positive Ad Detection fix in this same Known Issues list is the cautionary tale — the notification overlay (`/overlay`, distinct from `/blocking`) DOES composite before the snapshot and required keyword exclusions to suppress recursion.
ustreamer C-side change: added a third text region. Files in ustreamer-garagehq:
- `src/libs/blocking.h` — `text_ocr` field on `us_blocking_config_s`, `US_BLOCKING_TEXT_OCR_SIZE = 256`, declaration of `us_blocking_set_text_ocr()`
- `src/libs/blocking.c` — setter; clear/snapshot/composite all extended; render block draws at `text_x = dst_width - text_w - 30, text_y = 30` using the same IBM Plex Mono Regular face as `text_stats`. Reuses the existing `_ft_mutex` since FreeType is not thread-safe across the 4 MPP workers.
- `src/ustreamer/http/server.c` — `text_ocr` URL param parsing in `_http_callback_blocking_set`. `text_stats_scale` is reused for the OCR text size (no separate scale param needed).
Python wiring:
- `src/ad_blocker.py` — `_ocr_trigger_text` instance, `_format_ocr_trigger(raw, source)` builds the `(Ad) 0:30 left` snippet (paren-wraps the matched keyword inside the OCR text snippet, ASCII-collapsed, ≤50 chars), `_render_ocr_text()` returns empty when debug is off so the C side draws nothing. `show(source, ocr_trigger_text="")` accepts the trigger payload from minus; it only overwrites the stored snippet when given a non-empty value (or when transitioning to vlm-only) so the top-right does not flicker as OCR text comes and goes during a block. `set_debug_overlay_enabled()` re-renders `text_vocab` (to add/strip the header) and pushes `text_ocr` in the right direction without waiting for the next rotation.
- `minus.py` — stashes `last_matched_keywords` in the OCR loop; helper `_first_match_for_overlay()` returns `(keyword, snippet_text)` for the most recent match. `_load_system_settings` adds the `debug_overlay: True` default; `set_debug_overlay_enabled(enabled)` persists and propagates. Cleared in the block-end branch so the next block starts fresh.
- `src/webui.py` — `/api/debug-overlay/{enable,disable}` route through `minus.set_debug_overlay_enabled()` for persistence. The `POST /api/test/trigger-block` endpoint injects a synthetic `("Ad", "Ad 0:30 left")` snippet when `source` is `ocr`/`both` so the top-right slot can be exercised without real ads.
- `src/templates/index.html` — toggle relabeled "Debug Dashboard" → "Debug" with a tooltip listing what it controls.
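A sketch of what the snippet builder might look like (a hypothetical standalone version of `_format_ocr_trigger`; the ASCII-collapse and truncation rules here are inferred from the description above, not copied from the real code):

```python
import re


def format_ocr_trigger(raw: str, keyword: str, max_len: int = 50) -> str:
    """Build the overlay snippet, e.g. '(Ad) 0:30 left', from raw OCR text."""
    text = re.sub(r'[^\x20-\x7e]', ' ', raw)  # collapse to printable ASCII
    text = re.sub(r'\s+', ' ', text).strip()  # normalize whitespace
    m = re.search(re.escape(keyword), text, re.IGNORECASE)
    if m:  # paren-wrap the first occurrence of the matched keyword
        text = text[:m.start()] + '(' + m.group(0) + ')' + text[m.end():]
    return text[:max_len]  # keep it overlay-sized (<=50 chars)
```

Example: `format_ocr_trigger("Ad 0:30 left", "Ad")` yields `(Ad) 0:30 left`, which is the shape the top-right debug slot renders.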
Files modified:
- `ustreamer-garagehq/src/libs/blocking.{h,c}`, `src/ustreamer/http/server.c` — new `text_ocr` API + top-right render
- `minus.py`, `src/ad_blocker.py`, `src/webui.py`, `src/templates/index.html`