Enable Chrome DevTools Protocol in devcontainer for semantic agent control #2129

@kantord

Summary

Enable the Chrome DevTools Protocol (CDP) on the Electron app inside the devcontainer (and optionally expose it off-container in dev/CI) so that AI agents driving the app can use accessibility-tree / DOM-level control (Playwright CLI, raw CDP, etc.) instead of pixel-based control via xdotool + screenshots.

Context

The devcontainer currently gives an agent a working but coarse control surface:

  • Drive input via DISPLAY=:99 xdotool …
  • Observe state via import -window root /tmp/shot.png + load image into model context

This works, but every observation costs a screenshot in the model's context. For a long agent loop (e.g. the experimental visual bug-fix flow on experiment/bug-fix-visual), token spend on images dominates the cost.

CDP gives an agent the same control surface that Chrome DevTools uses internally:

  • Accessibility tree with element refs ([button ref=e12] "Submit")
  • Click/fill/press by ref, not by pixel coordinate
  • Network/console/runtime inspection without screenshots
  • DOM/AX snapshots are plain text — cheap, structured, robust against CSS changes
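To make the "plain text" point concrete, here is a minimal sketch of turning the flat node list returned by the CDP method Accessibility.getFullAXTree into indented text lines. The interfaces cover only the AXNode fields the sketch reads, renderAxTree is a hypothetical helper name, and the "[role ref=<nodeId>]" line format is this sketch's own convention modelled on the example above, not Playwright's actual ref scheme.

```typescript
// Subset of the CDP Accessibility.AXNode shape that this sketch reads.
interface AXValue {
  type: string;
  value?: unknown;
}

interface AXNode {
  nodeId: string;
  ignored: boolean;
  role?: AXValue;
  name?: AXValue;
  childIds?: string[];
}

/**
 * Renders a flat AXNode list into indented text, one line per non-ignored node.
 * Ignored nodes are skipped, but their children are still visited.
 */
function renderAxTree(nodes: AXNode[]): string {
  const byId = new Map<string, AXNode>();
  const referenced = new Set<string>();
  for (const node of nodes) {
    byId.set(node.nodeId, node);
    for (const id of node.childIds ?? []) referenced.add(id);
  }

  const lines: string[] = [];
  const visit = (node: AXNode, depth: number): void => {
    let nextDepth = depth;
    if (!node.ignored) {
      const role = String(node.role?.value ?? "unknown");
      const name = String(node.name?.value ?? "");
      const label = name ? ` ${JSON.stringify(name)}` : "";
      lines.push(`${"  ".repeat(depth)}[${role} ref=${node.nodeId}]${label}`);
      nextDepth = depth + 1;
    }
    for (const id of node.childIds ?? []) {
      const child = byId.get(id);
      if (child) visit(child, nextDepth);
    }
  };

  // Roots are the nodes that no other node lists as a child.
  for (const node of nodes) {
    if (!referenced.has(node.nodeId)) visit(node, 0);
  }
  return lines.join("\n");
}
```

A snapshot rendered this way is a few dozen bytes per element, which is the cost comparison against a screenshot that the list above is making.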

Electron exposes CDP the same way Chrome does: pass --remote-debugging-port=<N> on the command line, or call app.commandLine.appendSwitch('remote-debugging-port', String(N)) in the main process before the app's ready event. Any CDP client can then attach.
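The programmatic route could look roughly like the sketch below. It assumes a hypothetical CDP_PORT environment variable and a hypothetical enableCdp helper; neither exists in the repo today. The helper takes the command-line object as a parameter so it can be exercised without launching Electron, and it refuses to enable the port in packaged builds, in line with the dev-only scope of this issue.

```typescript
// Minimal shape of Electron's `app.commandLine` that this sketch relies on.
interface CommandLineLike {
  appendSwitch(name: string, value?: string): void;
}

// Hypothetical env var name; the issue itself only proposes the port number 9223.
const CDP_PORT_ENV = "CDP_PORT";

/**
 * Appends --remote-debugging-port when a valid port is configured.
 * Returns the enabled port, or null when CDP stays off.
 */
function enableCdp(
  commandLine: CommandLineLike,
  env: Record<string, string | undefined>,
  isPackaged: boolean,
): number | null {
  // Release builds are out of scope: never open the debug port there.
  if (isPackaged) return null;

  const raw = env[CDP_PORT_ENV];
  if (raw === undefined || raw === "") return null;

  const port = Number(raw);
  if (!Number.isInteger(port) || port < 1 || port > 65535) return null;

  // The switch value is passed as a string.
  commandLine.appendSwitch("remote-debugging-port", String(port));
  return port;
}
```

In the Electron main process this would be called once, before the ready event fires, as enableCdp(app.commandLine, process.env, app.isPackaged), with the entrypoint script exporting CDP_PORT=9223.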

Proposed work

  1. Enable CDP on Electron in dev mode.
    In scripts/devcontainer-entrypoint.sh, add --remote-debugging-port=9223 to the pnpm start invocation (or to an Electron-side switch). Pick a port distinct from the noVNC port (currently 6080) and document it.

  2. Forward the port off the container.
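    In .devcontainer/devcontainer.json, add 9223 to forwardPorts. If we want host-port control, also add a runArgs entry such as "-p", "${localEnv:CDP_HOST_PORT}:9223", mirroring the existing NOVNC_HOST_PORT mapping. Default to localhost-only and never bind publicly. Note that a bare Docker -p HOST:CONTAINER mapping publishes on all host interfaces, so the mapping needs a 127.0.0.1: prefix to stay local.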

  3. Document the workflow in the devcontainer-dev skill (.claude/skills/devcontainer-dev/SKILL.md):

    • How to confirm CDP is up: curl http://localhost:9223/json
    • Recommended client: Playwright CLI (npx playwright open --connect-over-cdp http://localhost:9223; confirm that this flag exists in the installed Playwright version) or chromium.connectOverCDP('http://localhost:9223') for scripts
    • Shared-control caveat: user clicks and agent clicks can race
    • Tier ladder: try AX/DOM first, fall back to xdotool/screenshots only when semantics aren't enough

  4. Optional: ship a small helper script (scripts/devcontainer-cdp.sh) that wraps the common one-liners (navigate, snapshot, click <ref>, fill <ref> <value>) so agents have a tight, well-bounded interface — analogous to how xdotool is the agent's input verb today.
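For scripted clients, the "is CDP up" check in step 3 can go one step further and pick the target to attach to. The sketch below takes the body returned by http://localhost:9223/json and selects the first target of type "page" that advertises a webSocketDebuggerUrl. pickPageTarget is a hypothetical helper name, and the interface lists only the fields the sketch reads.

```typescript
// Subset of the fields that the /json endpoint reports for each target.
interface CdpTarget {
  id: string;
  type: string;
  title: string;
  url: string;
  webSocketDebuggerUrl?: string;
}

/**
 * Parses a /json response body and returns the first attachable page target,
 * or null when no such target is listed.
 * Throws if the body is not valid JSON or is not an array.
 */
function pickPageTarget(jsonBody: string): CdpTarget | null {
  const parsed: unknown = JSON.parse(jsonBody);
  if (!Array.isArray(parsed)) {
    throw new Error("expected the /json response to be an array of targets");
  }
  const targets = parsed as CdpTarget[];
  const page = targets.find(
    (t) => t.type === "page" && typeof t.webSocketDebuggerUrl === "string",
  );
  return page ?? null;
}
```

A null result with a reachable endpoint would mean the app is running but has not opened a window yet, which is a useful distinction from "port not listening" when an agent decides whether to retry.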

Why this matters now

The experimental visual bug-fix agent (experiment/bug-fix-visual, PR #2120) is the immediate consumer. That experiment is currently stalled on an unrelated issue (claude-code-action workflow validation), but once it can run, screenshot-driven repro will be its dominant cost. Adding CDP turns that into AX-tree-driven repro for most of the loop, with screenshots reserved for genuine pixel bugs.

References

  • Electron CDP docs: enabling --remote-debugging-port
  • Playwright accessibility-tree snapshot output (text refs like [button ref=e12])
  • Existing devcontainer skill: .claude/skills/devcontainer-dev/SKILL.md (already mentions this as "Future: CDP access")

Out of scope

  • Changing the production agent (_bug-fix-agent.yml) to use CDP — that's a follow-up once CDP is wired.
  • Adding CDP to release builds. Dev/devcontainer only for now.
