Add embedded MCP server for in-process UI introspection #10985
tilladam wants to merge 4 commits into slint-ui:master
When the `mcp` feature is enabled and `SLINT_MCP_PORT` is set at runtime, each Slint application starts an HTTP server implementing the MCP Streamable HTTP transport. This allows MCP clients (e.g. coding assistants) to inspect and interact with the running UI directly, without a separate bridge process.

Key components:
- introspection.rs: Shared state (window/element arenas, property extraction, query execution) reusable by both systest and MCP transports
- mcp_server.rs: HTTP server (httparse + async-net), JSON-RPC 2.0 handler, and 11 MCP tools (list_windows, get_element_tree, take_screenshot, etc.)

Feature flag wiring: slint/mcp -> selector/mcp -> testing/mcp
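For illustration, here is a minimal sketch of the kind of request an MCP client might send to invoke one of these tools. MCP tool invocations are JSON-RPC 2.0 requests with method `tools/call`, and the Streamable HTTP transport carries them in an HTTP POST. The helper name `build_tool_call_request` and the `/mcp` endpoint path used in the usage note are assumptions for this sketch and are not taken from the PR. A real client would also perform the `initialize` handshake first.

```rust
/// Builds a raw HTTP/1.1 POST carrying a JSON-RPC 2.0 `tools/call` request.
///
/// Hypothetical helper for illustration only. `tool` is assumed to be a plain
/// identifier such as `list_windows` (no JSON escaping is performed), and
/// `arguments_json` must already be a valid JSON object.
pub fn build_tool_call_request(
    port: u16,
    path: &str,
    id: u64,
    tool: &str,
    arguments_json: &str,
) -> String {
    // JSON-RPC 2.0 envelope; MCP puts the tool name and its arguments in `params`.
    let body = format!(
        r#"{{"jsonrpc":"2.0","id":{id},"method":"tools/call","params":{{"name":"{tool}","arguments":{arguments_json}}}}}"#
    );
    // Streamable HTTP clients accept both a plain JSON response and an SSE stream.
    format!(
        "POST {path} HTTP/1.1\r\n\
         Host: 127.0.0.1:{port}\r\n\
         Content-Type: application/json\r\n\
         Accept: application/json, text/event-stream\r\n\
         Content-Length: {len}\r\n\
         \r\n\
         {body}",
        len = body.len()
    )
}
```

Calling `build_tool_call_request(8080, "/mcp", 1, "list_windows", "{}")` yields a request whose body names the `list_windows` tool with an empty argument object. The port would be whatever `SLINT_MCP_PORT` was set to.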
… and cleanup
- Element handle arena eviction with FIFO policy (cap 10k, preserves root handles)
- Replace macro DSL with serde Deserialize structs for tool parameter extraction
- 61 tests covering protocol, deserialization, tool error paths, schema validation
- HTTP/1.1 keep-alive with carry-over buffer for pipelined requests
- CORS origin reflection (echo validated origin instead of hardcoding http://localhost)
- Replace _mcp_image sentinel with typed ToolResult enum
- Flatten handle_connection nesting, reuse shared index conversion in systest
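The FIFO eviction policy in the first item can be sketched roughly as follows. This is not the PR's code: `HandleArena`, its fields, and its methods are hypothetical names, and the only behavior taken from the commit message is "evict oldest first, keep a cap, never evict root handles".

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical handle arena: maps numeric handles to values and evicts the
/// oldest non-root entry whenever the number of entries exceeds `cap`.
pub struct HandleArena<T> {
    cap: usize,
    next_id: u64,
    entries: HashMap<u64, T>,
    /// Insertion order of evictable (non-root) handles, oldest first.
    fifo: VecDeque<u64>,
}

impl<T> HandleArena<T> {
    pub fn new(cap: usize) -> Self {
        Self { cap, next_id: 0, entries: HashMap::new(), fifo: VecDeque::new() }
    }

    /// Stores `value` and returns its handle. Root handles are never queued
    /// for eviction, so they survive any number of later insertions.
    pub fn insert(&mut self, value: T, is_root: bool) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.entries.insert(id, value);
        if !is_root {
            self.fifo.push_back(id);
        }
        // Drop the oldest evictable entries until we are back within the cap.
        while self.entries.len() > self.cap {
            match self.fifo.pop_front() {
                Some(oldest) => {
                    self.entries.remove(&oldest);
                }
                // Only root handles remain; nothing can be evicted.
                None => break,
            }
        }
        id
    }

    pub fn get(&self, id: u64) -> Option<&T> {
        self.entries.get(&id)
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

With a cap of 3, inserting one root and then three ordinary handles evicts the first ordinary handle while the root stays resolvable. A client holding an evicted handle would get `None` from `get` and would need to query for the element again.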
I didn't think someone would do it 😅
I've been going back and forth on this, and my current thinking is this: the "inspector" side should live entirely inside Slint. I'm not sure the protocol has to be MCP, but fundamentally I think we want the ability to set certain feature flags etc. to activate an API that permits inspecting the element tree, taking screenshots, and so on.

We already have a mode of the testing backend that connects to a given address if an environment variable is set, but for this agent interaction I think we want the agent to specify a port and expect the application to listen on that port. If the port can't be listened on, the process should abort in a way that tells the agent to try again with a different port. There's still a race, though: if two agents try to launch an app with the same port, one might end up talking to the wrong one. So a cookie might be worth adding to the protocol here, just to catch the error.

I think the agent side should be the result of the LLM learning about how to connect to the inspector via
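A rough sketch of the "listen or abort, plus cookie" idea from this comment, using only the standard library. Everything named here is an assumption made for illustration: the exit code value, the function names, and the notion that the cookie is a plain string the agent passes at launch and later presents on connect.

```rust
use std::net::{Ipv4Addr, TcpListener};

/// Exit code meaning "port unavailable, retry with a different one".
/// The value 98 is an arbitrary choice for this sketch.
pub const EXIT_PORT_IN_USE: i32 = 98;

/// Tries to listen on 127.0.0.1:`port`. On failure, logs the reason and
/// returns the exit code the caller should terminate the process with.
pub fn try_listen(port: u16) -> Result<TcpListener, i32> {
    TcpListener::bind((Ipv4Addr::LOCALHOST, port)).map_err(|err| {
        eprintln!("inspector: cannot listen on port {port}: {err}");
        EXIT_PORT_IN_USE
    })
}

/// Compares the cookie a client presents with the one the launching agent
/// supplied. A mismatch means the client reached some other agent's app.
pub fn cookie_matches(expected: &str, presented: Option<&str>) -> bool {
    presented == Some(expected)
}
```

An application's startup code could call `try_listen` and pass the error value to `std::process::exit`, so the agent can tell "port taken" apart from other failures. The cookie check does not remove the race between two agents picking the same port, but it lets the losing agent detect that it connected to the wrong process.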
While the internals of Slint itself are foreign to me, the (general-purpose) engineer in me says that you should just write the underlying code in a generic way, i.e. have one part of the code implement all of the functionality needed for debugging, and then expose that through simple wrappers, be it MCP, HTTP, pipes, local sockets, or whatever IPC protocol you wish to use. Conventionally you'd expose just one protocol, and other code can then link up and connect to it. With MCP I think it may be a bit trickier, because ideally you want a hassle-free setup experience. So embedded is good: you could in theory just point your MCP-enabled client at a localhost URL. Asking people to install Docker, for instance, or other software, which is what you would otherwise get, may be a bit more burdensome. The difficult part will always be multiple instances at a time, though.
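The "one generic core, thin wrappers per protocol" shape suggested here might look like the following sketch. All type and method names (`Inspector`, `Transport`, `LineTransport`, `JsonTransport`) are hypothetical, and the request matching is deliberately naive. The point is only that each transport translates its wire format into calls on the same core.

```rust
/// Transport-agnostic core: everything the debugger can do lives here.
pub struct Inspector {
    windows: Vec<String>,
}

impl Inspector {
    pub fn new(windows: Vec<String>) -> Self {
        Self { windows }
    }

    pub fn list_windows(&self) -> &[String] {
        &self.windows
    }
}

/// A transport only translates between its wire format and core calls.
pub trait Transport {
    fn handle(&self, core: &Inspector, request: &str) -> String;
}

/// Plain-text wrapper, e.g. for a pipe or a local socket.
pub struct LineTransport;

impl Transport for LineTransport {
    fn handle(&self, core: &Inspector, request: &str) -> String {
        match request.trim() {
            "list_windows" => core.list_windows().join("\n"),
            other => format!("error: unknown command {other}"),
        }
    }
}

/// JSON-flavoured wrapper. It only looks for the method name in the request
/// and is not a real JSON parser or encoder.
pub struct JsonTransport;

impl Transport for JsonTransport {
    fn handle(&self, core: &Inspector, request: &str) -> String {
        if request.contains("\"list_windows\"") {
            // Debug formatting adds the quotes; adequate for simple names only.
            let items: Vec<String> =
                core.list_windows().iter().map(|w| format!("{w:?}")).collect();
            format!("{{\"windows\":[{}]}}", items.join(","))
        } else {
            "{\"error\":\"unknown method\"}".to_string()
        }
    }
}
```

Adding a new capability then means adding one method to the core and one match arm per transport, rather than reimplementing the logic for each protocol.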
(Looking at the PR now; somehow thought this was the previous one :)
From what I can tell this is awesome. Pretty much what I'm looking for. This way we can teach Claude etc. quite easily how to introspect and interact. I have just one concern, but it's a big one: the implementation is huge and duplicates a lot of functionality. Adding more properties etc. sounds like it requires touching a lot of code. I'd like to explore whether it's possible to move to a different protobuf implementation and then, instead of MCP, expose protobuf messages directly as JSON, along with a generated schema. One issue here is that protobuf beyond the crate we're currently using requires a protoc binary, which sucks. But there is now https://crates.io/crates/protoc-bin-vendored, which might just be good enough to generate the code in build.rs. I'd be happy to discuss this further at the plan level.
Summary
- When the `mcp` feature is enabled and `SLINT_MCP_PORT=<port>` is set at runtime, an HTTP server starts on `127.0.0.1:<port>` implementing the MCP Streamable HTTP transport with 11 tools: `list_windows`, `get_window_properties`, `find_elements_by_id`, `get_element_properties`, `query_element_descendants`, `get_element_tree`, `take_screenshot`, `click_element`, `invoke_accessibility_action`, `set_element_value`, `dispatch_key_event`
- Refactors the `systest` module to share an `IntrospectionState` with the new MCP transport, avoiding code duplication for window tracking, element handle management, property extraction, and query execution

Architecture
Key design decisions
- `mcp` feature is fully opt-in, zero cost when disabled
- Binds to `127.0.0.1`, validates Origin header against localhost addresses for DNS rebinding protection
- `Connection: close` respected
- `Deserialize` structs for all tool parameters with proper error messages

Test plan
- `cargo test -p i-slint-backend-testing --features mcp --lib`: 63 tests pass
- `cargo check -p i-slint-backend-testing --features system-testing`: systest refactor compiles
- `--features mcp` enabled
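As an illustration of the Origin validation mentioned under the key design decisions, the check could look something like the sketch below. The accepted host set (`localhost`, `127.0.0.1`, `::1`) and both function names are assumptions, since the PR text does not spell out which addresses it allows. The second function mirrors the "echo the validated origin" CORS behavior described in the cleanup commit.

```rust
/// True if `s` is a decimal port number that fits in a u16.
fn is_port(s: &str) -> bool {
    !s.is_empty() && s.bytes().all(|b| b.is_ascii_digit()) && s.parse::<u16>().is_ok()
}

/// Returns true if `origin` (the value of an HTTP `Origin` header) refers to
/// the local machine. Hypothetical helper; the accepted hosts are an assumption.
pub fn is_localhost_origin(origin: &str) -> bool {
    // Only http(s) origins are considered; "null" and other schemes fail here.
    let Some(rest) = origin
        .strip_prefix("http://")
        .or_else(|| origin.strip_prefix("https://"))
    else {
        return false;
    };
    let host = if let Some(after) = rest.strip_prefix('[') {
        // Bracketed IPv6 literal, optionally followed by ":port".
        match after.split_once(']') {
            Some((host, "")) => host,
            Some((host, tail)) if tail.strip_prefix(':').is_some_and(is_port) => host,
            _ => return false,
        }
    } else {
        match rest.rsplit_once(':') {
            Some((host, port)) if is_port(port) => host,
            Some(_) => return false,
            None => rest,
        }
    };
    // Exact match only, so names like "localhost.evil.example" are rejected.
    matches!(host, "localhost" | "127.0.0.1" | "::1")
}

/// Echoes a validated origin back as a CORS header line, or yields `None`
/// when the origin should be rejected.
pub fn cors_allow_origin(origin: &str) -> Option<String> {
    is_localhost_origin(origin).then(|| format!("Access-Control-Allow-Origin: {origin}"))
}
```

Matching the host exactly, rather than by prefix or substring, is what makes this useful against DNS rebinding: a page served from an attacker-controlled name that resolves to 127.0.0.1 still sends its own hostname in `Origin` and is rejected.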