Enable AI Agents to Physically Control Your Phone
PhonePilot is a desktop application that lets AI agents physically control a smartphone through a mechanical arm. Using the Model Context Protocol (MCP), AI agents such as Claude or Cursor can drive the arm to perform taps, swipes, and other touch interactions on the phone screen while observing the results in real time through a camera feed.
This adds physical interaction to AI "Computer Use" capabilities: an agent can operate real mobile devices, not only computers.
The built-in MCP server supports both the Streamable HTTP and SSE transports and works with any MCP-compatible AI client. It exposes the following tools:
| Tool | Description |
|---|---|
| `arm-connect` | Connect to the mechanical arm controller |
| `arm-disconnect` | Disconnect from the mechanical arm |
| `arm-move` | Move the arm to a specified position |
| `arm-click` | Perform a click at the current position |
| `capture-frame` | Capture the current camera frame |
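
For readers wiring up a client by hand, the sketch below shows roughly what a call to one of these tools looks like on the wire. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The argument names `x` and `y` for `arm-move` are assumptions made for illustration; the actual input schema should be read from the server's `tools/list` response.

```python
import itertools
import json
from typing import Optional

# JSON-RPC request ids should be unique within a session.
_request_ids = itertools.count(1)


def tool_call(name: str, arguments: Optional[dict] = None) -> dict:
    """Build an MCP `tools/call` request (JSON-RPC 2.0) for one tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments or {}},
    }


# Hypothetical argument names: check the schema reported by `tools/list`.
move = tool_call("arm-move", {"x": 30, "y": 55})
click = tool_call("arm-click")

print(json.dumps(move, indent=2))
```

In practice an MCP client library would build and send these messages for you; the raw payload is shown only to make the tool names in the table concrete.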
HD camera with live phone screen preview, featuring:
- Auto-detection and connection to DECXIN cameras
- Manual focus mode to prevent autofocus hunting
- Crosshair and grid overlay assistants
- 90° auto-rotation to match phone portrait display

Mechanical arm control, featuring:
- Millimeter-accurate X/Y axis movement
- Adjustable step size (1-50mm)
- Adjustable touch depth (Z-axis)
- Real-time operation logging
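
To make the step-size range concrete, here is a minimal sketch of how a jog command could keep the step within 1-50 mm before moving along X or Y. The `Position` type, the direction names, and the axis orientation are hypothetical; PhonePilot's internal representation may differ.

```python
from dataclasses import dataclass

MIN_STEP_MM = 1   # lower bound of the adjustable step size
MAX_STEP_MM = 50  # upper bound of the adjustable step size

# Unit vectors per jog direction (assumed axis orientation).
DIRECTIONS = {"left": (-1, 0), "right": (1, 0), "up": (0, 1), "down": (0, -1)}


@dataclass(frozen=True)
class Position:
    x_mm: float
    y_mm: float


def clamp_step(step_mm: float) -> float:
    """Force the requested step into the supported 1-50 mm range."""
    return max(MIN_STEP_MM, min(MAX_STEP_MM, step_mm))


def jog(pos: Position, direction: str, step_mm: float) -> Position:
    """Return the position reached after one jog in the given direction."""
    dx, dy = DIRECTIONS[direction]
    step = clamp_step(step_mm)
    return Position(pos.x_mm + dx * step, pos.y_mm + dy * step)
```

For example, `jog(Position(10, 10), "right", 5)` yields `Position(15, 10)`, while a requested step of 80 mm is reduced to 50 mm.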
Built with Electron, natively supporting:
- macOS (Intel & Apple Silicon)
- Windows (x64)
- Linux (AppImage, deb)
┌─────────────────┐     MCP Protocol     ┌──────────────────┐
│                 │◄────────────────────►│                  │
│    AI Agent     │   arm-move, click    │    PhonePilot    │
│  (Claude, etc)  │    capture-frame     │   Desktop App    │
│                 │                      │                  │
└─────────────────┘                      └────────┬─────────┘
                                                  │
                                         ┌────────┴─────────┐
                                         │                  │
                                   ┌─────▼─────┐     ┌──────▼─────┐
                                   │ Mechanical│     │   Camera   │
                                   │    Arm    │     │  (DECXIN)  │
                                   └─────┬─────┘     └──────┬─────┘
                                         │                  │
                                         ▼                  ▼
                                     ┌──────────────────────────┐
                                     │        Smartphone        │
                                     │    (Physical Device)     │
                                     └──────────────────────────┘
1. The AI agent connects to PhonePilot via MCP.
2. PhonePilot translates MCP tool calls into mechanical arm control instructions.
3. The mechanical arm performs physical touch operations on the phone screen.
4. The camera captures the screen, and PhonePilot returns the frame to the AI agent.
5. The AI agent analyzes the frame and decides on the next action.
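
The steps above amount to an observe, decide, act loop. The sketch below expresses that loop against two hypothetical callables: `call_tool`, standing in for whatever MCP client wrapper you use, and `decide`, standing in for the agent's reasoning over a frame. Only the tool names come from PhonePilot; the argument shape passed to `arm-move` is an assumption.

```python
from typing import Any, Callable, Optional

ToolCaller = Callable[[str, dict], Any]    # hypothetical MCP client wrapper
Decider = Callable[[Any], Optional[dict]]  # returns a move target, or None when done


def run_loop(call_tool: ToolCaller, decide: Decider, max_steps: int = 20) -> int:
    """Observe, decide, act. Returns the number of taps performed."""
    call_tool("arm-connect", {})
    taps = 0
    try:
        for _ in range(max_steps):
            frame = call_tool("capture-frame", {})  # observe
            target = decide(frame)                  # decide
            if target is None:
                break
            call_tool("arm-move", target)           # act: position the arm
            call_tool("arm-click", {})              # act: tap the screen
            taps += 1
    finally:
        # Always release the arm, even if a tool call raises.
        call_tool("arm-disconnect", {})
    return taps


# Dry run with stand-ins instead of real hardware.
log = []


def fake_call_tool(name: str, arguments: dict) -> Any:
    log.append(name)
    return "frame" if name == "capture-frame" else None


targets = iter([{"x": 10, "y": 20}, None])  # one tap, then stop
taps = run_loop(fake_call_tool, lambda frame: next(targets))
```

The `max_steps` cap is a design choice of this sketch: it keeps a confused agent from driving the arm indefinitely.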
- Node.js 20.x or later
- Yarn package manager
- Compatible mechanical arm controller, connected over a serial (COM) port
- USB camera (DECXIN recommended)
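
If you want to verify the software prerequisites before installing, a small check along these lines may help. It is a sketch that only looks for `node` and `yarn` on the `PATH` and compares the Node.js major version against 20; the helper names are made up for this example, and the arm controller and camera are not checked.

```python
import re
import shutil
import subprocess

MIN_NODE_MAJOR = 20


def node_major(version_text: str) -> int:
    """Extract the major version from `node --version` output such as 'v20.11.1'."""
    match = re.match(r"v?(\d+)\.", version_text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node.js version: {version_text!r}")
    return int(match.group(1))


def missing_prerequisites() -> list:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    for tool in ("node", "yarn"):
        if shutil.which(tool) is None:
            problems.append(f"{tool} not found on PATH")
    if shutil.which("node") is not None:
        result = subprocess.run(
            ["node", "--version"], capture_output=True, text=True, check=True
        )
        if node_major(result.stdout) < MIN_NODE_MAJOR:
            problems.append(
                f"Node.js {result.stdout.strip()} is older than {MIN_NODE_MAJOR}.x"
            )
    return problems
```

Calling `missing_prerequisites()` and printing each returned entry gives a quick report.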
# Clone the repository
git clone https://github.com/your-username/PhonePilot.git
cd PhonePilot
# Install dependencies
yarn install
# Start development environment
yarn electron:dev

# Build for current platform
yarn electron:build
# Build for specific platforms
yarn build:mac # macOS
yarn build:win # Windows
yarn build:linux # Linux

PhonePilot provides a complete MCP Server implementation that integrates with any MCP-compatible AI client.
| Endpoint | Protocol | Purpose |
|---|---|---|
| `POST /mcp` | Streamable HTTP | Modern MCP clients |
| `GET /sse` | SSE | Legacy MCP clients |
| `GET /health` | HTTP | Health check |
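
Before pointing an AI client at the server, it can be useful to confirm that it is reachable. The following sketch builds the endpoint URL for a chosen transport and probes `GET /health` using only the Python standard library. It assumes the server listens on `localhost:3847`, as in the sample client configuration, and that a healthy server answers with HTTP 200; the response body is not inspected because its format is not documented here.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:3847"  # port taken from the sample client config
TRANSPORT_PATHS = {"streamable-http": "/mcp", "sse": "/sse"}


def endpoint_url(transport: str, base_url: str = BASE_URL) -> str:
    """Return the MCP endpoint URL for the given transport."""
    return base_url.rstrip("/") + TRANSPORT_PATHS[transport]


def is_healthy(base_url: str = BASE_URL, timeout: float = 2.0) -> bool:
    """Return True when GET /health answers with HTTP 200 (assumed contract)."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False
```

A client that supports Streamable HTTP would use `endpoint_url("streamable-http")`, and one limited to SSE would use `endpoint_url("sse")`.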
Configure the MCP Server in your AI client:
{
"mcpServers": {
"phonepilot": {
"url": "http://localhost:3847/sse"
}
}
}

This project is licensed under the MIT License.
Made with ❤️ for the AI-powered future

