revan-zhang/PhonePilot

PhonePilot

Enable AI Agents to Physically Control Your Phone


About

PhonePilot is a desktop application that lets AI agents physically control a smartphone through a mechanical arm. Using the Model Context Protocol (MCP), AI clients such as Claude or Cursor can drive the arm to perform taps, swipes, and other touch interactions on the phone screen, and observe the results in real time through a camera feed.

This extends AI "Computer Use" capabilities to physical interaction: the AI can operate real mobile devices, not only computers.

Mechanical Arm Hardware Setup

Features

🤖 Native MCP Protocol Support

The built-in MCP server supports both Streamable HTTP and SSE transports, so it works with any MCP-compatible AI client. It exposes the following tools:

| Tool | Description |
|------|-------------|
| arm-connect | Connect to the mechanical arm controller |
| arm-disconnect | Disconnect from the mechanical arm |
| arm-move | Move the arm to a specified position |
| arm-click | Perform a click at the current position |
| capture-frame | Capture the current camera frame |
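
To make the tool list concrete, here is a minimal sketch of the JSON-RPC 2.0 `tools/call` requests an MCP client would send for a connect, move, click, capture sequence. The request envelope follows the MCP specification; the `x` and `y` argument names for `arm-move` are assumptions, since the tool schemas are not documented here. A real client would normally use an MCP SDK, which also handles the initialization handshake.

```typescript
// JSON-RPC 2.0 request shape that MCP uses to invoke a tool ("tools/call").
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

let nextId = 1;

// Build a tools/call request for one of the PhonePilot tools listed above.
function buildToolCall(
  name: string,
  args: Record<string, unknown> = {},
): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// ASSUMPTION: "x" and "y" are hypothetical argument names for arm-move;
// check the schema returned by the server's tools/list before relying on them.
const sequence: ToolCallRequest[] = [
  buildToolCall("arm-connect"),
  buildToolCall("arm-move", { x: 40, y: 85 }),
  buildToolCall("arm-click"),
  buildToolCall("capture-frame"),
];

for (const req of sequence) {
  console.log(JSON.stringify(req));
}
```

Each request gets its own incrementing `id`, so responses can be matched to the call that produced them.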

📷 Real-time Visual Feedback

HD camera with live phone screen preview, featuring:

  • Auto-detection and connection to DECXIN cameras
  • Manual focus mode to prevent autofocus hunting
  • Crosshair and grid overlay assistants
  • 90° auto-rotation to match phone portrait display
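
The 90° rotation matters when translating between camera pixels and what is shown in the portrait preview. The sketch below shows the coordinate mapping for a clockwise rotation and its inverse. The rotation direction and the 1920x1080 frame size are assumptions for illustration; PhonePilot's actual direction and resolution are not stated here.

```typescript
interface Point {
  x: number;
  y: number;
}

// Map a pixel in the source frame (srcWidth x srcHeight) to its position in a
// copy of the frame rotated 90 degrees clockwise (srcHeight x srcWidth).
function rotate90Clockwise(p: Point, srcHeight: number): Point {
  return { x: srcHeight - 1 - p.y, y: p.x };
}

// Inverse mapping: from the rotated (portrait) frame back to the source frame.
function unrotate90Clockwise(p: Point, srcHeight: number): Point {
  return { x: p.y, y: srcHeight - 1 - p.x };
}

// ASSUMPTION: a 1920x1080 landscape camera frame.
const SRC_HEIGHT = 1080;

const topLeft = rotate90Clockwise({ x: 0, y: 0 }, SRC_HEIGHT);
console.log(topLeft); // the source top-left corner lands on the rotated top-right corner

const roundTrip = unrotate90Clockwise(
  rotate90Clockwise({ x: 300, y: 200 }, SRC_HEIGHT),
  SRC_HEIGHT,
);
console.log(roundTrip); // back to the original point
```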

PhonePilot Control Interface

🎮 Precision Mechanical Control

  • Millimeter-accurate X/Y axis movement
  • Adjustable step size (1-50mm)
  • Adjustable touch depth (Z-axis)
  • Real-time operation logging
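
As an illustration of the step-size limit, the following sketch clamps a requested jog step to the 1-50mm range before applying it. The `jog` helper, the direction names, and the axis orientation are hypothetical and are not taken from PhonePilot's source.

```typescript
const MIN_STEP_MM = 1;
const MAX_STEP_MM = 50;

type Direction = "left" | "right" | "up" | "down";

interface Position {
  x: number; // millimeters
  y: number; // millimeters
}

// Keep the requested step inside the supported 1-50mm range.
function clampStep(stepMm: number): number {
  return Math.min(MAX_STEP_MM, Math.max(MIN_STEP_MM, stepMm));
}

// ASSUMPTION: +x is "right" and +y is "down"; the real axis orientation
// depends on how the arm is mounted.
function jog(pos: Position, dir: Direction, stepMm: number): Position {
  const step = clampStep(stepMm);
  switch (dir) {
    case "left":
      return { x: pos.x - step, y: pos.y };
    case "right":
      return { x: pos.x + step, y: pos.y };
    case "up":
      return { x: pos.x, y: pos.y - step };
    case "down":
      return { x: pos.x, y: pos.y + step };
  }
}

const start: Position = { x: 10, y: 10 };
console.log(jog(start, "right", 80)); // step is clamped to 50mm
console.log(jog(start, "up", 0.2)); // step is raised to 1mm
```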

🖥️ Cross-Platform Desktop App

Built with Electron, natively supporting:

  • macOS (Intel & Apple Silicon)
  • Windows (x64)
  • Linux (AppImage, deb)

Demo

Point Calibration
📹 Click to watch the full demo video

How It Works

┌─────────────────┐     MCP Protocol      ┌──────────────────┐
│                 │◄─────────────────────►│                  │
│   AI Agent      │  arm-move, arm-click  │   PhonePilot     │
│  (Claude, etc)  │     capture-frame     │   Desktop App    │
│                 │                       │                  │
└─────────────────┘                       └────────┬─────────┘
                                                   │
                                          ┌────────┴─────────┐
                                          │                  │
                                    ┌─────▼─────┐     ┌──────▼─────┐
                                    │ Mechanical│     │   Camera   │
                                    │    Arm    │     │  (DECXIN)  │
                                    └─────┬─────┘     └──────┬─────┘
                                          │                  │
                                          ▼                  ▼
                                    ┌──────────────────────────┐
                                    │      Smartphone          │
                                    │    (Physical Device)     │
                                    └──────────────────────────┘
  1. AI Agent connects to PhonePilot via MCP
  2. PhonePilot translates MCP commands into mechanical arm control instructions
  3. Mechanical Arm performs physical touch operations on the phone screen
  4. Camera captures the screen and returns the frame to the AI agent
  5. AI Agent analyzes the frame and decides on the next action
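
The steps above form an observe, decide, act loop. Here is a minimal sketch of that loop with an in-memory stand-in for the hardware. The `PhoneClient` interface, its method signatures, and the `decide` callback are assumptions made for illustration; in a real setup each method would issue the corresponding MCP tool call, and `decide` would be the AI model.

```typescript
// Hypothetical client interface: method names mirror the MCP tools, but the
// signatures are assumptions, not PhonePilot's actual API.
interface PhoneClient {
  move(x: number, y: number): Promise<void>;
  click(): Promise<void>;
  captureFrame(): Promise<string>; // e.g. an encoded image
}

type Action = { kind: "tap"; x: number; y: number } | { kind: "done" };

// Capture a frame, let decide() pick an action, act on it, and repeat.
// Returns the number of taps performed.
async function runLoop(
  client: PhoneClient,
  decide: (frame: string) => Action,
  maxSteps: number,
): Promise<number> {
  let taps = 0;
  for (let i = 0; i < maxSteps; i++) {
    const frame = await client.captureFrame(); // step 4: observe
    const action = decide(frame); // step 5: decide
    if (action.kind === "done") break;
    await client.move(action.x, action.y); // steps 2-3: act
    await client.click();
    taps++;
  }
  return taps;
}

// In-memory stand-in so the sketch runs without an arm or camera.
class FakeClient implements PhoneClient {
  log: string[] = [];
  private frames = 0;
  async move(x: number, y: number): Promise<void> {
    this.log.push(`move ${x},${y}`);
  }
  async click(): Promise<void> {
    this.log.push("click");
  }
  async captureFrame(): Promise<string> {
    return `frame-${this.frames++}`;
  }
}

const fake = new FakeClient();
const result = runLoop(
  fake,
  (frame) => (frame === "frame-2" ? { kind: "done" } : { kind: "tap", x: 10, y: 20 }),
  10,
);
result.then((taps) => console.log(`taps: ${taps}`, fake.log));
```

The `maxSteps` bound keeps a confused agent from driving the arm indefinitely.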

Getting Started

Prerequisites

  • Node.js 20.x or later
  • Yarn package manager
  • Compatible mechanical arm controller (via COM port)
  • USB camera (DECXIN recommended)

Installation & Running

# Clone the repository
git clone https://github.com/revan-zhang/PhonePilot.git
cd PhonePilot

# Install dependencies
yarn install

# Start development environment
yarn electron:dev

Building for Production

# Build for current platform
yarn electron:build

# Build for specific platforms
yarn build:mac     # macOS
yarn build:win     # Windows
yarn build:linux   # Linux

MCP Integration

PhonePilot provides a complete MCP Server implementation that integrates with any MCP-compatible AI client.

Endpoints

| Endpoint | Protocol | Purpose |
|----------|----------|---------|
| POST /mcp | Streamable HTTP | Modern MCP clients |
| GET /sse | SSE | Legacy MCP clients |
| GET /health | HTTP | Health check |

Configuration Example

Configure the MCP Server in your AI client:

{
  "mcpServers": {
    "phonepilot": {
      "url": "http://localhost:3847/sse"
    }
  }
}
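
Before pointing a client at the server, it can help to confirm that PhonePilot is running. The sketch below probes the `/health` endpoint using the port from the configuration example above. It only checks the HTTP status, because the response body format is not documented here; the `isServerUp` helper and the injectable `FetchLike` type are hypothetical.

```typescript
// Port taken from the configuration example above.
const BASE_URL = "http://localhost:3847";

// Minimal fetch-like type so the check can run against a stub.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number }>;

// Returns true when GET /health answers with a 2xx status.
async function isServerUp(
  fetchFn: FetchLike,
  baseUrl: string = BASE_URL,
): Promise<boolean> {
  try {
    const res = await fetchFn(`${baseUrl}/health`);
    return res.ok;
  } catch {
    return false; // connection refused, DNS failure, and similar errors
  }
}

// Stubs standing in for a running and a stopped server.
const upStub: FetchLike = async () => ({ ok: true, status: 200 });
const downStub: FetchLike = async () => {
  throw new Error("connection refused");
};

isServerUp(upStub).then((up) => console.log("running server:", up));
isServerUp(downStub).then((up) => console.log("stopped server:", up));
```

On Node.js 18 or later, passing the global `fetch` as `fetchFn` should work for a real check.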

License

This project is licensed under the MIT License.


Made with ❤️ for the AI-powered future
