This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
WebDriverIO MCP Server is a Model Context Protocol (MCP) server that enables Claude Desktop to interact with web browsers and mobile applications using WebDriverIO for automation. The server supports:
- Browser automation: Chrome browser control (headed/headless)
- Mobile app automation: iOS and Android native app testing via Appium
- Cross-platform: Unified API for web, iOS, and Android automation
The server is published as an npm package (webdriverio-mcp) and runs via stdio transport.
npm run bundle # Clean, build with tsup, make executable, and create .tgz package
npm run prebundle # Clean lib directory and .tgz files
npm run postbundle # Create npm package tarball
npm run dev # Run development server with tsx (no build)
npm start # Run built server from lib/server.js
Server Entry Point (src/server.ts)
- Initializes MCP server using @modelcontextprotocol/sdk
- Redirects console output to stderr to avoid interfering with MCP protocol (Chrome writes to stdout)
- Registers all tool handlers with the MCP server
- Uses StdioServerTransport for communication with Claude Desktop
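A minimal sketch of the console redirection described above, assuming the goal is only to route the listed console methods through a stderr writer. The name `redirectConsole` and its injectable `sink` parameter are illustrative and are not taken from src/server.ts.

```typescript
type Sink = (...args: unknown[]) => void;

// Console methods the server is described as redirecting.
const REDIRECTED_METHODS = ['log', 'info', 'warn', 'debug'] as const;

// Hypothetical helper: routes the methods above to `sink`
// (console.error by default, which writes to stderr) and
// returns a function that restores the original methods.
function redirectConsole(sink: Sink = console.error.bind(console)): () => void {
  const originals = REDIRECTED_METHODS.map((method) => console[method]);
  for (const method of REDIRECTED_METHODS) {
    console[method] = (...args: unknown[]) => sink(...args);
  }
  return () => {
    REDIRECTED_METHODS.forEach((method, i) => {
      console[method] = originals[i];
    });
  };
}
```

Calling `redirectConsole()` once at startup, before any tool runs, keeps stdout free for the MCP stdio transport. The returned restore function is mainly useful in tests.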
Session State Management (src/tools/browser.tool.ts and src/tools/app-session.tool.ts)
- Maintains global state in two Maps plus one active-session pointer:
  - browsers: Map<sessionId, WebdriverIO.Browser> - stores all browser/app instances
  - currentSession: string | null - tracks the single active session
  - sessionMetadata: Map<sessionId, {type, capabilities}> - tracks session type and config
- getBrowser() helper retrieves the current active browser/app instance
- startBrowserTool creates Chrome browser session with configurable options:
  - Headless mode support
  - Custom window dimensions (400-3840 width, 400-2160 height)
  - Chrome-specific arguments (sandbox, security, media stream, etc.)
- startAppTool creates iOS/Android app session via Appium with platform-specific capabilities:
  - noReset: Controls whether to preserve app state between sessions (default: false)
  - fullReset: Controls whether to uninstall app before/after session (default: true)
  - Sessions created with noReset: true will automatically detach on close (preserves session state)
- closeSessionTool properly cleans up browser/app sessions and metadata:
  - detach: false (default): Calls deleteSession() to terminate on server
  - detach: true: Disconnects without terminating (preserves session for manual testing)
  - Sessions created without appPath or with noReset: true automatically detach
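The state model above can be sketched as follows. This is an illustration under stated assumptions, not the code in src/tools/browser.tool.ts: `SessionLike` stands in for WebdriverIO.Browser, and `registerSession`, `closeSession`, and the `autoDetach` flag are hypothetical names used to show how the detach rules could be applied.

```typescript
// Stand-in for WebdriverIO.Browser; only the method this sketch needs.
interface SessionLike {
  deleteSession(): Promise<void>;
}

interface SessionMeta {
  type: 'browser' | 'ios' | 'android';
  capabilities: Record<string, unknown>;
  // Hypothetical flag: true for sessions started with noReset: true or without appPath.
  autoDetach: boolean;
}

const browsers = new Map<string, SessionLike>();
const sessionMetadata = new Map<string, SessionMeta>();
let currentSession: string | null = null;

function registerSession(id: string, session: SessionLike, meta: SessionMeta): void {
  browsers.set(id, session);
  sessionMetadata.set(id, meta);
  currentSession = id; // the newest session becomes the single active one
}

function getBrowser(): SessionLike {
  const session = currentSession !== null ? browsers.get(currentSession) : undefined;
  if (!session) throw new Error('No active session');
  return session;
}

async function closeSession(detach = false): Promise<'detached' | 'terminated'> {
  const id = currentSession;
  if (id === null) throw new Error('No active session');
  const session = getBrowser();
  const shouldDetach = detach || sessionMetadata.get(id)?.autoDetach === true;
  // Only terminate on the server when the session is not being detached.
  if (!shouldDetach) await session.deleteSession();
  browsers.delete(id);
  sessionMetadata.delete(id);
  currentSession = null;
  return shouldDetach ? 'detached' : 'terminated';
}
```

In both branches the local bookkeeping is cleared, so a detached session remains alive on the Appium server but is no longer reachable through `getBrowser()`.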
Tool Pattern
All tools follow a consistent pattern:
- Export Zod schema for arguments validation (e.g., navigateToolArguments)
- Export ToolCallback function (e.g., navigateTool)
- Use getBrowser() to access current session
- Return CallToolResult with text content
- Wrap operations in try-catch and return errors as text content
Browser Script Execution (src/scripts/get-interactable-elements.ts)
- Returns a function that executes in the browser context (not Node.js)
- getInteractableElements() finds all visible, interactable elements on the page
- Uses modern element.checkVisibility() API with fallback for older browsers
- Generates CSS selectors using IDs, classes, or nth-child path-based selectors
- Returns element metadata: tagName, type, id, className, textContent, value, placeholder, href, ariaLabel, role, cssSelector, isInViewport
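The selector strategy in the list above (ID first, then classes, then an nth-child path) can be sketched as follows. To stay runnable outside a browser, the sketch uses a hypothetical `ElementLike` shape instead of the real DOM `Element`, and `buildCssSelector` is an illustrative name. The actual script may order or escape things differently.

```typescript
// Minimal structural stand-in for a DOM element.
interface ElementLike {
  tagName: string;
  id: string;
  classNames: string[];
  parent: ElementLike | null;
  children: ElementLike[];
}

// Hypothetical builder: prefer an id, then classes, then a path of nth-child steps.
function buildCssSelector(el: ElementLike): string {
  const tag = el.tagName.toLowerCase();
  if (el.id) return `#${el.id}`;
  if (el.classNames.length > 0) return `${tag}.${el.classNames.join('.')}`;
  if (!el.parent) return tag;
  const index = el.parent.children.indexOf(el) + 1; // :nth-child is 1-based
  return `${buildCssSelector(el.parent)} > ${tag}:nth-child(${index})`;
}
```

Class-based selectors are not guaranteed to be unique, and real IDs may need escaping (for example with `CSS.escape` in the browser), so a production version would likely verify each candidate before returning it.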
Mobile Element Detection (src/locators/ and src/utils/mobile-elements.ts)
- Uses XML-based page source parsing to extract all element attributes
- Platform-specific element classification:
  - ANDROID_INTERACTABLE_TAGS: Button, EditText, CheckBox, RadioButton, Switch, Spinner, etc.
  - ANDROID_LAYOUT_CONTAINERS: ViewGroup, LinearLayout, RelativeLayout, FrameLayout, ScrollView, etc.
  - IOS_INTERACTABLE_TAGS: Button, TextField, SecureTextField, Switch, Picker, etc.
  - IOS_LAYOUT_CONTAINERS: View, ScrollView, StackView, CollectionView, etc.
- Generates multiple locator strategies per element:
- Accessibility ID (cross-platform)
- Resource ID / Name
- Text / Label matching
- XPath (full and simplified)
- UiAutomator (Android) / Predicates (iOS)
- Smart filtering with inViewportOnly and includeContainers parameters
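To make the multi-strategy idea concrete, here is a sketch that turns one parsed element into several candidate selectors, ordered from most to least stable. `MobileElementAttrs` and `generateLocators` are hypothetical names, and the sketch does not escape quotes inside attribute values. The real generator in src/locators/ may produce more strategies.

```typescript
type Platform = 'android' | 'ios';

// Subset of attributes read from the page-source XML.
interface MobileElementAttrs {
  tag: string; // e.g. android.widget.Button or XCUIElementTypeButton
  accessibilityId?: string; // content-desc on Android, accessibility identifier on iOS
  resourceId?: string; // Android only
  text?: string; // text on Android, label on iOS
}

// Hypothetical generator: returns candidate selectors, most stable first.
function generateLocators(platform: Platform, el: MobileElementAttrs): string[] {
  const locators: string[] = [];
  if (el.accessibilityId) locators.push(`~${el.accessibilityId}`);
  if (platform === 'android') {
    if (el.resourceId) locators.push(`android=new UiSelector().resourceId("${el.resourceId}")`);
    if (el.text) {
      locators.push(`android=new UiSelector().text("${el.text}")`);
      locators.push(`//${el.tag}[@text="${el.text}"]`);
    }
  } else if (el.text) {
    locators.push(`-ios predicate string:label == "${el.text}"`);
    locators.push(`//${el.tag}[@label="${el.text}"]`);
  }
  return locators;
}
```

Returning an ordered list lets the caller try the accessibility ID first and fall back to text or XPath only when nothing more stable is available.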
TypeScript (tsconfig.json)
- Target: ES2022, Module: ESNext
- Source: src/, Output: build/ (but not used for distribution)
- Strict mode disabled
- Includes types for Node.js and @wdio/types
Bundler (tsup.config.ts)
- Entry: src/server.ts
- Output: lib/ directory (ESM format only)
- Generates declaration files and sourcemaps
- Externalizes zod dependency
- The shebang #!/usr/bin/env node in server.ts is preserved for CLI execution
Web Browsers:
- CSS selectors: button.my-class, #element-id
- XPath: //button[@class='my-class']
- Text matching: button=Exact text (exact match), a*=Link containing (partial match)
Mobile Apps:
- Accessibility ID: ~loginButton (works on both iOS and Android)
- Android UiAutomator: android=new UiSelector().text("Login")
- iOS Class Chain: -ios class chain:**/XCUIElementTypeButton[`label == "Login"`]
- iOS Predicate String: -ios predicate string:label == "Login" AND visible == 1
- XPath: //android.widget.Button[@text="Login"] or //XCUIElementTypeButton[@label="Login"]
See src/utils/mobile-selectors.ts for helper functions to build mobile selectors programmatically.
Appium Server Setup: The server requires an Appium server running to connect to iOS/Android devices and emulators.
- Install Appium:
  npm install -g appium
- Install Platform Drivers:
  # For iOS
  appium driver install xcuitest
  # For Android
  appium driver install uiautomator2
- Start Appium Server:
  appium # Default: http://127.0.0.1:4723
Device/Emulator Requirements:
- iOS: Xcode installed, iOS Simulator running, or physical device connected
- Android: Android Studio installed, emulator running, or physical device connected
The server can be configured via environment variables or per-session parameters:
Environment Variables:
- APPIUM_URL: Appium server hostname (default: 127.0.0.1)
- APPIUM_URL_PORT: Appium server port (default: 4723)
- APPIUM_PATH: Appium server path (default: /)
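One way these variables could combine with the per-session parameters appiumHost, appiumPort, and appiumPath (shown in the Override Appium Server example) is sketched below. The precedence order (session parameter, then environment variable, then default) is an assumption based on those parameters being described as overrides, and `resolveAppiumConnection` is a hypothetical name.

```typescript
interface AppiumConnection {
  hostname: string;
  port: number;
  path: string;
}

// Per-session overrides, named after the start_app_session parameters.
interface AppiumOverrides {
  appiumHost?: string;
  appiumPort?: number;
  appiumPath?: string;
}

// Hypothetical resolver: session values win, then environment variables, then defaults.
function resolveAppiumConnection(
  env: Record<string, string | undefined>,
  overrides: AppiumOverrides = {},
): AppiumConnection {
  return {
    hostname: overrides.appiumHost ?? env.APPIUM_URL ?? '127.0.0.1',
    port: overrides.appiumPort ?? Number(env.APPIUM_URL_PORT ?? 4723),
    path: overrides.appiumPath ?? env.APPIUM_PATH ?? '/',
  };
}
```

In a Node.js process the first argument would typically be `process.env`. Passing it in explicitly keeps the function easy to test.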
Example .env:
APPIUM_URL=127.0.0.1
APPIUM_URL_PORT=4723
APPIUM_PATH=/
Use the start_app_session tool with platform-specific parameters:
iOS Example (Simulator):
const parameters = {
platform: 'iOS',
appPath: '/path/to/MyApp.app',
deviceName: 'iPhone 15 Pro',
platformVersion: '17.0',
automationName: 'XCUITest', // Optional, defaults to XCUITest
autoGrantPermissions: true, // Optional, defaults to true (grants app permissions)
autoAcceptAlerts: true, // Optional, defaults to true (auto-accepts system alerts)
autoDismissAlerts: false, // Optional, set to true to dismiss instead of accept
noReset: false, // Optional, defaults to false (preserves app state if true)
fullReset: true, // Optional, defaults to true (uninstalls app if true)
}
iOS Example (Real Device):
const parameters = {
platform: 'iOS',
appPath: '/path/to/MyApp.ipa',
deviceName: 'My iPhone', // Physical device name
platformVersion: '17.0',
udid: '00008030-001234567890ABCD', // Required for physical devices (25 characters on newer devices, 40 hex characters on older ones)
automationName: 'XCUITest', // Optional, defaults to XCUITest
autoGrantPermissions: true, // Optional, defaults to true (grants app permissions)
autoAcceptAlerts: true, // Optional, defaults to true (auto-accepts system alerts)
autoDismissAlerts: false, // Optional, set to true to dismiss instead of accept
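}
Finding Your iOS Device UDID:
The UDID (Unique Device Identifier) is required when testing on physical iOS devices. Older devices use a 40-character hexadecimal string; newer devices use a 25-character string of the form 00008030-001234567890ABCD (8 hex characters, a hyphen, then 16 hex characters).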
Methods to find your device's UDID:
- Xcode (Devices and Simulators):
  - Connect your iOS device via USB
  - Open Xcode → Window → Devices and Simulators
  - Select your device in the left sidebar
  - The UDID is shown as "Identifier" (e.g., 00008030-001234567890ABCD)
- Terminal (using xcrun):
  xcrun xctrace list devices
  Output shows connected devices with their UDIDs:
  My iPhone (17.0) (00008030-001234567890ABCD)
- Finder (macOS Catalina and later):
  - Connect your device via USB
  - Open Finder and select your device in the sidebar
  - Click on the device info below the device name to cycle through information
  - The UDID will be displayed
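A small pre-flight check can catch a mistyped UDID before a session is started. The sketch below assumes the two physical-device formats: 40 hexadecimal characters on older devices, and 8 hexadecimal characters, a hyphen, and 16 more on newer ones. `looksLikeUdid` is a hypothetical helper, not part of the server, and simulator identifiers (which use a UUID format) are deliberately out of scope.

```typescript
// 40 hex characters (older devices).
const LEGACY_UDID = /^[0-9a-fA-F]{40}$/;
// 8 hex characters, a hyphen, then 16 hex characters (newer devices).
const MODERN_UDID = /^[0-9a-fA-F]{8}-[0-9a-fA-F]{16}$/;

// Hypothetical check: true when the value matches either physical-device format.
function looksLikeUdid(value: string): boolean {
  return LEGACY_UDID.test(value) || MODERN_UDID.test(value);
}
```

For example, `looksLikeUdid('00008030-001234567890ABCD')` passes, while a device name such as `'My iPhone'` does not.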
Android Example:
const parameters = {
platform: 'Android',
appPath: '/path/to/app.apk',
deviceName: 'Pixel_6_API_34',
platformVersion: '14',
automationName: 'UiAutomator2', // Optional, defaults to UiAutomator2
autoGrantPermissions: true, // Optional, defaults to true (grants app permissions automatically)
autoAcceptAlerts: true, // Optional, defaults to true (auto-accepts system alerts)
autoDismissAlerts: false, // Optional, set to true to dismiss instead of accept
appWaitActivity: 'com.example.MainActivity', // Optional, specific activity to wait for
noReset: false, // Optional, defaults to false (preserves app state if true)
fullReset: true, // Optional, defaults to true (uninstalls app if true)
}
Override Appium Server:
const parameters = {
platform: 'Android',
appPath: '/path/to/app.apk',
deviceName: 'emulator-5554',
appiumHost: 'localhost', // Override APPIUM_URL
appiumPort: 4724, // Override APPIUM_URL_PORT
appiumPath: '/wd/hub', // Override APPIUM_PATH
}
App State Reset Behavior:
Control how app state is handled during session creation using noReset and fullReset parameters:
| noReset | fullReset | Behavior |
|---|---|---|
| false (default) | true (default) | Full reset: Uninstall and reinstall app (clean state) |
| false | false | Clear app data but keep app installed |
| true | false | Preserve state: App stays installed, data preserved |
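The table can be read as a small decision function. The sketch below encodes only the three documented rows. The combination noReset: true with fullReset: true is not covered by the table, so the sketch reports it as unspecified rather than guessing. `describeReset` and the result labels are hypothetical names.

```typescript
type ResetBehavior = 'full-reset' | 'clear-data' | 'preserve-state' | 'unspecified';

// Maps the two flags to the documented rows, using the documented defaults.
function describeReset(noReset = false, fullReset = true): ResetBehavior {
  if (noReset && fullReset) return 'unspecified'; // combination not covered by the table
  if (noReset) return 'preserve-state'; // app stays installed, data preserved
  return fullReset ? 'full-reset' : 'clear-data';
}
```

Because fullReset defaults to true, a caller who wants to preserve state would pass both `noReset: true` and `fullReset: false`, matching the table's last row.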
Examples:
// Default: Clean install (uninstall/reinstall)
start_app_session({ platform: 'Android', appPath: '/path/to/app.apk', deviceName: 'emulator-5554' })
// Continue from current state (preserve app data)
start_app_session({
platform: 'Android',
appPath: '/path/to/app.apk',
deviceName: 'emulator-5554',
noReset: true,
fullReset: false
})
// Clear app data but don't uninstall
start_app_session({
platform: 'Android',
appPath: '/path/to/app.apk',
deviceName: 'emulator-5554',
noReset: false,
fullReset: false
})
The server maintains a single-session model: only one browser or app session is active at a time.
Session Creation:
- start_browser: Start a new Chrome browser session
- start_app_session: Start a new iOS or Android app session with full control over app state (noReset/fullReset)
  - Sessions created with noReset: true will automatically detach on close (preserves session state)
  - Sessions created without appPath will automatically detach on close
Session Closure:
- close_session: Close or detach from the current session
  - detach: false (default): Terminate session on Appium server
  - detach: true: Disconnect without terminating (preserves session for manual testing)
  - Automatically detaches sessions created with noReset: true or without appPath
Session Switching: To switch from browser to app (or vice versa), close the current session first, then start a new one.
Element Detection:
- get_visible_elements: Get visible, interactable elements on the page
  - Parameters:
    - inViewportOnly (boolean, default: true): Only return elements within the visible viewport
      - Set to false to get ALL elements on the page, including off-screen elements
      - Useful for finding elements that need scrolling to reach
    - includeContainers (boolean, default: false): Include layout containers in results
      - Mobile only: ViewGroup, FrameLayout, ScrollView (Android) or View, StackView (iOS)
      - Set to true to see full layout hierarchy for debugging complex UIs
      - Web: Not applicable, web elements are not classified as containers
  - Example usage:
    // Get only viewport-visible interactive elements (default)
    get_visible_elements()
    // Get all elements including off-screen (useful for scroll testing)
    get_visible_elements({ inViewportOnly: false })
    // Get all elements including layout containers (mobile debugging)
    get_visible_elements({ includeContainers: true })
    // Get ALL elements including containers and off-screen
    get_visible_elements({ inViewportOnly: false, includeContainers: true })
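The effect of the two parameters can be sketched as a filter over parsed elements. This is an illustration only: `ParsedElement`, `intersectsViewport`, and `filterElements` are hypothetical names, and treating any overlap with the screen as "in viewport" is an assumption. The real implementation may require full containment instead.

```typescript
interface Bounds {
  x: number;
  y: number;
  width: number;
  height: number;
}

interface ParsedElement {
  tag: string;
  bounds: Bounds;
  isContainer: boolean; // e.g. ViewGroup, FrameLayout, StackView
}

interface FilterOptions {
  inViewportOnly?: boolean; // default: true
  includeContainers?: boolean; // default: false
}

// True when the element's rectangle overlaps the screen rectangle at all.
function intersectsViewport(b: Bounds, screen: { width: number; height: number }): boolean {
  return b.x < screen.width && b.y < screen.height && b.x + b.width > 0 && b.y + b.height > 0;
}

// Hypothetical filter mirroring the two documented parameters and their defaults.
function filterElements(
  elements: ParsedElement[],
  screen: { width: number; height: number },
  options: FilterOptions = {},
): ParsedElement[] {
  const { inViewportOnly = true, includeContainers = false } = options;
  return elements.filter(
    (el) =>
      (includeContainers || !el.isContainer) &&
      (!inViewportOnly || intersectsViewport(el.bounds, screen)),
  );
}
```

With the defaults, off-screen elements and layout containers are both dropped. Each option relaxes one of the two conditions independently.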
Touch Gestures:
- tap_element: Tap element by selector or coordinates
- swipe: Swipe in a direction (up/down/left/right) with configurable duration
- long_press: Long press element or coordinates
- drag_and_drop: Drag from one location to another
App Lifecycle:
- get_app_state: Check app state (not installed, not running, background, foreground)
- activate_app: Bring app to foreground
- terminate_app: Kill running app
Context Switching (Hybrid Apps):
- get_contexts: List available contexts (NATIVE_APP, WEBVIEW_*)
- get_current_context: Show active context
- switch_context: Switch between native and webview contexts
Device Interaction:
- get_device_info: Get platform, version, screen size
- rotate_device: Set orientation (PORTRAIT/LANDSCAPE)
- get_orientation: Get current orientation
- lock_device / unlock_device: Control screen lock
- is_device_locked: Check lock status
- shake_device: Simulate shake gesture (iOS only)
- send_keys: Send keyboard input (Android only)
- press_key_code: Press Android key code (e.g., BACK=4, HOME=3)
- hide_keyboard / is_keyboard_shown: Keyboard control
- open_notifications: Open notification panel (Android only)
- get_geolocation / set_geolocation: GPS control
The src/utils/mobile-selectors.ts module provides helper functions for building mobile selectors:
Accessibility ID (Cross-platform):
import { accessibilityId } from '../utils/mobile-selectors';
const selector = accessibilityId('loginButton'); // '~loginButton'
Android UiAutomator:
import { androidSelectors } from '../utils/mobile-selectors';
androidSelectors.text('Login') // Text match
androidSelectors.textContains('Log') // Partial text
androidSelectors.resourceId('com.app:id/button') // Resource ID
androidSelectors.className('android.widget.Button')
androidSelectors.description('Login button')
iOS Predicates and Class Chains:
import { iOSSelectors } from '../utils/mobile-selectors';
iOSSelectors.label('Login') // Label match
iOSSelectors.labelContains('Log') // Partial label
iOSSelectors.name('loginButton') // Name attribute
iOSSelectors.visible() // Visible only
iOSSelectors.type('Button') // Element type
iOSSelectors.and( // Combine conditions
iOSSelectors.label('Login'),
iOSSelectors.visible()
)
Example 1: Testing Demo Android App (Book Scanning App)
// Real test case: Validate Demo onboarding screen
// APK: C:\Users\demo-liveApiGbRegionNonMinifiedRelease-3018788.apk
// 1. Start Demo app on Android emulator
start_app_session({
platform: 'Android',
appPath: 'C:\\Users\\demo-liveApiGbRegionNonMinifiedRelease-3018788.apk',
deviceName: 'emulator-5554',
autoGrantPermissions: true, // Auto-grant camera/storage permissions for scanning
})
// 2. Get onboarding elements (found 5 elements on "Step 1: Scan" screen)
get_visible_elements()
// Returns:
// - ImageView: "Step One, Scan." (accessibility ID: ~Step One, Scan.)
// - TextView: "Step 1: Scan" (resourceId: uk.co.demo:id/text_description_onboarding)
// - TextView: "Scan your old and unwanted items."
// - TextView: "Skip" button
// - Button: Navigation button (likely "Next")
// 3. Tap Skip to bypass onboarding
tap_element({ selector: 'android=new UiSelector().text("Skip")' })
// 4. Interact with main app...
Example 2: Testing World of Books Website (E-commerce)
// Real test case: Validate worldofbooks.com homepage
// 1. Start browser session
start_browser({ headless: false, windowWidth: 1920, windowHeight: 1080 })
// 2. Navigate to World of Books
navigate({ url: 'https://www.worldofbooks.com' })
// 3. Get visible elements (found 32 elements including navigation, search, products)
get_visible_elements()
// Returns:
// - Navigation: Cyber Monday, Christmas, Fiction Books, Children's Books, etc.
// - User account links: Help, Account, Wishlist, Basket
// - Search input with Algolia autocomplete
// - Product wishlist buttons (6 products visible)
// - Cookie consent banner (3 buttons: Settings, Reject All, Accept All)
// 4. Accept cookies
click_element({ selector: '#onetrust-accept-btn-handler' })
// 5. Search for a book
set_value({ selector: '#autocomplete-0-input', value: 'Harry Potter' })
click_element({ selector: '#searchButton' })
Workflow 1: Preserve App State Between Sessions
// Scenario: App already installed and logged in, want to test from current state
// 1. Start session without resetting app state
start_app_session({
platform: 'Android',
appPath: '/path/to/app.apk',
deviceName: 'emulator-5554',
noReset: true, // Preserve app data
fullReset: false, // Don't uninstall
})
// 2. App continues from current state (user logged in, settings preserved)
get_visible_elements()
// 3. Test feature without re-login
tap_element({ selector: 'android=new UiSelector().text("Dashboard")' })
// 4. Close session (auto-detaches because noReset is true; app stays installed)
close_session()
Workflow 2: Clean App Install for Fresh Test
// Scenario: Need fresh app state for regression testing
// 1. Start session with full reset (default behavior)
start_app_session({
platform: 'Android',
appPath: '/path/to/app.apk',
deviceName: 'emulator-5554',
// noReset defaults to false, fullReset defaults to true
})
// 2. App is freshly installed (no previous data)
get_visible_elements()
// 3. Test onboarding flow from scratch
tap_element({ selector: 'android=new UiSelector().text("Get Started")' })
// 4. Close session (app uninstalled automatically)
close_session()
Testing an iOS App (Simulator):
// 1. Start app session on simulator
start_app_session({
platform: 'iOS',
appPath: '/path/to/MyApp.app',
deviceName: 'iPhone 15 Pro',
})
// 2. Interact with elements
tap_element({ selector: '~loginButton' })
set_value({ selector: '~usernameField', value: 'testuser' })
tap_element({ selector: '-ios predicate string:label == "Submit"' })
// 3. Verify state
get_app_state({ bundleId: 'com.example.myapp' })
// 4. Take screenshot
take_screenshot({ filename: 'login-screen.png' })
// 5. Close session
close_session()
Testing an iOS App (Real Device):
// 1. Start app session on physical device
start_app_session({
platform: 'iOS',
appPath: '/path/to/MyApp.ipa',
deviceName: 'My iPhone',
platformVersion: '17.0',
udid: '00008030-001234567890ABCD', // Device UDID required
})
// 2. Interact with elements
tap_element({ selector: '~loginButton' })
set_value({ selector: '~usernameField', value: 'testuser' })
tap_element({ selector: '-ios predicate string:label == "Submit"' })
// 3. Test device-specific features
get_device_info() // Returns physical device info
set_geolocation({ latitude: 37.7749, longitude: -122.4194 })
// 4. Take screenshot
take_screenshot({ filename: 'real-device-test.png' })
// 5. Close session
close_session()
Testing an Android App with Webview:
// 1. Start app
start_app_session({
platform: 'Android',
appPath: '/path/to/app.apk',
deviceName: 'emulator-5554',
autoGrantPermissions: true,
})
// 2. Native app interaction
tap_element({ selector: 'android=new UiSelector().text("Open Web")' })
// 3. Switch to webview context
get_contexts() // Lists: NATIVE_APP, WEBVIEW_com.example.app
switch_context({ context: 'WEBVIEW_com.example.app' })
// 4. Web interaction (use CSS selectors)
click_element({ selector: '#loginButton' })
set_value({ selector: '#username', value: 'testuser' })
// 5. Switch back to native
switch_context({ context: 'NATIVE_APP' })
// 6. Close
close_session()
Device Manipulation:
// Rotate device
rotate_device({ orientation: 'LANDSCAPE' })
// Swipe gesture
swipe({ direction: 'up', duration: 500 })
// Set location
set_geolocation({ latitude: 37.7749, longitude: -122.4194 })
// Background app
background_app({ seconds: 5 }) // Background for 5 seconds, then resume
- Console Output Redirection: All console methods (log, info, warn, debug) are redirected to stderr because Chrome writes to stdout, which would corrupt the MCP stdio protocol.
- Element Visibility: The get-interactable-elements.ts script runs in the browser and must be completely self-contained (no external dependencies). It filters for visible, non-disabled elements and returns all of them regardless of viewport status.
- Mobile Element Detection & Locator Generation (New Architecture - Inspired by appium-mcp):
  - XML Parsing: Uses browser.getPageSource() to retrieve native XML hierarchy, then parses with platform-specific parsers
  - Element Classification: Filters elements based on platform-specific tag sets:
    - Interactable elements: Buttons, inputs, checkboxes, switches, pickers, etc.
    - Layout containers: ViewGroups, ScrollViews, StackViews, CollectionViews, etc.
  - Multi-Strategy Locator Generation: For each element, generates multiple selector options:
    - Primary: Accessibility ID / Resource ID (most stable)
    - Secondary: Text/Label matching (language-dependent)
    - Fallback: XPath with attributes (most specific but brittle)
    - Platform-specific: UiAutomator (Android) or Predicates (iOS)
  - Smart Filtering:
    - inViewportOnly: Filters elements by screen bounds to show only visible items
    - includeContainers: Controls whether layout wrappers are included in results
    - hasMeaningfulContent: Checks if element has text, description, or interactive children
  - Files: src/locators/element-filter.ts, src/locators/generate-all-locators.ts, src/locators/source-parsing.ts
- Scroll Behavior: Click operations default to scrolling elements into view (scrollIntoView with center alignment) before clicking.
- Session Management: The server maintains a Map of browser/app sessions keyed by sessionId, with a sessionMetadata Map tracking session type ('browser', 'ios', 'android') and capabilities. Only one currentSession is active at a time. All tools operate on the current session. Sessions can be created with start_browser or start_app_session. When closing, detach: true preserves the session on the Appium server for continued manual testing. Sessions created with noReset: true or without appPath automatically detach on close.
- Mobile State Sharing: The browser.tool.ts exports state via (getBrowser as any).__state to allow app-session.tool.ts to access and modify the shared session state. This maintains single-session behavior across browser and mobile automation.
- Automatic Permission & Alert Handling: Appium capabilities now include autoGrantPermissions, autoAcceptAlerts, and autoDismissAlerts by default, eliminating manual handling of permission popups. These settings are applied during session initialization in src/config/appium.config.ts.
- Error Handling: Tools catch errors and return them as text content rather than throwing, ensuring the MCP protocol remains stable.
- Cross-Platform Compatibility: Many existing tools (click_element, set_value, find_element, take_screenshot, etc.) work seamlessly on both web browsers and mobile apps. Mobile-specific tools (gestures, app lifecycle, device interaction) only work with app sessions.
To add a new tool:
- Create a new file in src/tools/ (e.g., my-tool.tool.ts)
- Define Zod schema for arguments: export const myToolArguments = { ... }
- Implement the tool callback: export const myTool: ToolCallback = async (args) => { ... }
- Import and register in src/server.ts: server.tool('my_tool', 'description', myToolArguments, myTool)
Example:
import { z } from 'zod';
import { ToolCallback } from '@modelcontextprotocol/sdk/server/mcp';
import { getBrowser } from './browser.tool';

// Zod schema describing the tool's arguments
export const myToolArguments = {
  param: z.string().describe('Description of parameter'),
};

export const myTool: ToolCallback = async ({ param }: { param: string }) => {
  try {
    const browser = getBrowser();
    // ... implementation; getTitle() is only a placeholder for the tool's real work
    const result = await browser.getTitle();
    return {
      content: [{ type: 'text', text: `Success: ${result}` }],
    };
  } catch (e) {
    // Return the error as text instead of throwing, so the MCP protocol stays stable
    return {
      content: [{ type: 'text', text: `Error: ${e}` }],
    };
  }
};