Overview
Every computer instance provides core actions for browser and desktop automation. These actions work identically across browser and desktop instances, except for browser-only actions such as navigate.
Navigation
navigate(url)
Navigate to a URL. Browser instances only.
Parameters:
- url (string) - The URL to navigate to
Use cases:
- Opening web pages
- Navigating between pages
- Loading web applications
Mouse Actions
click(x, y)
Click at specific screen coordinates.
Parameters:
- x (int) - X coordinate in pixels
- y (int) - Y coordinate in pixels
Use cases:
- Clicking buttons
- Selecting menu items
- Activating UI elements
double_click(x, y) / doubleClick(x, y)
Double-click at specific coordinates.
Parameters:
- x (int) - X coordinate
- y (int) - Y coordinate
Use cases:
- Opening files
- Selecting text
- Desktop application interactions
right_click(x, y) / rightClick(x, y)
Right-click at specific coordinates to open context menus.
Parameters:
- x (int) - X coordinate
- y (int) - Y coordinate
Use cases:
- Opening context menus
- Right-click actions
- Desktop interactions
drag(from_x, from_y, to_x, to_y)
Drag from one position to another.
Parameters:
- from_x (int) - Starting X coordinate
- from_y (int) - Starting Y coordinate
- to_x (int) - Ending X coordinate
- to_y (int) - Ending Y coordinate
Use cases:
- Drag and drop
- Moving windows
- Selecting regions
mouse_down(x, y) / mouseDown(x, y)
Press and hold the left mouse button at specific coordinates. Use with mouse_up for fine-grained drag control.
Parameters:
- x (float) - X coordinate in pixels
- y (float) - Y coordinate in pixels
Use cases:
- Fine-grained drag control
- Custom drag interactions
- Drawing applications
mouse_up(x, y) / mouseUp(x, y)
Release the left mouse button at specific coordinates.
Parameters:
- x (float) - X coordinate in pixels
- y (float) - Y coordinate in pixels
Use cases:
- Complete a fine-grained drag operation
- Release after mouse_down
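The mouse_down / mouse_up pairing can be sketched as follows. This is a minimal sketch, assuming a computer object that exposes mouse_down, mouse_up, and wait as documented on this page. The helper name press_hold_release is hypothetical, and unittest.mock.Mock stands in for a live session so the example runs on its own.

```python
from unittest.mock import Mock


def press_hold_release(computer, start, end, hold_seconds=0.2):
    """Press at `start`, pause briefly, then release at `end`.

    `computer` is assumed to expose mouse_down, mouse_up, and wait.
    """
    computer.mouse_down(start[0], start[1])
    try:
        computer.wait(hold_seconds)
    finally:
        # Always release, so a failed wait cannot leave the button held.
        computer.mouse_up(end[0], end[1])


# Stand-in for a real session: Mock records every call made on it.
computer = Mock()
press_hold_release(computer, (100.0, 200.0), (400.0, 200.0))
print([name for name, args, kwargs in computer.mock_calls])
# ['mouse_down', 'wait', 'mouse_up']
```

Releasing in a finally block is a design choice of this sketch, not something the page prescribes. It keeps the session from being left with the button held if an intermediate step fails.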
Keyboard Actions
type(text)
Type text at the current cursor position.
Parameters:
- text (string) - The text to type
Use cases:
- Filling form fields
- Entering search queries
- Text input
hotkey(*keys)
Send keyboard shortcut combinations.
Parameters:
- *keys (strings) - Key names to press simultaneously
Common shortcuts:
- ctrl/cmd + c - Copy
- ctrl/cmd + v - Paste
- ctrl/cmd + s - Save
- ctrl/cmd + f - Find
- alt + tab - Switch windows
- enter - Submit/Enter
- escape - Cancel/Escape
Use cases:
- Keyboard shortcuts
- System commands
- Application controls
key_down(key) / keyDown(key)
Press and hold a keyboard key. Use with key_up for complex interactions like shift-click selection.
Parameters:
- key (string) - Key name to press (e.g., “Shift”, “Control”, “Alt”)
Use cases:
- Shift-click selection
- Ctrl-click multi-select
- Holding modifier keys during mouse operations
key_up(key) / keyUp(key)
Release a keyboard key that was previously pressed with key_down.
Parameters:
- key (string) - Key name to release
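Shift-click selection with key_down and key_up can be sketched as below. It assumes a computer object with the key_down, key_up, and click actions described on this page. The helper names held_key and shift_click_range are hypothetical, and Mock replaces a live session.

```python
from contextlib import contextmanager
from unittest.mock import Mock


@contextmanager
def held_key(computer, key):
    """Hold `key` for the duration of the with-block, then release it."""
    computer.key_down(key)
    try:
        yield
    finally:
        # Release even if an action inside the block raises.
        computer.key_up(key)


def shift_click_range(computer, first, last):
    """Click the first item, then shift-click the last to select the range."""
    computer.click(first[0], first[1])
    with held_key(computer, "Shift"):
        computer.click(last[0], last[1])


computer = Mock()  # stand-in for a real session
shift_click_range(computer, (50, 120), (50, 360))
print([name for name, args, kwargs in computer.mock_calls])
# ['click', 'key_down', 'click', 'key_up']
```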
Scrolling
scroll(dx, dy, x?, y?)
Scroll the viewport by delta x and delta y, optionally at a specific position.
Parameters:
- dx (float) - Horizontal scroll delta (positive = right, negative = left)
- dy (float) - Vertical scroll delta (positive = down, negative = up)
- x (float, optional) - X coordinate where the scroll action originates
- y (float, optional) - Y coordinate where the scroll action originates
Use cases:
- Scrolling web pages
- Loading lazy-loaded content
- Viewing off-screen content
- Horizontal scrolling in wide layouts
- Scrolling within specific elements (use x, y to target)
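For lazy-loaded content it can help to scroll in several smaller steps with a pause between them. The sketch below assumes a computer object with the scroll and wait actions described above. The helper names scroll_deltas and scroll_in_steps, and the default step size and pause, are illustrative choices rather than values from this page.

```python
from unittest.mock import Mock


def scroll_deltas(total_dy, step):
    """Split a vertical distance into per-call deltas of at most `step`."""
    if step <= 0:
        raise ValueError("step must be positive")
    sign = 1 if total_dy >= 0 else -1
    remaining = abs(total_dy)
    deltas = []
    while remaining > 0:
        chunk = min(step, remaining)
        deltas.append(sign * chunk)
        remaining -= chunk
    return deltas


def scroll_in_steps(computer, total_dy, step=600, pause_seconds=0.5, x=None, y=None):
    """Scroll vertically in chunks, pausing so new content can load."""
    for dy in scroll_deltas(total_dy, step):
        if x is None or y is None:
            computer.scroll(0, dy)
        else:
            # Target a specific element by passing the optional position.
            computer.scroll(0, dy, x, y)
        computer.wait(pause_seconds)


print(scroll_deltas(1500, 600))  # [600, 600, 300]

computer = Mock()  # stand-in for a real session
scroll_in_steps(computer, 1500)
```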
Capture
screenshot(base64?)
Capture a screenshot of the current screen.
Parameters:
- base64 (bool, optional) - If true, returns base64-encoded JPEG data instead of a URL. Default: false
Returns:
- ActionResult with screenshot_url in result dict
  - When base64=false (default): Contains an HTTPS URL to the screenshot image
  - When base64=true: Contains raw base64-encoded JPEG data (not a URL, despite the field name)
Use cases:
- Capturing evidence
- Visual verification
- Monitoring
- Documentation
- Embedding images directly (use base64=true)
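Because screenshot_url holds either a URL or base64 data depending on the base64 flag, callers need to branch on the mode they requested. The sketch below assumes the result dict is a plain Python dict with a screenshot_url key. How that dict is obtained from ActionResult is not shown on this page, and the helper name screenshot_payload is hypothetical.

```python
import base64


def screenshot_payload(result, base64_mode):
    """Return JPEG bytes in base64 mode, otherwise the HTTPS URL string.

    `result` is assumed to be a dict containing "screenshot_url".
    """
    value = result["screenshot_url"]
    if base64_mode:
        # Despite the field name, this is base64 text, not a URL.
        return base64.b64decode(value)
    return value


# Fabricated sample data for illustration only: JPEG files begin with FF D8.
fake_jpeg = b"\xff\xd8" + b"not a real image"
encoded = {"screenshot_url": base64.b64encode(fake_jpeg).decode("ascii")}
linked = {"screenshot_url": "https://example.com/shot.jpg"}

data = screenshot_payload(encoded, base64_mode=True)
print(data[:2] == b"\xff\xd8")  # True
print(screenshot_payload(linked, base64_mode=False))
```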
html()
Get the HTML content of the current page.
Parameters:
- auto_detect_encoding (bool, optional) - Automatically detect character encoding
Returns:
- ActionResult with html_content in result dict
Use cases:
- Web scraping
- Content extraction
- Page analysis
- Data mining
debug(command)
Execute a shell command inside the session.
Parameters:
- command (string) - Shell command to execute
- timeout_seconds (int, optional) - Command timeout (default: 120)
- max_output_length (int, optional) - Max output bytes (default: 65536)
Returns:
- ActionResult with debug_response containing command output
Use cases:
- Running scripts inside browser environment
- Debugging
- File operations
- System commands
Shell Execution
The /exec and /exec/sync endpoints are available for desktop sessions (kind: "desktop"). These endpoints run shell commands in a real Linux environment.
exec (streaming NDJSON)
Stream stdout/stderr in real time using the HTTP endpoint. This is ideal for long-running commands or live progress updates. Desktop sessions only.
Request body:
- command (string, required)
- cwd (string, optional, default: /workspace)
- env (object, optional)
- timeout_seconds (int, optional, default: 120)
Response:
- Content type: application/x-ndjson
- Each line is a JSON object:
  - {"type":"stdout","data":"..."}
  - {"type":"stderr","data":"..."}
  - {"type":"exit","code":0}
  - {"type":"error","code":"...","message":"..."}
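A client consuming this stream has to parse one JSON object per line and dispatch on its type field. The sketch below handles only that parsing step, using the four event shapes listed above. The names collect_exec_events and ExecError are hypothetical, and the HTTP request is left out because the base URL and authentication are not given on this page.

```python
import json


class ExecError(RuntimeError):
    """Raised when the stream reports an "error" event."""


def collect_exec_events(lines):
    """Fold NDJSON event lines into (stdout, stderr, exit_code)."""
    stdout, stderr, exit_code = [], [], None
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines between events
        event = json.loads(raw)
        kind = event.get("type")
        if kind == "stdout":
            stdout.append(event["data"])
        elif kind == "stderr":
            stderr.append(event["data"])
        elif kind == "exit":
            exit_code = event["code"]
        elif kind == "error":
            raise ExecError(f'{event.get("code")}: {event.get("message")}')
    return "".join(stdout), "".join(stderr), exit_code


# Fabricated sample stream for illustration.
sample = [
    '{"type":"stdout","data":"building\\n"}',
    '{"type":"stderr","data":"warning: slow disk\\n"}',
    "",
    b'{"type":"stdout","data":"done\\n"}',
    '{"type":"exit","code":0}',
]
out, err, code = collect_exec_events(sample)
print(repr(out), repr(err), code)
```

With a streaming HTTP client, the same function could be fed the response's line iterator so that output is processed as it arrives.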
exec/sync (buffered)
Execute a command and return buffered stdout/stderr in a single response. Desktop sessions only.
Response fields:
- stdout (string)
- stderr (string)
- exit_code (int)
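A buffered response is usually checked for a non-zero exit code before its output is used. This is a minimal sketch, assuming the response body has already been decoded into a dict with the three fields listed above. The helper name stdout_or_raise is hypothetical.

```python
def stdout_or_raise(response):
    """Return stdout when exit_code is 0, otherwise raise with stderr."""
    code = response["exit_code"]
    if code != 0:
        raise RuntimeError(f"command exited with {code}: {response['stderr'].strip()}")
    return response["stdout"]


# Fabricated sample responses for illustration.
ok = {"stdout": "hello\n", "stderr": "", "exit_code": 0}
failed = {"stdout": "", "stderr": "ls: no such file\n", "exit_code": 2}

print(stdout_or_raise(ok), end="")
```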
set_viewport(width, height)
Change the browser viewport dimensions.
Parameters:
- width (int) - Viewport width in pixels
- height (int) - Viewport height in pixels
- scale_factor (float, optional) - Device pixel ratio / zoom level (default: 1.0)
Use cases:
- Testing responsive designs
- Capturing full-width screenshots
- Simulating different devices
- Mobile viewport testing
change_proxy(proxy_url)
Change the proxy settings for the browser session. Browser instances only.
Parameters:
- proxy_url (string) - Proxy URL in format http://user:pass@host:port or socks5://user:pass@host:port
Use cases:
- Geo-targeted testing
- Avoiding rate limits
- Web scraping with rotating proxies
- Ad verification across regions
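Credentials containing characters such as @ or : would make the proxy_url format above ambiguous if pasted in directly. The sketch below builds the URL from its parts and percent-encodes the credentials. The helper name build_proxy_url is hypothetical, and whether the service accepts percent-encoded credentials is an assumption that should be checked against the real API.

```python
from urllib.parse import quote


def build_proxy_url(scheme, host, port, user=None, password=None):
    """Assemble scheme://user:pass@host:port with encoded credentials."""
    if scheme not in ("http", "socks5"):
        raise ValueError(f"unsupported proxy scheme: {scheme}")
    auth = ""
    if user is not None:
        # safe="" also encodes "/", "@", and ":" inside the credentials.
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{scheme}://{auth}{host}:{port}"


print(build_proxy_url("http", "proxy.example.com", 8080, "user", "p@ss:word"))
# http://user:p%40ss%3Aword@proxy.example.com:8080
```

The resulting string would then be passed as the proxy_url argument of change_proxy.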
Timing
wait(seconds)
Wait for a specified duration.
Parameters:
- seconds (float) - Duration to wait
Use cases:
- Waiting for page loads
- Allowing animations to complete
- Rate limiting
- Synchronization delays
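Instead of one long fixed delay, wait can be called in short intervals while a condition is re-checked. The sketch below assumes a computer object with the wait action described above. The helper name wait_until and its defaults are hypothetical, and the condition is any caller-supplied function, for example one that inspects a fresh screenshot or page HTML.

```python
from unittest.mock import Mock


def wait_until(computer, condition, timeout_seconds=10.0, interval_seconds=0.5):
    """Poll `condition` between short waits; return True once it holds.

    The timeout counts only the requested wait durations, not the time
    spent evaluating the condition itself.
    """
    waited = 0.0
    while True:
        if condition():
            return True
        if waited >= timeout_seconds:
            return False
        computer.wait(interval_seconds)
        waited += interval_seconds


computer = Mock()  # stand-in for a real session
answers = iter([False, False, True])
ready = wait_until(computer, lambda: next(answers))
print(ready, computer.wait.call_count)  # True 2
```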
Batch Actions
batch(actions)
Execute multiple actions in sequence with a single API call. The batch stops on the first error.
Parameters:
- actions (array) - Array of action objects to execute sequentially
Behavior:
- Actions execute in order
- Stops immediately on first error
- Returns results for all executed actions
Use cases:
- Reducing latency for multi-step workflows
- Atomic operation sequences
- Efficient automation scripts
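The stop-on-first-error behavior can be modeled locally to make clear which results a caller should expect back. This is only a sketch of the documented semantics, not the service's implementation. The function run_batch_locally and the result shape are hypothetical, and whether the real API includes the failing action in its results is an assumption here.

```python
def run_batch_locally(actions):
    """Run callables in order, stop at the first error, return what ran."""
    results = []
    for action in actions:
        try:
            results.append({"ok": True, "value": action()})
        except Exception as exc:
            # Assumption: the failing action is reported, later ones are not.
            results.append({"ok": False, "error": str(exc)})
            break
    return results


executed = []


def step(name):
    """Build a callable that records its own execution."""
    def run():
        executed.append(name)
        return name
    return run


def failing_step():
    executed.append("click")
    raise RuntimeError("element not found")


results = run_batch_locally([step("navigate"), failing_step, step("screenshot")])
print(executed)                  # ['navigate', 'click']
print([r["ok"] for r in results])  # [True, False]
```

Note that earlier actions are not rolled back in this model. A caller that needs all-or-nothing behavior would have to undo completed steps itself.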
Complete Example
Here’s a workflow using multiple actions:
Action Reference
Core Actions
| Action | Browser | Desktop | Parameters |
|---|---|---|---|
| navigate(url) | ✅ | ❌ | url: string |
| click(x, y) | ✅ | ✅ | x: float, y: float |
| double_click(x, y) | ✅ | ✅ | x: float, y: float |
| right_click(x, y) | ✅ | ✅ | x: float, y: float |
| drag(x1, y1, x2, y2) | ✅ | ✅ | x1, y1, x2, y2: float |
| type(text) | ✅ | ✅ | text: string |
| hotkey(*keys) | ✅ | ✅ | *keys: string |
| scroll(dx, dy) | ✅ | ✅ | dx: float, dy: float |
| screenshot() | ✅ | ✅ | base64?: bool |
| html() | ✅ | ❌ | auto_detect_encoding?: bool |
| debug(cmd) | ✅ | ✅ | command: string, timeout?, max_output? |
| set_viewport(w, h) | ✅ | ❌ | width: int, height: int, scale_factor?: float |
| change_proxy(url) | ✅ | ❌ | proxy_url: string |
| wait(seconds) | ✅ | ✅ | seconds: float |
Batch Operations
| Action | Browser | Desktop | Parameters |
|---|---|---|---|
| batch(actions) | ✅ | ✅ | actions: array of action objects |
Best Practices
Use wait() generously
Always wait after navigation, form submissions, or dynamic content loading
Verify coordinates
Test coordinate values with screenshots before automation
Handle errors
Check action results, especially for navigation and screenshots
Screenshot for verification
Capture screenshots to verify successful completion
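Several of these practices combine naturally into one helper: navigate, wait for the page to settle, then capture a screenshot for verification. The sketch below assumes a computer object with the navigate, wait, and screenshot actions documented on this page. The helper name open_and_capture and the default settle time are hypothetical, and Mock stands in for a live browser session.

```python
from unittest.mock import Mock


def open_and_capture(computer, url, settle_seconds=2.0):
    """Navigate, pause for the page to settle, and return a screenshot result."""
    computer.navigate(url)
    computer.wait(settle_seconds)
    # The caller should inspect this result before continuing the workflow.
    return computer.screenshot()


computer = Mock()  # stand-in for a real browser session
result = open_and_capture(computer, "https://example.com")
print([name for name, args, kwargs in computer.mock_calls])
# ['navigate', 'wait', 'screenshot']
```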