Overview
Every computer instance provides core actions for browser and desktop automation. These actions work identically across both browser and desktop instances (exceptnavigate, which is browser-only).
Navigation
navigate(url)
Navigate to a URL. Browser instances only.url(string) - The URL to navigate to
- Opening web pages
- Navigating between pages
- Loading web applications
Mouse Actions
click(x, y)
Click at specific screen coordinates.x(int) - X coordinate in pixelsy(int) - Y coordinate in pixels
- Clicking buttons
- Selecting menu items
- Activating UI elements
double_click(x, y) / doubleClick(x, y)
Double-click at specific coordinates.x(int) - X coordinatey(int) - Y coordinate
- Opening files
- Selecting text
- Desktop application interactions
right_click(x, y) / rightClick(x, y)
Right-click at specific coordinates to open context menus.x(int) - X coordinatey(int) - Y coordinate
- Opening context menus
- Right-click actions
- Desktop interactions
drag(from_x, from_y, to_x, to_y)
Drag from one position to another.from_x(int) - Starting X coordinatefrom_y(int) - Starting Y coordinateto_x(int) - Ending X coordinateto_y(int) - Ending Y coordinate
- Drag and drop
- Moving windows
- Selecting regions
mouse_down(x, y) / mouseDown(x, y)
Press and hold the left mouse button at specific coordinates. Use withmouse_up for fine-grained drag control.
x(float) - X coordinate in pixelsy(float) - Y coordinate in pixels
- Fine-grained drag control
- Custom drag interactions
- Drawing applications
mouse_up(x, y) / mouseUp(x, y)
Release the left mouse button at specific coordinates.x(float) - X coordinate in pixelsy(float) - Y coordinate in pixels
- Complete a fine-grained drag operation
- Release after mouse_down
Keyboard Actions
type(text)
Type text at the current cursor position.text(string) - The text to type
- Filling form fields
- Entering search queries
- Text input
hotkey(*keys)
Send keyboard shortcut combinations.*keys(strings) - Key names to press simultaneously
ctrl/cmd + c- Copyctrl/cmd + v- Pastectrl/cmd + s- Savectrl/cmd + f- Findalt + tab- Switch windowsenter- Submit/Enterescape- Cancel/Escape
- Keyboard shortcuts
- System commands
- Application controls
key_down(key) / keyDown(key)
Press and hold a keyboard key. Use withkey_up for complex interactions like shift-click selection.
key(string) - Key name to press (e.g., “Shift”, “Control”, “Alt”)
- Shift-click selection
- Ctrl-click multi-select
- Holding modifier keys during mouse operations
key_up(key) / keyUp(key)
Release a keyboard key that was previously pressed withkey_down.
key(string) - Key name to release
Scrolling
scroll(dx, dy)
Scroll the viewport by delta x and delta y.dx(float) - Horizontal scroll delta (positive = right, negative = left)dy(float) - Vertical scroll delta (positive = down, negative = up)
- Scrolling web pages
- Loading lazy-loaded content
- Viewing off-screen content
- Horizontal scrolling in wide layouts
Capture
screenshot()
Capture a screenshot of the current screen.ActionResultwithscreenshot_urlin result dict
- Capturing evidence
- Visual verification
- Monitoring
- Documentation
html()
Get the HTML content of the current page.auto_detect_encoding(bool, optional) - Automatically detect character encoding
ActionResultwithhtml_contentin result dict
- Web scraping
- Content extraction
- Page analysis
- Data mining
debug(command)
Execute a shell command inside the session.command(string) - Shell command to executetimeout_seconds(int, optional) - Command timeout (default: 120)max_output_length(int, optional) - Max output bytes (default: 65536)
ActionResultwithdebug_responsecontaining command output
- Running scripts inside browser environment
- Debugging
- File operations
- System commands
set_viewport(width, height)
Change the browser viewport dimensions.width(int) - Viewport width in pixelsheight(int) - Viewport height in pixelsscale_factor(float, optional) - Device pixel ratio / zoom level (default: 1.0)
- Testing responsive designs
- Capturing full-width screenshots
- Simulating different devices
- Mobile viewport testing
change_proxy(proxy_url)
Change the proxy settings for the browser session. Browser instances only.proxy_url(string) - Proxy URL in formathttp://user:pass@host:portorsocks5://user:pass@host:port
- Geo-targeted testing
- Avoiding rate limits
- Web scraping with rotating proxies
- Ad verification across regions
Timing
wait(seconds)
Wait for a specified duration.seconds(float) - Duration to wait
- Waiting for page loads
- Allowing animations to complete
- Rate limiting
- Synchronization delays
The
wait() action does not count toward your action quota.Batch Actions
batch(actions)
Execute multiple actions in sequence with a single API call. The batch stops on the first error.actions(array) - Array of action objects to execute sequentially
- Actions execute in order
- Stops immediately on first error
- Returns results for all executed actions
- Reducing latency for multi-step workflows
- Atomic operation sequences
- Efficient automation scripts
Complete Example
Here’s a workflow using multiple actions:Action Reference
Core Actions
| Action | Browser | Desktop | Parameters |
|---|---|---|---|
navigate(url) | ✅ | ❌ | url: string |
click(x, y) | ✅ | ✅ | x: float, y: float |
double_click(x, y) | ✅ | ✅ | x: float, y: float |
right_click(x, y) | ✅ | ✅ | x: float, y: float |
drag(x1, y1, x2, y2) | ✅ | ✅ | x1, y1, x2, y2: float |
type(text) | ✅ | ✅ | text: string |
hotkey(*keys) | ✅ | ✅ | *keys: string |
scroll(dx, dy) | ✅ | ✅ | dx: float, dy: float |
screenshot() | ✅ | ✅ | base64?: bool |
html() | ✅ | ❌ | auto_detect_encoding?: bool |
debug(cmd) | ✅ | ✅ | command: string, timeout?, max_output? |
set_viewport(w, h) | ✅ | ❌ | width: int, height: int, scale_factor?: float |
change_proxy(url) | ✅ | ❌ | proxy_url: string |
wait(seconds) | ✅ | ✅ | seconds: float |
Low-Level Input Actions
| Action | Browser | Desktop | Parameters |
|---|---|---|---|
mouse_down(x, y) | ✅ | ✅ | x: float, y: float |
mouse_up(x, y) | ✅ | ✅ | x: float, y: float |
key_down(key) | ✅ | ✅ | key: string |
key_up(key) | ✅ | ✅ | key: string |
Batch Operations
| Action | Browser | Desktop | Parameters |
|---|---|---|---|
batch(actions) | ✅ | ✅ | actions: array of action objects |
Best Practices
Use wait() generously
Use wait() generously
Always wait after navigation, form submissions, or dynamic content loading
Verify coordinates
Verify coordinates
Test coordinate values with screenshots before automation
Handle errors
Handle errors
Check action results, especially for navigation and screenshots
Screenshot for verification
Screenshot for verification
Capture screenshots to verify successful completion