Overview

Every computer instance provides core actions for browser and desktop automation. Most actions work identically on browser and desktop instances; the exceptions, such as navigate and change_proxy (browser only), are noted in their sections.

navigate(url)

Navigate to a URL. Browser instances only.
computer.navigate("https://example.com")
Parameters:
  • url (string) - The URL to navigate to
Use Cases:
  • Opening web pages
  • Navigating between pages
  • Loading web applications

Mouse Actions

Coordinates are viewport-dependent. The same coordinates will produce different results on different screen sizes. Always:
  1. Take a screenshot first to see the page
  2. Measure the pixel position of elements
  3. Use those specific coordinates for your viewport
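
If coordinates were measured on one viewport and the session runs at another, proportional scaling is one rough heuristic. The sketch below assumes the layout stretches uniformly with the viewport, which responsive pages often do not. `scale_point` is a hypothetical helper, not part of the SDK, and re-measuring from a fresh screenshot remains the more reliable approach.

```python
def scale_point(x, y, measured_size, current_size):
    """Map a point measured on one viewport onto another by proportional scaling.

    measured_size and current_size are (width, height) tuples in pixels.
    Assumes the page layout scales uniformly, which is only an approximation.
    """
    measured_w, measured_h = measured_size
    current_w, current_h = current_size
    if measured_w <= 0 or measured_h <= 0:
        raise ValueError("measured viewport dimensions must be positive")
    return (
        round(x * current_w / measured_w),
        round(y * current_h / measured_h),
    )


# A point measured at 1280x720, mapped to a 1920x1080 viewport.
print(scale_point(100, 200, (1280, 720), (1920, 1080)))
```

Treat the result as a starting guess and confirm it against a screenshot before clicking.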

click(x, y)

Click at specific screen coordinates.
computer.click(100, 200)
Parameters:
  • x (int) - X coordinate in pixels
  • y (int) - Y coordinate in pixels
Use Cases:
  • Clicking buttons
  • Selecting menu items
  • Activating UI elements

double_click(x, y) / doubleClick(x, y)

Double-click at specific coordinates.
computer.double_click(150, 250)
Parameters:
  • x (int) - X coordinate
  • y (int) - Y coordinate
Use Cases:
  • Opening files
  • Selecting text
  • Desktop application interactions

right_click(x, y) / rightClick(x, y)

Right-click at specific coordinates to open context menus.
computer.right_click(200, 300)
Parameters:
  • x (int) - X coordinate
  • y (int) - Y coordinate
Use Cases:
  • Opening context menus
  • Right-click actions
  • Desktop interactions

drag(from_x, from_y, to_x, to_y)

Drag from one position to another.
computer.drag(100, 100, 300, 300)
Parameters:
  • from_x (int) - Starting X coordinate
  • from_y (int) - Starting Y coordinate
  • to_x (int) - Ending X coordinate
  • to_y (int) - Ending Y coordinate
Use Cases:
  • Drag and drop
  • Moving windows
  • Selecting regions

mouse_down(x, y) / mouseDown(x, y)

Press and hold the left mouse button at specific coordinates. Use with mouse_up for fine-grained drag control.
# Using the low-level API for mouse control
client.computers.execute_action(computer.id, action={
    "type": "mouse_down",
    "x": 100,
    "y": 200
})
Parameters:
  • x (float) - X coordinate in pixels
  • y (float) - Y coordinate in pixels
Use Cases:
  • Fine-grained drag control
  • Custom drag interactions
  • Drawing applications

mouse_up(x, y) / mouseUp(x, y)

Release the left mouse button at specific coordinates.
client.computers.execute_action(computer.id, action={
    "type": "mouse_up",
    "x": 300,
    "y": 400
})
Parameters:
  • x (float) - X coordinate in pixels
  • y (float) - Y coordinate in pixels
Use Cases:
  • Complete a fine-grained drag operation
  • Release after mouse_down
For simple drag operations, use drag() instead. Use mouse_down/mouse_up when you need control between the press and release (e.g., moving to intermediate positions).

Keyboard Actions

type(text)

Type text at the current cursor position.
computer.type("Hello World")
Parameters:
  • text (string) - The text to type
Use Cases:
  • Filling form fields
  • Entering search queries
  • Text input

hotkey(*keys)

Send keyboard shortcut combinations.
computer.hotkey("Control", "c")    # Copy
computer.hotkey("cmd", "v")        # Paste on Mac
computer.hotkey("alt", "F4")       # Close window
Parameters:
  • *keys (strings) - Key names to press simultaneously
Common Shortcuts:
  • ctrl/cmd + c - Copy
  • ctrl/cmd + v - Paste
  • ctrl/cmd + s - Save
  • ctrl/cmd + f - Find
  • alt + tab - Switch windows
  • enter - Submit/Enter
  • escape - Cancel/Escape
Use Cases:
  • Keyboard shortcuts
  • System commands
  • Application controls

key_down(key) / keyDown(key)

Press and hold a keyboard key. Use with key_up for complex interactions like shift-click selection.
# Shift-click selection example
client.computers.execute_action(computer.id, action={"type": "key_down", "key": "Shift"})
computer.click(100, 200)  # First item
computer.click(100, 400)  # Last item (selects range)
client.computers.execute_action(computer.id, action={"type": "key_up", "key": "Shift"})
Parameters:
  • key (string) - Key name to press (e.g., “Shift”, “Control”, “Alt”)
Use Cases:
  • Shift-click selection
  • Ctrl-click multi-select
  • Holding modifier keys during mouse operations

key_up(key) / keyUp(key)

Release a keyboard key that was previously pressed with key_down.
client.computers.execute_action(computer.id, action={"type": "key_up", "key": "Shift"})
Parameters:
  • key (string) - Key name to release
Always pair key_down with key_up to avoid leaving keys stuck in a pressed state.
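
One way to guarantee the pairing is a context manager that releases the key in a `finally` block. This is a minimal sketch: `held_key` is a hypothetical helper, and `execute` stands for any callable that sends one action dict to the session. Here a plain list records the actions so the snippet runs without a session.

```python
from contextlib import contextmanager


@contextmanager
def held_key(execute, key):
    """Hold `key` for the duration of the with-block and release it even on error."""
    execute({"type": "key_down", "key": key})
    try:
        yield
    finally:
        # Runs on normal exit and on exceptions, so the key is never left pressed.
        execute({"type": "key_up", "key": key})


# Demo with a recorder instead of a live session.
sent = []
with held_key(sent.append, "Shift"):
    sent.append({"type": "click", "x": 100, "y": 200})
print([action["type"] for action in sent])
```

With a live session, `execute` could be `lambda action: client.computers.execute_action(computer.id, action=action)`, mirroring the low-level calls shown above.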

Scrolling

scroll(dx, dy, x?, y?)

Scroll the viewport by delta x and delta y, optionally at a specific position.
computer.scroll(dx=0, dy=500)            # Scroll down 500px at current position
computer.scroll(dx=0, dy=-300)           # Scroll up 300px
computer.scroll(dx=100, dy=0)            # Scroll right 100px
computer.scroll(dx=0, dy=500, x=400, y=300)  # Scroll at specific coordinates
Parameters:
  • dx (float) - Horizontal scroll delta (positive = right, negative = left)
  • dy (float) - Vertical scroll delta (positive = down, negative = up)
  • x (float, optional) - X coordinate where the scroll action originates
  • y (float, optional) - Y coordinate where the scroll action originates
Use x and y when you need to scroll within a specific scrollable element (like a sidebar or modal) rather than the main page.
Use Cases:
  • Scrolling web pages
  • Loading lazy-loaded content
  • Viewing off-screen content
  • Horizontal scrolling in wide layouts
  • Scrolling within specific elements (use x, y to target)
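
For lazy-loaded content, it can help to scroll in several smaller steps rather than one large jump, pausing between them. The sketch below only computes the per-call `dy` values. `scroll_steps` and the 500px default step are illustrative assumptions, not SDK features.

```python
def scroll_steps(total_px, step_px=500):
    """Split a total vertical scroll distance into per-call dy values.

    Positive totals scroll down, negative totals scroll up, matching the
    sign convention of scroll(dx, dy).
    """
    if step_px <= 0:
        raise ValueError("step_px must be positive")
    direction = 1 if total_px >= 0 else -1
    remaining = abs(total_px)
    steps = []
    while remaining > 0:
        chunk = min(step_px, remaining)
        steps.append(direction * chunk)
        remaining -= chunk
    return steps


print(scroll_steps(1200))
```

Each value could then be passed to `computer.scroll(dx=0, dy=dy)`, followed by a short `computer.wait(...)` so new content has time to load.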

Capture

screenshot(base64?)

Capture a screenshot of the current screen.
# Get screenshot as URL (default)
result = computer.screenshot()
url = result.result['screenshot_url']
print(f"Screenshot: {url}")

# Get screenshot as base64-encoded JPEG
result = computer.screenshot(base64=True)
base64_data = result.result['screenshot_url']  # Contains base64 data, not URL
Parameters:
  • base64 (bool, optional) - If true, returns base64-encoded JPEG data instead of a URL. Default: false
Returns:
  • ActionResult with screenshot_url in result dict
    • When base64=false (default): Contains an HTTPS URL to the screenshot image
    • When base64=true: Contains raw base64-encoded JPEG data (not a URL, despite the field name)
Use Cases:
  • Capturing evidence
  • Visual verification
  • Monitoring
  • Documentation
  • Embedding images directly (use base64=true)
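
When `base64=True` is used, the returned string has to be decoded before it can be written to disk. A minimal sketch, assuming the field holds plain base64 text as described above. `decode_jpeg` is a hypothetical helper, and the JPEG marker check is an added sanity check rather than documented behavior.

```python
import base64


def decode_jpeg(base64_data):
    """Decode base64 screenshot data and check that it starts like a JPEG."""
    image_bytes = base64.b64decode(base64_data)
    # JPEG files begin with the two-byte start-of-image marker FF D8.
    if not image_bytes.startswith(b"\xff\xd8"):
        raise ValueError("decoded data does not start with the JPEG marker")
    return image_bytes


# Demo with a stand-in payload instead of a real screenshot.
sample = base64.b64encode(b"\xff\xd8\xff\xe0demo").decode("ascii")
print(len(decode_jpeg(sample)))
```

The returned bytes can be written with `open("screenshot.jpg", "wb")`.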

html()

Get the HTML content of the current page.
result = computer.html()
html_content = computer.get_html_content(result)
print(html_content)

# With encoding detection
result = computer.html(auto_detect_encoding=True)
Parameters:
  • auto_detect_encoding (bool, optional) - Automatically detect character encoding
Returns:
  • ActionResult with html_content in result dict
Use Cases:
  • Web scraping
  • Content extraction
  • Page analysis
  • Data mining
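
Once the HTML string is available, it can be processed with any parser. As an illustration of content extraction, the sketch below collects link targets using only the Python standard library. `LinkCollector` and `extract_links` are hypothetical names, and the input is a literal string rather than a live page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html_content):
    collector = LinkCollector()
    collector.feed(html_content)
    collector.close()
    return collector.links


page = '<p><a href="/docs">Docs</a> <a>no target</a> <A HREF="https://example.com">Home</A></p>'
print(extract_links(page))
```

In a session, the string returned by `computer.get_html_content(result)` would take the place of `page`.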

debug(command)

Execute a shell command inside the session.
result = computer.debug("ls -la")
output = computer.get_debug_response(result)
print(output)

# With timeout
result = computer.debug(
    "sleep 10",
    timeout_seconds=15,
    max_output_length=10000
)
Parameters:
  • command (string) - Shell command to execute
  • timeout_seconds (int, optional) - Command timeout (default: 120)
  • max_output_length (int, optional) - Max output bytes (default: 65536)
Returns:
  • ActionResult with debug_response containing command output
Use Cases:
  • Running scripts inside the browser environment
  • Debugging
  • File operations
  • System commands
The /computers/{id}/debug endpoint is deprecated for shell commands. debug() is buffered and does not stream output. Use /computers/{id}/exec or /computers/{id}/exec/sync for command execution.

Shell Execution

The /exec and /exec/sync endpoints are available for desktop sessions (kind: "desktop"). These endpoints run shell commands in a real Linux environment.

exec (streaming NDJSON)

Stream stdout/stderr in real time using the HTTP endpoint. This is ideal for long-running commands or live progress updates. Desktop sessions only.
Request body:
  • command (string, required)
  • cwd (string, optional, default: /workspace)
  • env (object, optional)
  • timeout_seconds (int, optional, default: 120)
Response:
  • application/x-ndjson
  • Each line is a JSON object:
    • {"type":"stdout","data":"..."}
    • {"type":"stderr","data":"..."}
    • {"type":"exit","code":0}
    • {"type":"error","code":"...","message":"..."}
import json
import sys

import requests

# computer_id and api_key are assumed to be defined elsewhere.
url = f"https://api.tzafon.ai/computers/{computer_id}/exec"
headers = {
    "Authorization": f"Bearer {api_key}",
    "Accept": "application/x-ndjson",
}
payload = {"command": "bash -lc 'for i in 1 2 3; do echo $i; sleep 1; done'"}

with requests.post(url, headers=headers, json=payload, stream=True) as response:
    # Each non-empty line of the response is one JSON message.
    for line in response.iter_lines(decode_unicode=True):
        if not line:
            continue
        msg = json.loads(line)
        if msg["type"] == "stdout":
            print(msg["data"], end="")
        elif msg["type"] == "stderr":
            # Keep the command's stderr separate from its stdout.
            print(msg["data"], end="", file=sys.stderr)
        elif msg["type"] == "exit":
            print(f"Exit code: {msg['code']}")
            break
        elif msg["type"] == "error":
            raise RuntimeError(msg.get("message", "exec error"))
The streaming endpoint returns HTTP 200 even on malformed input. Always parse the stream and handle type: "error" lines.
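
The message handling can be separated from the HTTP call so it is easy to test. The sketch below folds an iterable of NDJSON lines into a `(stdout, stderr, exit_code)` tuple using the four message types listed above. `collect_exec_stream` is a hypothetical helper, and treating a stream that ends without an exit message as an error is an assumption.

```python
import json


def collect_exec_stream(lines):
    """Fold NDJSON exec messages into (stdout, stderr, exit_code)."""
    stdout_parts, stderr_parts = [], []
    exit_code = None
    for line in lines:
        if not line:
            continue  # skip blank keep-alive lines
        msg = json.loads(line)
        kind = msg.get("type")
        if kind == "stdout":
            stdout_parts.append(msg["data"])
        elif kind == "stderr":
            stderr_parts.append(msg["data"])
        elif kind == "exit":
            exit_code = msg["code"]
            break
        elif kind == "error":
            # Errors arrive in-band because the HTTP status can still be 200.
            raise RuntimeError(f"{msg.get('code', 'error')}: {msg.get('message', 'exec error')}")
    if exit_code is None:
        # Assumption: a complete stream always ends with an exit message.
        raise RuntimeError("stream ended without an exit message")
    return "".join(stdout_parts), "".join(stderr_parts), exit_code


demo_lines = [
    '{"type":"stdout","data":"1\\n"}',
    "",
    '{"type":"stderr","data":"warn\\n"}',
    '{"type":"stdout","data":"2\\n"}',
    '{"type":"exit","code":0}',
]
print(collect_exec_stream(demo_lines))
```

With `requests`, the `lines` argument could be `response.iter_lines(decode_unicode=True)` from a streamed POST.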

exec/sync (buffered)

Execute a command and return buffered stdout/stderr in a single response. Desktop sessions only.
Response fields:
  • stdout (string)
  • stderr (string)
  • exit_code (int)
import requests

url = f"https://api.tzafon.ai/computers/{computer_id}/exec/sync"
headers = {"Authorization": f"Bearer {api_key}"}
payload = {"command": "ls -la"}

response = requests.post(url, headers=headers, json=payload)
response.raise_for_status()
print(response.json())
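
A successful HTTP response does not by itself mean the command succeeded, so the `exit_code` field is worth checking. A minimal sketch, assuming the parsed JSON body is a dict with the three response fields listed above. `check_exec_result` is a hypothetical helper.

```python
def check_exec_result(result):
    """Return stdout if the command exited with 0, otherwise raise with stderr."""
    if result["exit_code"] != 0:
        raise RuntimeError(
            f"command failed with exit code {result['exit_code']}: "
            f"{result['stderr'].strip()}"
        )
    return result["stdout"]


# Demo with a literal dict in place of response.json().
print(check_exec_result({"stdout": "ok\n", "stderr": "", "exit_code": 0}), end="")
```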

set_viewport(width, height)

Change the browser viewport dimensions.
computer.set_viewport(1920, 1080)
computer.set_viewport(1280, 720, scale_factor=2.0)
Parameters:
  • width (int) - Viewport width in pixels
  • height (int) - Viewport height in pixels
  • scale_factor (float, optional) - Device pixel ratio / zoom level (default: 1.0)
Use Cases:
  • Testing responsive designs
  • Capturing full-width screenshots
  • Simulating different devices
  • Mobile viewport testing

change_proxy(proxy_url)

Change the proxy settings for the browser session. Browser instances only.
# Using the low-level API to change proxy
client.computers.execute_action(computer.id, action={
    "type": "change_proxy",
    "proxy_url": "http://user:pass@proxy-server:port"
})
Parameters:
  • proxy_url (string) - Proxy URL in format http://user:pass@host:port or socks5://user:pass@host:port
Use Cases:
  • Geo-targeted testing
  • Avoiding rate limits
  • Web scraping with rotating proxies
  • Ad verification across regions
Set the proxy immediately after creating the session, before any page navigation. See the Playwright with Proxy example for a complete workflow.
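
Credentials containing characters such as `@` or `:` would break the `user:pass@host:port` format if inserted verbatim. The sketch below builds the URL with percent-encoded credentials, which is standard URL practice. Whether the service expects encoded credentials is an assumption, and `build_proxy_url` is a hypothetical helper.

```python
from urllib.parse import quote


def build_proxy_url(scheme, host, port, username=None, password=None):
    """Build a proxy URL of the form scheme://user:pass@host:port."""
    if scheme not in ("http", "socks5"):
        raise ValueError("scheme must be 'http' or 'socks5'")
    credentials = ""
    if username is not None:
        # safe="" also encodes '/', ':' and '@' inside the credentials.
        credentials = quote(username, safe="")
        if password is not None:
            credentials += ":" + quote(password, safe="")
        credentials += "@"
    return f"{scheme}://{credentials}{host}:{port}"


print(build_proxy_url("http", "proxy.example.com", 8080, "user", "p@ss:word"))
```

The resulting string would be passed as `proxy_url` in the `change_proxy` action.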

Timing

wait(seconds)

Wait for a specified duration.
computer.wait(2.0)   # Wait 2 seconds
computer.wait(0.5)   # Wait 500ms
Parameters:
  • seconds (float) - Duration to wait
Use Cases:
  • Waiting for page loads
  • Allowing animations to complete
  • Rate limiting
  • Synchronization delays
Implementation differs between SDKs:
  • Python: Executes server-side via execute_action({"type": "wait", "ms": ...}). This makes an API call.
  • TypeScript: Executes client-side via sleep(). This does NOT make an API call.
Both achieve the same end result (pausing execution), but Python’s wait happens on the remote session while TypeScript’s wait happens locally.

Batch Actions

batch(actions)

Execute multiple actions in sequence with a single API call. The batch stops on the first error.
# Execute multiple actions in one request
result = client.computers.execute_batch(computer.id, actions=[
    {"type": "go_to_url", "url": "https://example.com"},
    {"type": "wait", "ms": 2000},
    {"type": "click", "x": 100, "y": 200},
    {"type": "type", "text": "search query"},
    {"type": "keypress", "keys": ["enter"]},
    {"type": "screenshot"}
])
Parameters:
  • actions (array) - Array of action objects to execute sequentially
Behavior:
  • Actions execute in order
  • Stops immediately on first error
  • Returns results for all executed actions
Use Cases:
  • Reducing latency for multi-step workflows
  • Operation sequences that should stop at the first failure
  • Efficient automation scripts
Use batch actions when you have a sequence of actions that don’t require intermediate inspection. This reduces network round-trips and improves performance.
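
Action lists are plain dicts, so small builder functions can keep them consistent. The sketch below reproduces the action shapes from the example above. The builder names are hypothetical and not part of the SDK, and the seconds-to-milliseconds conversion follows the `ms` field shown in the batch example.

```python
def go_to_url(url):
    return {"type": "go_to_url", "url": url}


def wait(seconds):
    # Batch waits are expressed in milliseconds.
    return {"type": "wait", "ms": int(round(seconds * 1000))}


def click(x, y):
    return {"type": "click", "x": x, "y": y}


def type_text(text):
    return {"type": "type", "text": text}


def keypress(*keys):
    return {"type": "keypress", "keys": list(keys)}


def screenshot():
    return {"type": "screenshot"}


def search_workflow(url, x, y, query):
    """Assemble a navigate, click, type, submit, capture sequence."""
    return [
        go_to_url(url),
        wait(2),
        click(x, y),
        type_text(query),
        keypress("enter"),
        screenshot(),
    ]


actions = search_workflow("https://example.com", 100, 200, "search query")
print(len(actions))
```

The list could then be passed as `actions=` to `client.computers.execute_batch(computer.id, ...)`.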

Complete Example

Here’s a workflow using multiple actions:
from tzafon import Computer

client = Computer()
with client.create(kind="browser") as computer:
    # Navigate
    computer.navigate("https://example.com")
    computer.wait(1)

    # Fill form
    computer.click(100, 200)  # Click input
    computer.type("search query")
    computer.hotkey("enter")
    computer.wait(2)

    # Scroll and capture
    computer.scroll(dx=0, dy=500)
    computer.wait(1)

    result = computer.screenshot()
    url = computer.get_screenshot_url(result)
    print(f"Screenshot: {url}")

Action Reference

Core Actions

Action                  Parameters
navigate(url)           url: string
click(x, y)             x: float, y: float
double_click(x, y)      x: float, y: float
right_click(x, y)       x: float, y: float
drag(x1, y1, x2, y2)    x1, y1, x2, y2: float
type(text)              text: string
hotkey(*keys)           *keys: string
scroll(dx, dy)          dx: float, dy: float
screenshot()            base64?: bool
html()                  auto_detect_encoding?: bool
debug(cmd)              command: string, timeout?, max_output?
set_viewport(w, h)      width: int, height: int, scale_factor?: float
change_proxy(url)       proxy_url: string
wait(seconds)           seconds: float

Batch Operations

Action                  Parameters
batch(actions)          actions: array of action objects

Best Practices

Always wait after navigation, form submissions, or dynamic content loading
Test coordinate values with screenshots before automation
Check action results, especially for navigation and screenshots
Capture screenshots to verify successful completion
