@browseragentprotocol/protocol and as Pydantic models in the Python SDK.
BAPSelector
A union type representing one of the 10 selector types. Used in all action and observation methods.
InteractiveElement
Returned by agent/observe. Represents a single interactive element on the page with a stable reference, selector, and metadata.
Stable element reference ID (e.g., "@e1" or "@submitBtn"). Persists across observations for the same element.
Pre-computed selector that uniquely targets this element. Use this directly in action methods.
ARIA role (e.g., "button", "textbox", "link", "checkbox").
Accessible name of the element.
Current value (for input elements).
What actions can be performed: "clickable", "editable", "selectable", "checkable", "expandable", "draggable", "scrollable", "submittable".
HTML tag name (e.g., "button", "input", "a").
Bounding box { x, y, width, height } if includeBounds was requested.
Whether the element currently has focus.
Whether the element is disabled.
Ref stability indicator: "stable" (same element, same ref), "new" (newly discovered), "moved" (element found but ref changed).
Alternative selectors ordered by reliability: testId, role+name, css #id, text, css path.
ExecutionStep
Defines a single step in an agent/act sequence.
The BAP method to execute (e.g., "action/click", "action/fill", "page/navigate").
Parameters for the action.
Human-readable label for logging and debugging.
Pre-condition that must be met before execution. Object with selector, state ("visible" | "enabled" | "exists" | "hidden" | "disabled"), and optional timeout.
Error handling strategy: "stop" (default), "skip", or "retry".
Max retries if onError is "retry" (1-5).
Building Steps
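A minimal sketch of assembling steps as plain dictionaries, checking the error strategy and retry range described above. `build_step` is a hypothetical helper, not part of either SDK. Only `action` and `onError` are named in this reference, so the keys `params`, `label`, and `maxRetries`, and the shape of the `params` payloads, are assumptions.

```python
from typing import Any, Optional

VALID_ON_ERROR = {"stop", "skip", "retry"}


def build_step(
    action: str,
    params: dict[str, Any],
    label: Optional[str] = None,
    on_error: str = "stop",
    max_retries: Optional[int] = None,
) -> dict[str, Any]:
    """Build one step dict, rejecting values outside the documented ranges."""
    if on_error not in VALID_ON_ERROR:
        raise ValueError(f"onError must be one of {sorted(VALID_ON_ERROR)}")
    if max_retries is not None:
        if on_error != "retry":
            raise ValueError("max_retries only applies when on_error is 'retry'")
        if not 1 <= max_retries <= 5:
            raise ValueError("max_retries must be between 1 and 5")

    step: dict[str, Any] = {"action": action, "params": params, "onError": on_error}
    if label is not None:
        step["label"] = label          # assumed key name
    if max_retries is not None:
        step["maxRetries"] = max_retries  # assumed key name
    return step


# The params payloads are illustrative only.
steps = [
    build_step("action/fill", {"selector": "@e1", "value": "user@example.com"},
               label="Fill email"),
    build_step("action/click", {"selector": "@submitBtn"},
               label="Submit form", on_error="retry", max_retries=3),
]
```

Validating on the client side surfaces a bad retry count before the sequence is sent, rather than partway through execution.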
AgentActResult
Returned by agent/act. Contains results for each step and optional fused observations.
Number of steps completed successfully.
Total number of steps in the sequence.
Whether all steps succeeded.
Results for each step in order. Each contains step (index), success, duration, optional result data, and optional error.
Total execution time in milliseconds.
Index of the first failed step (if any).
Pre-execution observation result (if preObserve was set).
Post-execution observation result (if postObserve was set).
AgentObserveResult
Returned by agent/observe. Contains the page snapshot.
Page metadata: url, title, viewport { width, height }.
List of interactive elements with stable refs and selectors.
Total count on the page (may exceed the returned list if maxElements was set).
Full accessibility tree (if includeAccessibility was set).
Screenshot data with data (base64), format, width, height, annotated.
Mapping from annotation labels to element refs and positions (if screenshot annotation was used).
Incremental changes: added, updated, removed (if incremental: true).
WebMCP tools discovered on the page (if includeWebMCPTools: true).
AgentExtractResult
Returned by agent/extract.
Whether extraction succeeded.
Extracted data matching the provided schema.
Source element references for extracted data (if includeSourceRefs was set).
Confidence score (0-1) for the extraction.
Error message if extraction failed.
ExtractionSchema
Simplified JSON Schema subset (max 2 levels deep) used by agent/extract.
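The exact set of supported keywords is not listed here, so the following is only a sketch of a schema that stays within two levels, together with a hypothetical depth check. It assumes that each nested object or array counts as one level and that scalars add none; the protocol may count levels differently.

```python
from typing import Any

MAX_DEPTH = 2  # limit stated in this reference


def schema_depth(schema: dict[str, Any]) -> int:
    """Count nesting levels: a scalar is 0, each object or array adds 1."""
    kind = schema.get("type")
    if kind == "object":
        children = schema.get("properties", {}).values()
        return 1 + max((schema_depth(child) for child in children), default=0)
    if kind == "array":
        items = schema.get("items")
        return 1 + (schema_depth(items) if isinstance(items, dict) else 0)
    return 0


# An object whose deepest property is an array of strings: two levels.
product_schema = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "price": {"type": "number"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

# One more layer of nesting pushes the schema past the limit.
too_deep_schema = {
    "type": "object",
    "properties": {"product": product_schema},
}

fits = schema_depth(product_schema) <= MAX_DEPTH
too_deep = schema_depth(too_deep_schema) > MAX_DEPTH
```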
Allowed Act Actions
These method names are permitted in ExecutionStep.action:
| Action | Description |
|---|---|
| action/click | Click element |
| action/dblclick | Double-click |
| action/fill | Fill input |
| action/type | Type character by character |
| action/press | Press key |
| action/hover | Hover |
| action/scroll | Scroll |
| action/select | Select option |
| action/check | Check checkbox |
| action/uncheck | Uncheck checkbox |
| action/clear | Clear input |
| action/upload | Upload file |
| action/drag | Drag element |
| page/navigate | Navigate to URL |
| page/reload | Reload page |
| page/goBack | Go back |
| page/goForward | Go forward |
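One way a client could catch unsupported method names before sending a sequence is to check each step against the table above. This is a sketch only: `disallowed_actions` is a hypothetical helper, and `example/unlisted` is a made-up name used to show a rejection.

```python
from typing import Any

# The method names from the table above.
ALLOWED_ACT_ACTIONS = frozenset({
    "action/click", "action/dblclick", "action/fill", "action/type",
    "action/press", "action/hover", "action/scroll", "action/select",
    "action/check", "action/uncheck", "action/clear", "action/upload",
    "action/drag",
    "page/navigate", "page/reload", "page/goBack", "page/goForward",
})


def disallowed_actions(steps: list[dict[str, Any]]) -> list[tuple[int, Any]]:
    """Return (index, action) pairs for steps whose action is not in the table."""
    return [
        (index, step.get("action"))
        for index, step in enumerate(steps)
        if step.get("action") not in ALLOWED_ACT_ACTIONS
    ]


candidate_steps = [
    {"action": "page/navigate"},
    {"action": "action/click"},
    {"action": "example/unlisted"},  # made-up name, not in the table
]

rejected = disallowed_actions(candidate_steps)
```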