The Observe-Decide-Act Loop

AI agents interact with browsers through a repeating cycle:
1. Observe. Call agent/observe to get a structured snapshot of the page: interactive elements with stable refs, accessibility tree, optional annotated screenshot.
2. Decide. The LLM processes the observation and decides which action to take next. Element refs and action hints from observe make this decision straightforward.
3. Act. Call agent/act with a sequence of steps built from the LLM’s decision. Each step uses refs or semantic selectors from the observation.
4. Repeat. After acting, observe again to see the result. The incremental observation mode shows only what changed.
// Basic agent loop
let taskComplete = false;
let loopCount = 0;

while (!taskComplete) {
  const obs = await client.observe({
    includeInteractiveElements: true,
    responseTier: "interactive",
    incremental: loopCount > 0,
  });

  const decision = await llm.decide(obs); // Your LLM decides

  const result = await client.act({
    steps: decision.steps,
    postObserve: { includeInteractiveElements: true },
  });

  taskComplete = decision.done || !result.success;
  loopCount++;
}
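The Decide step depends on how the observation is presented to the LLM. The following is a minimal sketch of one way to flatten interactive elements into prompt text. The `ElementSummary` shape and its field names (`role`, `name`, `actionHints`) are assumptions made for illustration, not BAP's confirmed schema, and `formatElementsForPrompt` is a hypothetical helper.

```typescript
// Hypothetical element shape; field names are assumptions, not BAP's confirmed schema.
interface ElementSummary {
  ref: string;
  role: string;
  name?: string;
  actionHints?: string[];
}

// Render one line per element so the LLM can refer to elements by ref.
function formatElementsForPrompt(elements: ElementSummary[]): string {
  return elements
    .map((el) => {
      const name = el.name ? ` "${el.name}"` : "";
      const hintList = el.actionHints ?? [];
      const hints = hintList.length > 0 ? ` [${hintList.join(", ")}]` : "";
      return `${el.ref} ${el.role}${name}${hints}`;
    })
    .join("\n");
}

// Example input using illustrative refs and hints.
const promptBody = formatElementsForPrompt([
  { ref: "@e1", role: "textbox", name: "Email", actionHints: ["fillable"] },
  { ref: "@e2", role: "button", name: "Submit", actionHints: ["clickable"] },
]);
```

Keeping one element per line with the ref first makes it easy for the model to answer with a ref that can be passed straight into an action step.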

Fusion for Fewer Roundtrips

Fusion operations combine multiple BAP calls into single requests. For agents, this means fewer tool calls, less token overhead, and faster execution. Instead of two calls (navigate then observe), fuse them:
// Without fusion: 2 calls
await client.navigate("https://example.com");
const obs = await client.observe();

// With fusion: 1 call
const navResult = await client.navigate("https://example.com", {
  observe: { includeInteractiveElements: true },
});

Act + Post-Observe

Get the page state after an action sequence without a separate observe call:
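// Assumes BAPClient and the label/role selector helpers are already imported from the BAP client SDK
const result = await client.act({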
  steps: [
    BAPClient.step("action/fill", { selector: label("Email"), value: "user@example.com" }),
    BAPClient.step("action/click", { selector: role("button", "Submit") }),
  ],
  postObserve: {
    includeInteractiveElements: true,
    responseTier: "interactive",
  },
});

// result.postObservation has the page state after form submission
const nextPageElements = result.postObservation?.interactiveElements;

Pre + Act + Post (Full Kernel)

Capture state before and after in a single call:
const result = await client.act({
  preObserve: { includeInteractiveElements: true },
  steps: [
    /* action steps */
  ],
  postObserve: { includeInteractiveElements: true },
});

// Compare pre and post state
const before = result.preObservation?.interactiveElements?.length;
const after = result.postObservation?.interactiveElements?.length;
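Comparing element counts only shows that something changed. A sketch of a ref-level comparison follows, assuming each interactive element carries a `ref` string. `RefElement` and `diffByRef` are hypothetical names introduced for illustration and are not part of BAP.

```typescript
// Hypothetical minimal element shape: only the ref is needed for the comparison.
interface RefElement {
  ref: string;
  name?: string;
}

interface RefDiff {
  appeared: string[];    // refs present only after the action
  disappeared: string[]; // refs present only before the action
}

// Compare two element lists by ref and report what appeared or disappeared.
function diffByRef(before: RefElement[], after: RefElement[]): RefDiff {
  const beforeRefs = new Set(before.map((el) => el.ref));
  const afterRefs = new Set(after.map((el) => el.ref));
  return {
    appeared: after.filter((el) => !beforeRefs.has(el.ref)).map((el) => el.ref),
    disappeared: before.filter((el) => !afterRefs.has(el.ref)).map((el) => el.ref),
  };
}

// Example: a form is replaced by a confirmation link.
const diff = diffByRef(
  [{ ref: "@e1" }, { ref: "@e2" }],
  [{ ref: "@e2" }, { ref: "@e3" }],
);
```

With the real client, the two arguments would be `result.preObservation?.interactiveElements ?? []` and `result.postObservation?.interactiveElements ?? []`.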

Response Tiers

Control how much data comes back in observations to minimize token usage:
| Tier | Includes | Token Cost |
| --- | --- | --- |
| full | Elements + accessibility tree + screenshot + metadata | High |
| interactive | Elements + metadata (skip tree and screenshot) | Medium |
| minimal | Element refs + names only (no bounds, stripped hints) | Low |
// For LLM decision-making, interactive tier is usually sufficient
const obs = await client.observe({
  includeInteractiveElements: true,
  responseTier: "interactive",
});

// For detailed analysis, use full tier
const detailed = await client.observe({
  includeInteractiveElements: true,
  includeAccessibility: true,
  includeScreenshot: true,
  responseTier: "full",
});
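One way to apply the tiers consistently is to derive the tier from what the next decision actually needs. The sketch below is a hypothetical helper, not part of BAP. The `TierNeeds` flags are assumptions, and the mapping simply follows the tier descriptions above.

```typescript
type ResponseTier = "full" | "interactive" | "minimal";

// Hypothetical description of what the next LLM step needs from the observation.
interface TierNeeds {
  needsScreenshot: boolean;
  needsAccessibilityTree: boolean;
  needsBoundsOrHints: boolean;
}

// Pick the cheapest tier that still carries the requested data.
function chooseTier(needs: TierNeeds): ResponseTier {
  if (needs.needsScreenshot || needs.needsAccessibilityTree) {
    return "full"; // only the full tier includes the tree and screenshot
  }
  if (needs.needsBoundsOrHints) {
    return "interactive"; // minimal drops bounds and strips hints
  }
  return "minimal";
}

// Example: a text-only decision that relies on action hints.
const tier = chooseTier({
  needsScreenshot: false,
  needsAccessibilityTree: false,
  needsBoundsOrHints: true,
});
```

The returned value can be passed as `responseTier` in the observe options.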

Incremental Observation

After the first observation, use incremental mode to get only what changed:
// First observation: full snapshot
const initial = await client.observe({
  includeInteractiveElements: true,
});

// Subsequent observations: changes only
const update = await client.observe({
  includeInteractiveElements: true,
  incremental: true,
});

if (update.changes) {
  console.log("Added elements:", update.changes.added);
  console.log("Updated elements:", update.changes.updated);
  console.log("Removed refs:", update.changes.removed);
}
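Because incremental observations carry only deltas, the agent needs to keep its own view of the page and merge each update into it. The sketch below assumes `added` and `updated` are arrays of elements keyed by `ref` and `removed` is an array of refs, which matches the log labels above but is not confirmed by the source. `CachedElement`, `ObservationChanges`, and `applyChanges` are hypothetical names.

```typescript
// Hypothetical shapes inferred from the added / updated / removed fields.
interface CachedElement {
  ref: string;
  name?: string;
}

interface ObservationChanges {
  added?: CachedElement[];
  updated?: CachedElement[];
  removed?: string[];
}

// Return a new cache with the incremental changes merged in.
function applyChanges(
  cache: Map<string, CachedElement>,
  changes: ObservationChanges,
): Map<string, CachedElement> {
  const next = new Map(cache);
  for (const ref of changes.removed ?? []) {
    next.delete(ref);
  }
  for (const el of [...(changes.added ?? []), ...(changes.updated ?? [])]) {
    next.set(el.ref, el);
  }
  return next;
}

// Example: one element removed, one renamed, one added.
const initialCache = new Map<string, CachedElement>([
  ["@e1", { ref: "@e1", name: "Email" }],
  ["@e2", { ref: "@e2", name: "Submit" }],
]);
const mergedCache = applyChanges(initialCache, {
  removed: ["@e1"],
  updated: [{ ref: "@e2", name: "Resend" }],
  added: [{ ref: "@e3", name: "Back" }],
});
```

Returning a new map instead of mutating the old one keeps the previous snapshot available for comparison.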

Session Persistence

Agent sessions survive disconnections. Include a sessionId when connecting:
const client = new BAPClient("ws://localhost:9222", {
  sessionId: "my-agent-session",
});
await client.connect();

// ... agent does work, then disconnects
await client.close();

// Later: reconnect and resume
const client2 = new BAPClient("ws://localhost:9222", {
  sessionId: "my-agent-session",
});
await client2.connect();
// Browser state (pages, cookies, element refs) is restored
The CLI auto-generates session IDs as cli-<port> (e.g., cli-9222). Override with the -s flag:
bap goto https://example.com -s=research-session
# ... disconnect and reconnect later
bap observe -s=research-session  # State is preserved
Dormant sessions expire after 300 seconds by default (configurable via dormantSessionTtl). After expiry, the browser state is destroyed.
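Before reconnecting, an agent can estimate whether its session is likely still alive. The sketch below is a client-side heuristic only, assuming the default 300-second TTL stated above. `isLikelyResumable` is a hypothetical helper, and the server remains the authority on whether the session still exists.

```typescript
// Default dormant TTL from the documentation, in seconds.
const DEFAULT_DORMANT_TTL_SECONDS = 300;

// Estimate whether a session disconnected at `disconnectedAtMs` is still within its TTL.
function isLikelyResumable(
  disconnectedAtMs: number,
  nowMs: number,
  ttlSeconds: number = DEFAULT_DORMANT_TTL_SECONDS,
): boolean {
  const dormantSeconds = (nowMs - disconnectedAtMs) / 1000;
  return dormantSeconds >= 0 && dormantSeconds < ttlSeconds;
}

// Example: the agent disconnected two minutes ago.
const disconnectedAt = 1_000_000;
const canResume = isLikelyResumable(disconnectedAt, disconnectedAt + 120_000);
```

If the check fails, the agent should expect a fresh browser state and start with a full (non-incremental) observation.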

Annotated Screenshots (Set-of-Marks)

For vision-capable LLMs, request annotated screenshots with element badges:
const obs = await client.observe({
  includeInteractiveElements: true,
  includeScreenshot: true,
  annotateScreenshot: {
    enabled: true,
    labelFormat: "ref", // "@e1", "@submit"
    style: {
      badge: { color: "#ff0000", textColor: "#ffffff", size: 20 },
      showBoundingBox: true,
    },
  },
});

// obs.screenshot.data contains annotated screenshot (base64)
// obs.annotationMap maps labels to element refs and positions
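A vision model typically answers with the badge label it wants to interact with, so that label has to be mapped back to an element ref. The sketch below assumes the annotation map can be read as a list of `{ label, ref }` entries. The real `annotationMap` structure is not shown in the source and may differ. `AnnotationEntry` and `resolveLabel` are hypothetical names.

```typescript
// Hypothetical entry shape; the real annotationMap structure may differ.
interface AnnotationEntry {
  label: string; // text drawn on the badge, e.g. "@e1"
  ref: string;   // element ref to use in an action step
}

// Find the first badge label mentioned in the model's answer and map it to a ref.
function resolveLabel(answer: string, annotations: AnnotationEntry[]): string | undefined {
  const match = answer.match(/@[A-Za-z0-9_-]+/);
  if (!match) {
    return undefined; // the answer mentions no label
  }
  const label = match[0];
  return annotations.find((entry) => entry.label === label)?.ref;
}

// Example with labelFormat "ref", where labels and refs coincide.
const annotations: AnnotationEntry[] = [
  { label: "@e1", ref: "@e1" },
  { label: "@submit", ref: "@submit" },
];
const targetRef = resolveLabel("Click @submit to send the form.", annotations);
```

Returning `undefined` for unknown labels lets the agent re-observe instead of acting on a ref the model invented.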

Platform Installation

BAP CLI supports 13 AI agent platforms via bap install-skill:
bap install-skill
# Auto-detects: Claude Code, Codex, Gemini CLI, Cursor, Copilot,
# Windsurf, Roo Code, and 6 more
This copies the appropriate SKILL.md file to the detected platform’s skill directory, giving the agent structured documentation about BAP’s capabilities.