> ## Documentation Index
> Fetch the complete documentation index at: https://piyushvyas.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Agent Integration

> Build AI agent loops with observe-decide-act, fusion operations, session persistence, and token efficiency.

## The Observe-Decide-Act Loop

AI agents interact with browsers through a repeating cycle:

<Steps>
  <Step title="Observe">
    Call `agent/observe` to get a structured snapshot of the page: interactive elements with stable
    refs, accessibility tree, optional annotated screenshot.
  </Step>

  <Step title="Decide">
    The LLM processes the observation and decides which action to take next. Element refs and action
    hints from observe make this decision straightforward.
  </Step>

  <Step title="Act">
    Call `agent/act` with a sequence of steps built from the LLM's decision. Each step uses refs or
    semantic selectors from the observation.
  </Step>

  <Step title="Repeat">
    After acting, observe again to see the result. The incremental observation mode shows only what
    changed.
  </Step>
</Steps>

```typescript theme={null}
// Basic agent loop
while (!taskComplete) {
  const obs = await client.observe({
    includeInteractiveElements: true,
    responseTier: "interactive",
    incremental: loopCount > 0,
  });

  const decision = await llm.decide(obs); // Your LLM decides

  const result = await client.act({
    steps: decision.steps,
    postObserve: { includeInteractiveElements: true },
  });

  taskComplete = decision.done || !result.success;
  loopCount++;
}
```

## Fusion for Fewer Roundtrips

Fusion operations combine multiple BAP calls into single requests. For agents, this means fewer tool calls, less token overhead, and faster execution.

### Navigate + Observe

Instead of two calls (`navigate` then `observe`), fuse them:

```typescript theme={null}
// Without fusion: 2 calls
await client.navigate("https://example.com");
const obs = await client.observe();

// With fusion: 1 call
const navResult = await client.navigate("https://example.com", {
  observe: { includeInteractiveElements: true },
});
```

### Act + Post-Observe

Get the page state after an action sequence without a separate observe call:

```typescript theme={null}
const result = await client.act({
  steps: [
    BAPClient.step("action/fill", { selector: label("Email"), value: "user@example.com" }),
    BAPClient.step("action/click", { selector: role("button", "Submit") }),
  ],
  postObserve: {
    includeInteractiveElements: true,
    responseTier: "interactive",
  },
});

// result.postObservation has the page state after form submission
const nextPageElements = result.postObservation?.interactiveElements;
```

### Pre + Act + Post (Full Kernel)

Capture state before and after in a single call:

```typescript theme={null}
const result = await client.act({
  preObserve: { includeInteractiveElements: true },
  steps: [
    /* action steps */
  ],
  postObserve: { includeInteractiveElements: true },
});

// Compare pre and post state
const before = result.preObservation?.interactiveElements?.length;
const after = result.postObservation?.interactiveElements?.length;
```

## Response Tiers

Control how much data comes back in observations to minimize token usage:

| Tier            | Includes                                              | Token Cost |
| --------------- | ----------------------------------------------------- | :--------: |
| **full**        | Elements + accessibility tree + screenshot + metadata |    High    |
| **interactive** | Elements + metadata (skip tree and screenshot)        |   Medium   |
| **minimal**     | Element refs + names only (no bounds, stripped hints) |     Low    |

```typescript theme={null}
// For LLM decision-making, interactive tier is usually sufficient
const obs = await client.observe({
  includeInteractiveElements: true,
  responseTier: "interactive",
});

// For detailed analysis, use full tier
const detailed = await client.observe({
  includeInteractiveElements: true,
  includeAccessibility: true,
  includeScreenshot: true,
  responseTier: "full",
});
```

## Incremental Observation

After the first observation, use incremental mode to get only what changed:

```typescript theme={null}
// First observation: full snapshot
const initial = await client.observe({
  includeInteractiveElements: true,
});

// Subsequent observations: changes only
const update = await client.observe({
  includeInteractiveElements: true,
  incremental: true,
});

if (update.changes) {
  console.log("Added elements:", update.changes.added);
  console.log("Updated elements:", update.changes.updated);
  console.log("Removed refs:", update.changes.removed);
}
```

## Session Persistence

Agent sessions survive disconnections. Include a `sessionId` when connecting:

```typescript theme={null}
const client = new BAPClient("ws://localhost:9222", {
  sessionId: "my-agent-session",
});
await client.connect();

// ... agent does work, then disconnects
await client.close();

// Later: reconnect and resume
const client2 = new BAPClient("ws://localhost:9222", {
  sessionId: "my-agent-session",
});
await client2.connect();
// Browser state (pages, cookies, element refs) is restored
```

The CLI auto-generates session IDs as `cli-<port>` (e.g., `cli-9222`). Override with the `-s` flag:

```bash theme={null}
bap goto https://example.com -s=research-session
# ... disconnect and reconnect later
bap observe -s=research-session  # State is preserved
```

<Note>
  Dormant sessions expire after 300 seconds by default (configurable via `dormantSessionTtl`). After
  expiry, the browser state is destroyed.
</Note>

## Annotated Screenshots (Set-of-Marks)

For vision-capable LLMs, request annotated screenshots with element badges:

```typescript theme={null}
const obs = await client.observe({
  includeInteractiveElements: true,
  includeScreenshot: true,
  annotateScreenshot: {
    enabled: true,
    labelFormat: "ref", // "@e1", "@submit"
    style: {
      badge: { color: "#ff0000", textColor: "#ffffff", size: 20 },
      showBoundingBox: true,
    },
  },
});

// obs.screenshot.data contains annotated screenshot (base64)
// obs.annotationMap maps labels to element refs and positions
```

## Platform Installation

BAP CLI supports 13 AI agent platforms via `bap install-skill`:

```bash theme={null}
bap install-skill
# Auto-detects: Claude Code, Codex, Gemini CLI, Cursor, Copilot,
# Windsurf, Roo Code, and 6 more
```

This copies the appropriate SKILL.md file to the detected platform's skill directory, giving the agent structured documentation about BAP's capabilities.
