CLI
# Extract specific fields
bap extract --fields="title,price,rating"
# Extract a list of items
bap extract --list="product" --fields="name,price,url"
# Extract with a JSON schema
bap extract --schema='{"type":"array","items":{"type":"object","properties":{"title":{"type":"string"},"price":{"type":"number"}}}}'
TypeScript SDK
const data = await client.extract({
  instruction: "Extract all product names and prices",
  schema: {
    type: "array",
    items: {
      type: "object",
      properties: {
        name: { type: "string", description: "Product name" },
        price: { type: "number", description: "Price in dollars" },
      },
    },
  },
  mode: "list",
});

if (data.success) {
  for (const product of data.data) {
    console.log(`${product.name}: $${product.price}`);
  }
}
Python SDK
data = await client.extract(
    instruction="Extract all product names and prices",
    schema={
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
            },
        },
    },
)
| Mode | Description | Use Case |
|---|---|---|
| single | Extract one item matching the schema | Product detail page, user profile |
| list | Extract all matching items | Search results, product listings |
| table | Extract tabular data | Pricing tables, comparison charts |
// Single item extraction
const product = await client.extract({
  instruction: "Extract the main product details",
  schema: {
    type: "object",
    properties: {
      name: { type: "string" },
      price: { type: "number" },
      description: { type: "string" },
      inStock: { type: "boolean" },
    },
  },
  mode: "single",
});
// Table extraction
const pricing = await client.extract({
  instruction: "Extract the pricing comparison table",
  schema: {
    type: "array",
    items: {
      type: "object",
      properties: {
        plan: { type: "string" },
        price: { type: "string" },
        features: { type: "string" },
      },
    },
  },
  mode: "table",
});
Limit extraction to a specific container to avoid pulling data from sidebars or navigation:
const data = await client.extract({
  instruction: "Extract articles",
  schema: {
    /* ... */
  },
  selector: { type: "css", value: "main.content" },
});
Scoped extraction is important on pages with complex layouts. Without a scope selector, the
extractor may pick up sidebar items, footer links, or navigation elements that match your schema.
Source References
Track which DOM elements contributed to each extracted value:
const data = await client.extract({
  instruction: "Extract product listings",
  schema: {
    /* ... */
  },
  includeSourceRefs: true,
});

if (data.sources) {
  for (const source of data.sources) {
    console.log(`Ref: ${source.ref}, Text: ${source.text}`);
  }
}
For multi-page scraping, combine extraction with navigation:
const allProducts = [];

while (true) {
  const page = await client.extract({
    instruction: "Extract all products on this page",
    schema: {
      type: "array",
      items: {
        type: "object",
        properties: {
          name: { type: "string" },
          price: { type: "number" },
        },
      },
    },
    mode: "list",
  });
  if (page.success) {
    allProducts.push(...page.data);
  }

  // Check for a "Next" button
  const obs = await client.observe({ filterRoles: ["link"] });
  const nextLink = obs.interactiveElements?.find((el) => el.name === "Next");
  if (!nextLink) break; // No more pages
  await client.click(nextLink.selector);
}
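Paginated listings sometimes repeat items across page boundaries. A small de-duplication pass over the combined results keeps the final list clean; this helper is an illustrative sketch, not part of the SDK:

```typescript
interface Product {
  name: string;
  price: number;
}

// De-duplicate by product name, keeping the first occurrence of each.
function dedupeByName(products: Product[]): Product[] {
  const seen = new Map<string, Product>();
  for (const p of products) {
    if (!seen.has(p.name)) seen.set(p.name, p);
  }
  return [...seen.values()];
}

console.log(
  dedupeByName([
    { name: "Widget", price: 10 },
    { name: "Widget", price: 10 },
    { name: "Gadget", price: 25 },
  ]).length,
); // 2
```

Keying on a more stable field (a URL or SKU, if your schema includes one) is safer than keying on the display name.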
# Extract from current page
bap extract --list="book" --fields="title,price"
# Navigate to next page and extract again
bap click text:"Next"
bap extract --list="book" --fields="title,price"
Confidence Scores
Extraction results include an optional confidence score (0-1):
const data = await client.extract({
  /* ... */
});

console.log(`Confidence: ${data.confidence}`);
// 0.95 = high confidence
// 0.60 = some fields may be inaccurate
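One way to use the score is to gate downstream processing on a minimum threshold. This helper is an illustrative sketch (not an SDK function), and the 0.8 cutoff is an arbitrary assumption you should tune for your data:

```typescript
// Accept a result only when a confidence score is present and high enough.
// A missing score is treated as a rejection.
function acceptResult(confidence: number | undefined, threshold = 0.8): boolean {
  return confidence !== undefined && confidence >= threshold;
}

console.log(acceptResult(0.95)); // true
console.log(acceptResult(0.6)); // false
console.log(acceptResult(undefined)); // false
```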
The extract method uses heuristic-based DOM analysis, not LLM reasoning. For complex pages where
the schema fields do not map cleanly to visible text, extraction accuracy may be lower. Consider
using observe/content with your own parsing logic for edge cases.
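As a sketch of what "your own parsing logic" might look like, the helper below pulls a dollar amount out of raw text you could obtain from the page yourself. `parsePrice` is a hypothetical illustration, not an SDK function:

```typescript
// Illustrative fallback parser: extract a dollar amount from raw page text
// when schema-based extraction is unreliable. Handles thousands separators.
function parsePrice(text: string): number | null {
  const match = text.match(/\$\s*(\d+(?:,\d{3})*(?:\.\d{1,2})?)/);
  if (!match) return null;
  return parseFloat(match[1].replace(/,/g, ""));
}

console.log(parsePrice("Now only $1,299.99!")); // 1299.99
console.log(parsePrice("Out of stock")); // null
```

A hand-rolled parser like this trades generality for predictability: it only matches one price format, but it never hallucinates a value that is not literally on the page.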