99 changes: 99 additions & 0 deletions CHANGELOG.md
@@ -7,6 +7,105 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Unreleased

### 2026-02-15

#### PredicateBrowserAgent (snapshot-first, verification-first)

`PredicateBrowserAgent` is a new high-level agent wrapper that gives you a **browser-use-like** `step()` / `run()` surface, but keeps Predicate’s core philosophy:

- **Snapshot-first perception** (structured DOM snapshot is the default)
- **Verification-first control plane** (you can gate progress with deterministic checks)
- Optional **vision fallback** (bounded) when snapshots aren’t sufficient

It’s built on top of `AgentRuntime` + `RuntimeAgent`.

##### Quickstart (single step)

```python
from predicate import AgentRuntime, PredicateBrowserAgent, PredicateBrowserAgentConfig, RuntimeStep
from predicate.llm_provider import OpenAIProvider  # or AnthropicProvider / DeepInfraProvider / LocalLLMProvider

runtime = AgentRuntime(backend=...)  # PlaywrightBackend, CDPBackendV0, etc.
llm = OpenAIProvider(model="gpt-4o-mini")

agent = PredicateBrowserAgent(
    runtime=runtime,
    executor=llm,
    config=PredicateBrowserAgentConfig(
        # Token control: include last N step summaries in the prompt (0 disables history).
        history_last_n=2,
    ),
)

ok = await agent.step(
    task_goal="Find pricing and verify checkout button exists",
    step=RuntimeStep(goal="Open pricing page"),
)
```

##### Customize the compact prompt (advanced)

If you want to change the “compact prompt” the executor sees (e.g. fewer fields / different layout), you can override it:

```python
from predicate import PredicateBrowserAgentConfig

def compact_prompt_builder(task_goal, step_goal, dom_context, snapshot, history_summary):
    system = "You are a web automation agent. Return ONLY one action: CLICK(id) | TYPE(id, \"text\") | PRESS(\"key\") | FINISH()"
    user = f"TASK: {task_goal}\nSTEP: {step_goal}\n\nRECENT:\n{history_summary}\n\nELEMENTS:\n{dom_context}\n\nReturn the single best action:"
    return system, user

config = PredicateBrowserAgentConfig(compact_prompt_builder=compact_prompt_builder)
```
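
Because the builder is a plain function that returns `(system, user)`, it can be checked without a browser or an LLM. A minimal sketch, assuming the five-argument signature shown above; the sample element lines and the `None` passed for the snapshot are placeholders, not real SDK output:

```python
def compact_prompt_builder(task_goal, step_goal, dom_context, snapshot, history_summary):
    # Same shape as the builder above; the snapshot argument is unused here.
    system = "You are a web automation agent. Return ONLY one action: CLICK(id) | TYPE(id, \"text\") | PRESS(\"key\") | FINISH()"
    user = (
        f"TASK: {task_goal}\nSTEP: {step_goal}\n\n"
        f"RECENT:\n{history_summary}\n\n"
        f"ELEMENTS:\n{dom_context}\n\n"
        "Return the single best action:"
    )
    return system, user


# Placeholder inputs (hypothetical element lines, not a real snapshot format).
system, user = compact_prompt_builder(
    task_goal="Find pricing",
    step_goal="Open pricing page",
    dom_context="12|link|Pricing\n13|button|Sign in",
    snapshot=None,
    history_summary="",
)
print(user)
```

Printing the rendered prompt this way makes it easy to eyeball its size before wiring the builder into `PredicateBrowserAgentConfig`.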

##### CAPTCHA handling (interface-only; no solver shipped)

If you set `captcha.policy="callback"`, you must provide a handler. The SDK does **not** include a public CAPTCHA solver.

```python
from predicate import CaptchaConfig, HumanHandoffSolver, PredicateBrowserAgentConfig

config = PredicateBrowserAgentConfig(
    captcha=CaptchaConfig(
        policy="callback",
        # Manual solve in the live session; SDK waits until it clears:
        handler=HumanHandoffSolver(timeout_ms=10 * 60_000, poll_ms=1_000),
    )
)
```
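
The `timeout_ms` / `poll_ms` pair reads like a poll-until-cleared wait. The sketch below is not the SDK's `HumanHandoffSolver`; it is a generic illustration of that pattern, and `wait_until_cleared` and `is_cleared` are hypothetical names introduced only for this example:

```python
import asyncio
import time


async def wait_until_cleared(is_cleared, timeout_ms: int, poll_ms: int) -> bool:
    """Poll `is_cleared()` every `poll_ms` until it returns True or `timeout_ms` elapses."""
    deadline = time.monotonic() + timeout_ms / 1000
    while time.monotonic() < deadline:
        if await is_cleared():
            return True
        await asyncio.sleep(poll_ms / 1000)
    return False


async def demo():
    calls = 0

    async def is_cleared() -> bool:
        # Hypothetical check: pretend a human solves the challenge by the third poll.
        nonlocal calls
        calls += 1
        return calls >= 3

    cleared = await wait_until_cleared(is_cleared, timeout_ms=1_000, poll_ms=10)
    return cleared, calls


cleared, polls = asyncio.run(demo())
print(cleared, polls)
```

A long timeout with a coarse poll interval, as in the config above, suits manual solving: the human has minutes to act and the check itself stays cheap.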

##### LLM providers (cloud or local)

`PredicateBrowserAgent` works with any `LLMProvider` implementation. For a local HF Transformers model:

```python
from predicate.llm_provider import LocalLLMProvider

llm = LocalLLMProvider(model_name="Qwen/Qwen2.5-3B-Instruct", device="auto", load_in_4bit=True)
```

##### Opt-in token usage accounting (best-effort)

To measure token spend, you can enable best-effort accounting. It only counts what the provider reports as `prompt_tokens` / `completion_tokens` / `total_tokens` in `LLMResponse`, so calls that omit those fields are not reflected in the totals:

```python
from predicate import PredicateBrowserAgentConfig

config = PredicateBrowserAgentConfig(token_usage_enabled=True)

# Later:
usage = agent.get_token_usage()
agent.reset_token_usage()
```
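
"Best-effort" here means totals can undercount when a provider omits usage fields. The following sketch shows one way such an accumulator could behave; `UsageTotals` and its fields are hypothetical and do not describe the structure returned by `agent.get_token_usage()`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UsageTotals:
    """Hypothetical accumulator; not the SDK's internal representation."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    total_tokens: int = 0
    calls_without_usage: int = 0

    def add(
        self,
        prompt_tokens: Optional[int],
        completion_tokens: Optional[int],
        total_tokens: Optional[int],
    ) -> None:
        # A response with no usage fields is counted separately, not guessed.
        if prompt_tokens is None and completion_tokens is None and total_tokens is None:
            self.calls_without_usage += 1
            return
        p = prompt_tokens or 0
        c = completion_tokens or 0
        self.prompt_tokens += p
        self.completion_tokens += c
        # Fall back to prompt + completion when the provider omits the total.
        self.total_tokens += total_tokens if total_tokens is not None else p + c


totals = UsageTotals()
totals.add(120, 8, 128)        # provider reported everything
totals.add(None, None, None)   # provider reported nothing
totals.add(90, 5, None)        # provider omitted the total
print(totals)
```

Tracking the number of calls without usage data makes it visible how far the totals may be off.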

##### RuntimeAgent: act once without step lifecycle (orchestrators)

`RuntimeAgent` now exposes `act_once(...)` helpers that execute exactly one action **without** calling `runtime.begin_step()` / `runtime.emit_step_end()`. This is intended for external orchestrators (e.g. WebBench) that already own step lifecycle and just want the SDK’s snapshot-first propose+execute block.

- `await agent.act_once(...) -> str`
- `await agent.act_once_with_snapshot(...) -> (action, snap)`
- `await agent.act_once_result(...) -> { action, snap, used_vision }`
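
A rough sketch of the orchestrator pattern, with the orchestrator owning the step lifecycle and delegating only the single action. `StubAgent` is a hypothetical stand-in for `RuntimeAgent`, and because the real `act_once(...)` arguments are not listed here, the stub accepts and ignores arbitrary keyword arguments:

```python
import asyncio


class StubAgent:
    """Hypothetical stand-in; only mimics `act_once` returning an action string."""

    def __init__(self, actions):
        self._actions = iter(actions)

    async def act_once(self, **kwargs) -> str:
        # Real arguments are not shown in this changelog, so they are ignored.
        return next(self._actions)


async def orchestrate(agent, max_actions: int) -> list[str]:
    log = []
    for i in range(max_actions):
        log.append(f"begin:{i}")  # orchestrator-owned step start
        action = await agent.act_once(goal="hypothetical goal")
        log.append(f"action:{action}")
        log.append(f"end:{i}")  # orchestrator-owned step end
        if action.startswith("FINISH"):
            break
    return log


log = asyncio.run(orchestrate(StubAgent(["CLICK(12)", "FINISH()"]), max_actions=5))
print(log)
```

The point of the split is that begin/end bookkeeping happens exactly once, in the orchestrator, rather than once in the orchestrator and again inside the SDK.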

### 2026-02-13

#### Expanded deterministic verifications (adaptive resnapshotting)
6 changes: 6 additions & 0 deletions examples/agent/README.md
@@ -0,0 +1,6 @@
Predicate agent examples.

- `predicate_browser_agent_minimal.py`: minimal `PredicateBrowserAgent` usage.
- `predicate_browser_agent_custom_prompt.py`: customize the compact prompt builder.
- `predicate_browser_agent_video_recording_playwright.py`: enable Playwright video recording via context options (recommended).

117 changes: 117 additions & 0 deletions examples/agent/predicate_browser_agent_custom_prompt.py
@@ -0,0 +1,117 @@
"""
Example: PredicateBrowserAgent with compact prompt customization.

This shows how to override the compact prompt used for action proposal.

Usage:
    python examples/agent/predicate_browser_agent_custom_prompt.py
"""

import asyncio
import os

from predicate import AsyncSentienceBrowser, PredicateBrowserAgent, PredicateBrowserAgentConfig
from predicate.agent_runtime import AgentRuntime
from predicate.llm_provider import LLMProvider, LLMResponse
from predicate.models import Snapshot
from predicate.runtime_agent import RuntimeStep
from predicate.tracing import JsonlTraceSink, Tracer


class RecordingProvider(LLMProvider):
    """
    Example provider that records the prompts it receives.

    Swap this for OpenAIProvider / AnthropicProvider / DeepInfraProvider / LocalLLMProvider in real usage.
    """

    def __init__(self, action: str = "FINISH()"):
        super().__init__(model="recording-provider")
        self._action = action
        self.last_system: str | None = None
        self.last_user: str | None = None

    def generate(self, system_prompt: str, user_prompt: str, **kwargs) -> LLMResponse:
        _ = kwargs
        self.last_system = system_prompt
        self.last_user = user_prompt
        return LLMResponse(content=self._action, model_name=self.model_name)

    def supports_json_mode(self) -> bool:
        return False

    @property
    def model_name(self) -> str:
        return "recording-provider"


def compact_prompt_builder(
    task_goal: str,
    step_goal: str,
    dom_context: str,
    snap: Snapshot,
    history_summary: str,
) -> tuple[str, str]:
    _ = snap
    system = (
        "You are a web automation executor.\n"
        "Return ONLY ONE action in this format:\n"
        "- CLICK(id)\n"
        '- TYPE(id, "text")\n'
        "- PRESS('key')\n"
        "- FINISH()\n"
        "No prose."
    )
    # Optional: aggressively control token usage by truncating DOM context.
    dom_context = dom_context[:4000]
    user = (
        f"TASK GOAL:\n{task_goal}\n\n"
        + (f"RECENT STEPS:\n{history_summary}\n\n" if history_summary else "")
        + f"STEP GOAL:\n{step_goal}\n\n"
        f"DOM CONTEXT:\n{dom_context}\n"
    )
    return system, user


async def main() -> None:
    run_id = "predicate-browser-agent-custom-prompt"
    tracer = Tracer(run_id=run_id, sink=JsonlTraceSink(f"traces/{run_id}.jsonl"))

    api_key = os.environ.get("PREDICATE_API_KEY") or os.environ.get("SENTIENCE_API_KEY")

    async with AsyncSentienceBrowser(api_key=api_key, headless=False) as browser:
        page = await browser.new_page()
        await page.goto("https://example.com")
        await page.wait_for_load_state("networkidle")

        runtime = await AgentRuntime.from_sentience_browser(
            browser=browser, page=page, tracer=tracer
        )

        executor = RecordingProvider(action="FINISH()")

        agent = PredicateBrowserAgent(
            runtime=runtime,
            executor=executor,
            config=PredicateBrowserAgentConfig(
                history_last_n=2,
                compact_prompt_builder=compact_prompt_builder,
            ),
        )

        out = await agent.step(
            task_goal="Open example.com",
            step=RuntimeStep(goal="Take no action; just finish"),
        )
        print(f"step ok: {out.ok}")
        print("--- prompt preview (system) ---")
        print((executor.last_system or "")[:300])
        print("--- prompt preview (user) ---")
        print((executor.last_user or "")[:300])

    tracer.close()


if __name__ == "__main__":
    asyncio.run(main())

101 changes: 101 additions & 0 deletions examples/agent/predicate_browser_agent_minimal.py
@@ -0,0 +1,101 @@
"""
Example: PredicateBrowserAgent minimal demo.

PredicateBrowserAgent is a higher-level, browser-use-like wrapper over:
    AgentRuntime + RuntimeAgent (snapshot-first action proposal + execution + verification).

Usage:
    python examples/agent/predicate_browser_agent_minimal.py
"""

import asyncio
import os

from predicate import AsyncSentienceBrowser, PredicateBrowserAgent, PredicateBrowserAgentConfig
from predicate.agent_runtime import AgentRuntime
from predicate.llm_provider import LLMProvider, LLMResponse
from predicate.runtime_agent import RuntimeStep, StepVerification
from predicate.tracing import JsonlTraceSink, Tracer
from predicate.verification import exists, url_contains


class FixedActionProvider(LLMProvider):
    """Tiny in-process provider for examples/tests."""

    def __init__(self, action: str):
        super().__init__(model="fixed-action")
        self._action = action

    def generate(self, system_prompt: str, user_prompt: str, **kwargs) -> LLMResponse:
        _ = system_prompt, user_prompt, kwargs
        return LLMResponse(content=self._action, model_name=self.model_name)

    def supports_json_mode(self) -> bool:
        return False

    @property
    def model_name(self) -> str:
        return "fixed-action"


async def main() -> None:
    run_id = "predicate-browser-agent-minimal"
    tracer = Tracer(run_id=run_id, sink=JsonlTraceSink(f"traces/{run_id}.jsonl"))

    api_key = os.environ.get("PREDICATE_API_KEY") or os.environ.get("SENTIENCE_API_KEY")

    async with AsyncSentienceBrowser(api_key=api_key, headless=False) as browser:
        page = await browser.new_page()
        await page.goto("https://example.com")
        await page.wait_for_load_state("networkidle")

        runtime = await AgentRuntime.from_sentience_browser(
            browser=browser, page=page, tracer=tracer
        )

        # For a "real" run, swap this for OpenAIProvider / AnthropicProvider / DeepInfraProvider / LocalLLMProvider.
        executor = FixedActionProvider("FINISH()")

        agent = PredicateBrowserAgent(
            runtime=runtime,
            executor=executor,
            config=PredicateBrowserAgentConfig(
                # Keep a tiny, bounded LLM-facing step history (0 disables history entirely).
                history_last_n=2,
            ),
        )

        steps = [
            RuntimeStep(
                goal="Verify Example Domain is loaded",
                verifications=[
                    StepVerification(
                        predicate=url_contains("example.com"),
                        label="url_contains_example",
                        required=True,
                        eventually=True,
                        timeout_s=5.0,
                    ),
                    StepVerification(
                        predicate=exists("role=heading"),
                        label="has_heading",
                        required=True,
                        eventually=True,
                        timeout_s=5.0,
                    ),
                ],
                max_snapshot_attempts=2,
                snapshot_limit_base=60,
            )
        ]

        ok = await agent.run(task_goal="Open example.com and verify", steps=steps)
        print(f"run ok: {ok}")

    tracer.close()
    print(f"trace written to traces/{run_id}.jsonl")


if __name__ == "__main__":
    asyncio.run(main())
