Browser-controlling AI agents: sandbox and web permissions to enforce

Browser-controlling agents (Claude Computer Use, OpenAI Operator, Anthropic Skills, browser-use, Steel) became, within months, one of the most visible autonomous-agent use cases. Book a restaurant, fill in a form, shop online, navigate an admin UI. Useful, but a massive attack surface: the browser is one of the most complex environments in existence, and one of the most exposed to hostile content.

Here's the minimum defensive posture to ship a browser agent without making it an open door.

Why browsers are special

Three traits make browser agents riskier than HTTP-tool agents:

1. The whole web is attacker-controlled

A page can contain HTML, JavaScript, invisible content, nested frames, popups, and dynamically modified content. The agent interprets the page visually (via a screenshot passed to a vision model) or structurally (via the DOM). Both are manipulable.

2. The browser aggregates contexts

A typical browser has cookies, active sessions, history, sometimes stored passwords, sometimes extensions with their own permissions. An agent taking control inherits all of it.

3. Actions are high-impact by default

Clicking "send", "confirm", "delete" or "pay" on a website is a high-impact action that usually takes effect without any further confirmation. A hallucination or a prompt injection translates into immediate external action.

Defensive posture in 6 points

1. Browser isolated per session

The agent never runs in the user's personal browser. It runs in:

A container with dedicated Chromium / Chrome.
Empty state per session (no persistent cookies, no history, no extensions).
Temporary profile, deleted at session end.
Ideally an ephemeral Docker container with dedicated user and disposable home.

Anthropic's Computer Use reference implementation does this by default (it runs in a Docker container). Many custom implementations don't.
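A minimal sketch of the "empty state, temporary profile" part, assuming the agent launches Chromium itself inside its container. The function names and the `chromium` binary name are illustrative assumptions; the switches are standard Chromium command-line flags.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


def chromium_args(profile_dir: Path, binary: str = "chromium") -> list[str]:
    """Build a Chromium command line bound to a throwaway profile directory."""
    return [
        binary,                            # assumed binary name; adjust to the image in use
        f"--user-data-dir={profile_dir}",  # cookies, history and cache live only here
        "--disable-extensions",            # no extension with its own permissions
        "--no-first-run",
        "--no-default-browser-check",
    ]


@contextmanager
def ephemeral_profile():
    """Yield an empty profile directory and delete it when the session ends."""
    profile_dir = Path(tempfile.mkdtemp(prefix="agent-profile-"))
    try:
        yield profile_dir
    finally:
        # Runs on normal exit and on exceptions, so no state survives the session.
        shutil.rmtree(profile_dir, ignore_errors=True)
```

Inside `with ephemeral_profile() as profile:`, the list returned by `chromium_args(profile)` can be handed to `subprocess.Popen`. This only covers profile hygiene; the container boundary itself still has to come from Docker or an equivalent.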

2. Federated, ephemeral site identities

For sites where the agent must log in:

No user password stored.
OAuth / SAML / OIDC with minimal scope.
Ephemeral token retrieved at session start, destroyed at end.
No "remember me" checked.

For sites without OAuth, use an external credential manager (1Password CLI, Bitwarden) so that the password is never stored in the agent's browser.
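The "retrieved at session start, destroyed at end" rule can be sketched as a context manager. The `issue` and `revoke` callables are hypothetical stand-ins for whatever the identity provider or credential manager actually exposes; nothing here is a real provider API.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def session_token(
    issue: Callable[[], str],
    revoke: Callable[[str], None],
) -> Iterator[str]:
    """Obtain a token at session start and always revoke it at session end."""
    token = issue()
    try:
        yield token
    finally:
        # Revocation runs even if the agent crashes mid-session.
        revoke(token)
```

The point of the `finally` clause is that a failed or hijacked session still ends with the token revoked, rather than leaving a live credential behind.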

3. URL allowlist

The agent visits only URLs from an explicit allowlist. No free web browsing. If the user says "search X on Google and click the most relevant result", the agent must:

Either refuse (strict mode).
Or ask the user for confirmation before leaving the starting domain.
Or browse in an "exploration" sub-mode where no sensitive capability is active.

Without an allowlist, any web page becomes a potential prompt-injection channel with access to the agent's capabilities.
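A minimal sketch of a strict-mode allowlist check, to be called before every navigation. The host names are placeholders, and exact host matching over HTTPS only is one possible policy, not the only one.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist; replace with the sites the agent is actually cleared for.
ALLOWED_HOSTS = {"booking.example.com", "admin.example.com"}


def is_allowed(url: str, allowed_hosts: set[str] = ALLOWED_HOSTS) -> bool:
    """Allow only HTTPS URLs whose host exactly matches an allowlisted host."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False  # malformed URL: refuse rather than guess
    if parts.scheme != "https":
        return False
    # .hostname strips userinfo and port and lowercases the host.
    host = (parts.hostname or "").rstrip(".")
    return host in allowed_hosts
```

Matching on the parsed hostname rather than on a string prefix is what stops lookalikes such as `https://booking.example.com.evil.test/` or `https://booking.example.com@evil.test/`.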

4. Human confirmation on irreversible actions

Before clicking:

A "send" / "publish" / "confirm" button.
A payment button.
A "delete" / "close account" button.

…the agent must take a screenshot, show it to the user, and wait for confirmation. This adds friction, but it is necessary. Without it, any hijacking (via a malicious popup or DOM-injected content) translates immediately into action.
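One way to sketch this gate, assuming the agent knows the visible label of the element it is about to click. The keyword list and the `ask_user` callback (which stands in for "show the screenshot and wait") are assumptions made for illustration.

```python
import re
from typing import Callable

# Hypothetical keyword list; a real deployment would tune it per site and language.
HIGH_IMPACT = re.compile(
    r"\b(send|publish|confirm|pay|delete|close account)\b", re.IGNORECASE
)


def needs_confirmation(element_label: str) -> bool:
    """Return True when the label matches a high-impact keyword."""
    return bool(HIGH_IMPACT.search(element_label))


def guarded_click(
    element_label: str,
    click: Callable[[], None],
    ask_user: Callable[[str], bool],
) -> bool:
    """Click only if the action is low-impact or the user explicitly approves it."""
    if needs_confirmation(element_label) and not ask_user(element_label):
        return False  # blocked: the user declined
    click()
    return True
```

Keyword matching is a coarse filter and will miss unusual labels, so it is better treated as a floor than as the whole control.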

5. Full-journey logging

All URLs visited.
All screenshots taken.
All actions (click, scroll, type, navigate).
Target elements (CSS selector, position, content).

Keep the logs immutable and retain them for at least 90 days. They are essential for reconstructing an incident.
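The text does not say how immutability is achieved. One common approach, shown here as an assumption rather than a prescription, is a hash chain: each entry commits to the previous one, so any later modification is detectable. The `AuditLog` class and its field names are illustrative.

```python
import hashlib
import json
import time


class AuditLog:
    """Append-only log where each entry commits to the hash of the previous one."""

    GENESIS = "0" * 64

    def __init__(self) -> None:
        self.entries: list[dict] = []
        self._last_hash = self.GENESIS

    @staticmethod
    def _digest(entry: dict) -> str:
        # Hash every field except the hash itself, in a stable key order.
        body = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, action: str, url: str, selector: str = "") -> dict:
        entry = {
            "ts": time.time(),
            "action": action,      # click, scroll, type, navigate
            "url": url,
            "selector": selector,  # target element, when there is one
            "prev": self._last_hash,
        }
        entry["hash"] = self._digest(entry)
        self._last_hash = entry["hash"]
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute the chain; False means an entry was altered or removed."""
        prev = self.GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._digest(entry):
                return False
            prev = entry["hash"]
        return True
```

A hash chain makes tampering evident but does not prevent deletion of the whole log, so it is usually paired with write-once storage outside the agent's container.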

6. Visible kill-switch

The user must be able to stop the agent at any moment, ideally with a keyboard shortcut or always-visible button. The kill-switch must:

Immediately stop the browser.
Freeze state for analysis.
Not erase logs.
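The three requirements above can be sketched with a cooperative stop flag that the action loop checks before every step. The `KillSwitch` and `run_actions` names are hypothetical; a real implementation would also terminate the browser process, which this sketch leaves out.

```python
import threading
from typing import Callable, Iterable


class KillSwitch:
    """Thread-safe stop flag that a UI button or keyboard shortcut can set."""

    def __init__(self) -> None:
        self._stop = threading.Event()

    def trigger(self) -> None:
        self._stop.set()

    @property
    def triggered(self) -> bool:
        return self._stop.is_set()


def run_actions(
    actions: Iterable[Callable[[], None]],
    switch: KillSwitch,
    log: list[str],
) -> str:
    """Run actions one by one, checking the kill-switch before each."""
    for index, action in enumerate(actions):
        if switch.triggered:
            # The log is appended to, never cleared, so the stop itself is recorded.
            log.append(f"killed before action {index}")
            return "frozen"
        action()
        log.append(f"action {index} done")
    return "completed"
```

Because the check is cooperative, an action already in flight finishes before the loop stops, which is why killing the browser process is still needed for an immediate halt.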

Attacks to anticipate

Injection via popup or notification

A legitimate site can display a popup whose text looks like a user instruction, and the agent may follow it. Defense: never treat text that appears in page content, including anything shown after load, as a user instruction.
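One way to make this defense structural is to tag every piece of text with its provenance and let only the user channel supply instructions. This is a minimal sketch; the `Message` type and the role names are assumptions, not any model vendor's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    role: str     # "user" for the operator's instruction, "page" for anything read from the web
    content: str


def build_context(user_instruction: str, page_texts: list[str]) -> list[Message]:
    """Keep the user's instruction and page-derived text in separate channels."""
    messages = [Message("user", user_instruction)]
    messages.extend(Message("page", text) for text in page_texts)
    return messages


def instructions(messages: list[Message]) -> list[str]:
    """Only user-role messages are ever eligible to be treated as instructions."""
    return [m.content for m in messages if m.role == "user"]
```

Tagging does not stop a model from being influenced by page text, but it gives the orchestration layer a hard rule to enforce around the model.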

Visual injection on vision models

A site can hide text from human eyes: low-contrast text (white-on-white, near-zero opacity), tiny fonts, or text embedded in an image. A vision model can still read it and interpret it as instructions. This is the visual counterpart of white-on-white injection in PDFs.

Defense: when possible, extract structured content from the DOM via the accessibility tree rather than relying on screenshots. In addition, train or fine-tune vision models to ignore content a human would not see.
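When working from the DOM, a complementary filter is to drop nodes whose computed style makes them invisible to a human. The sketch below assumes each node arrives as a dict with a `text` field and a `style` dict; that shape, the field names, and the thresholds are all hypothetical.

```python
def is_human_visible(style: dict) -> bool:
    """Heuristic check on a node's computed style (hypothetical dict shape)."""
    if style.get("display") == "none" or style.get("visibility") == "hidden":
        return False
    if float(style.get("opacity", 1)) < 0.1:       # assumed threshold
        return False
    if float(style.get("font_size_px", 16)) < 6:   # assumed threshold
        return False
    color = style.get("color")
    if color is not None and color == style.get("background_color"):
        return False  # same foreground and background, e.g. white-on-white
    return True


def visible_text(nodes: list[dict]) -> list[str]:
    """Keep only the text of nodes a human could plausibly read."""
    return [n["text"] for n in nodes if is_human_visible(n.get("style", {}))]
```

This catches the simple cases only. Text rendered inside images or hidden through positioning and clipping needs other checks.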

Reverse phishing

An attacker steers the agent to a fake site that imitates a real one (a fake PayPal, for example), and the agent enters credentials there. Defense: strict TLS certificate verification, a domain allowlist, and a refusal to enter sensitive credentials on any non-allowlisted URL.
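The credential part of this defense can be sketched as a binding between each credential and the single host allowed to receive it, checked right before the agent types anything. The mapping and the `may_release` function are illustrative assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical binding: each credential may be typed into exactly one host.
CREDENTIAL_HOSTS = {"paypal": "www.paypal.com"}


def may_release(credential: str, current_url: str) -> bool:
    """Release a credential only on its bound host, over HTTPS, on the default port."""
    expected_host = CREDENTIAL_HOSTS.get(credential)
    if expected_host is None:
        return False
    try:
        parts = urlsplit(current_url)
        port = parts.port  # raises ValueError on a malformed port
    except ValueError:
        return False
    return (
        parts.scheme == "https"
        and port in (None, 443)
        and (parts.hostname or "") == expected_host
    )
```

The check runs against the URL the browser actually reports after redirects, not the URL the agent intended to visit.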

Cross-site session hijacking via persistent cookies

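If the agent keeps cookies across sessions, every later session starts already logged in. A malicious page cannot read another site's cookies directly, but it can steer the agent to the authenticated site or trigger cross-site requests that ride on the open session. Defense: empty profile per session (point 1).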

Minimum deliverable before production

For a browser agent acting on the information system:

Documented architecture (container, isolation, identity).
List of allowed sites.
List of high-impact actions requiring confirmation.
Tested kill-switch procedure.
Operational and accessible audit log.
Adversarial tests passed (at least the 5 red-team scenarios).

Without these, a browser agent shouldn't be in production on high-stakes use cases.
