AI Security

AI agent threat model: 7 attack surfaces to map before go-live

An agent that acts in your system has an attack surface no classic pentest covers. The 7 surfaces to map before shipping to production.

Aroua Biri

A chatbot answers. An agent acts. That single-letter difference produces an order-of-magnitude gap in attack surface. With a chatbot, the worst case is a bad answer. With an agent, the worst case is that an attacker executes code in your system, exfiltrates data or issues a payment.

Here are the 7 attack surfaces every production AI agent needs mapped before go-live. This is the minimum bar.

Surface 1 — Direct user input

The best-known one: what the user types in the prompt. Risks:

  • Direct prompt injection ("Ignore your previous instructions…").
  • Jailbreak to bypass alignment guardrails.
  • Privilege abuse: a user asks the agent to do something the user isn't allowed to do.

Defenses: intent classifier on input, strict separation of system instructions and data, permission checks at every tool call (not just at login).

Surface 2 — Contextual sources (RAG, files, memory)

This is indirect prompt injection. A document indexed in your RAG contains hidden instructions that hijack the agent when it reads them. Real vectors:

  • PDF with invisible text (white-on-white, zero-size).
  • Web page parsed by the agent.
  • Email read and summarized by an assistant.
  • Output of a third-party tool that an attacker can influence.

The most dangerous surface in 2026 because it bypasses end-user vigilance.

Surface 3 — The tools the agent can call

Each exposed tool is a capability. The rule is simple: a tool that can do something dangerous will eventually be used for something dangerous. Not if, when.

Map for every tool:

  • What can it write / delete / publish?
  • What's the access scope (one repo, all repos, prod, sandbox)?
  • Is it reversible?
  • Does it log who called it and with which parameters?

Surface 4 — The other agents

In multi-agent systems (LangGraph, AutoGen, CrewAI), a compromised agent can propagate the compromise to its neighbors via:

  • Messages it sends them (inter-agent prompt injection).
  • Shared memory.
  • Outputs of one agent that become inputs of another.

Surface 5 — Long-term memory

Most 2026 agents have persistent memory (vector store, relational DB, markdown file). That memory is:

  • Writable by the agent itself, so by anyone who controls the agent.
  • Read by all future invocations, so capable of poisoning the agent's behavior over time.

Underrated risk.

Surface 6 — The models themselves

The agent calls one or more LLMs. Those models can be:

  • Backdoored at training time (dataset poisoning, malicious fine-tuning).
  • Manipulated at runtime via transitive jailbreak when one agent calls another.
  • Substituted if model routing isn't verified (provider compromise, MITM).

Defense: weight signing, integrity verification, statistical behavior monitoring.

Surface 7 — The agent infrastructure

Often forgotten because it doesn't seem AI-specific. Yet:

  • The runtime that runs the agent (containers, lambdas, VMs).
  • The storage of prompts, logs, conversations.
  • The network between the agent and its tools and LLMs.
  • The CI/CD that ships agent versions and prompts.

All classic DevSecOps weaknesses apply here, with a multiplier: a compromise here lets an attacker modify the system prompt invisibly.

The threat modeling grid to use

For each surface, three questions:

  1. Who can write into this surface? (end user, internal employee, external attacker, another agent)
  2. What does the agent do with it? (read, parse, execute, forward)
  3. What privilege level is mobilized next? (data read, data write, external action)

If at the intersection you find: an external attacker can write; the agent will execute; the action has high privilege — that's a critical path. Ideally you have zero. Often you have several on first audit.

Minimum deliverables before production

  • A DFD showing the 7 surfaces.
  • A tool / privilege / reversibility matrix.
  • A documented, tested kill-switch.
  • Audit log of each tool call (who, when, with what parameters, result).
  • A compromised-agent incident plan.

Without those five, going to production is premature.

A related topic on your side?

20 minutes to scope it together. No commercial pitch.

Book a Calendly call