A chatbot answers. An agent acts. That single-letter difference produces an order-of-magnitude gap in attack surface. With a chatbot, the worst case is a bad answer. With an agent, the worst case is that an attacker executes code in your system, exfiltrates data or issues a payment.
Here are the 7 attack surfaces every production AI agent needs mapped before go-live. This is the minimum bar.
Surface 1 — Direct user input
The best-known one: what the user types in the prompt. Risks:
- Direct prompt injection ("Ignore your previous instructions…").
- Jailbreak to bypass alignment guardrails.
- Privilege abuse: a user asks the agent to do something the user isn't allowed to do.
Defenses: intent classifier on input, strict separation of system instructions and data, permission checks at every tool call (not just at login).
Surface 2 — Contextual sources (RAG, files, memory)
This is indirect prompt injection. A document indexed in your RAG contains hidden instructions that hijack the agent when it reads them. Real vectors:
- PDF with invisible text (white-on-white, zero-size).
- Web page parsed by the agent.
- Email read and summarized by an assistant.
- Output of a third-party tool that an attacker can influence.
The most dangerous surface in 2026 because it bypasses end-user vigilance.
Surface 3 — The tools the agent can call
Each exposed tool is a capability. The rule is simple: a tool that can do something dangerous will eventually be used for something dangerous. Not if, when.
Map for every tool:
- What can it write / delete / publish?
- What's the access scope (one repo, all repos, prod, sandbox)?
- Is it reversible?
- Does it log who called it and with which parameters?
Surface 4 — The other agents
In multi-agent systems (LangGraph, AutoGen, CrewAI), a compromised agent can propagate the compromise to its neighbors via:
- Messages it sends them (inter-agent prompt injection).
- Shared memory.
- Outputs of one agent that become inputs of another.
Surface 5 — Long-term memory
Most 2026 agents have persistent memory (vector store, relational DB, markdown file). That memory is:
- Writable by the agent itself, so by anyone who controls the agent.
- Read by all future invocations, so capable of poisoning the agent's behavior over time.
Underrated risk.
Surface 6 — The models themselves
The agent calls one or more LLMs. Those models can be:
- Backdoored at training time (dataset poisoning, malicious fine-tuning).
- Manipulated at runtime via transitive jailbreak when one agent calls another.
- Substituted if model routing isn't verified (provider compromise, MITM).
Defense: weight signing, integrity verification, statistical behavior monitoring.
Surface 7 — The agent infrastructure
Often forgotten because it doesn't seem AI-specific. Yet:
- The runtime that runs the agent (containers, lambdas, VMs).
- The storage of prompts, logs, conversations.
- The network between the agent and its tools and LLMs.
- The CI/CD that ships agent versions and prompts.
All classic DevSecOps weaknesses apply here, with a multiplier: a compromise here lets an attacker modify the system prompt invisibly.
The threat modeling grid to use
For each surface, three questions:
- Who can write into this surface? (end user, internal employee, external attacker, another agent)
- What does the agent do with it? (read, parse, execute, forward)
- What privilege level is mobilized next? (data read, data write, external action)
If at the intersection you find: an external attacker can write; the agent will execute; the action has high privilege — that's a critical path. Ideally you have zero. Often you have several on first audit.
Minimum deliverables before production
- A DFD showing the 7 surfaces.
- A tool / privilege / reversibility matrix.
- A documented, tested kill-switch.
- Audit log of each tool call (who, when, with what parameters, result).
- A compromised-agent incident plan.
Without those five, going to production is premature.