TL;DR — the essentials in 5 points
- AI security ≠ traditional cybersecurity: new vectors (prompt injection, jailbreak, agent hijacking) that don't appear in traditional pentests.
- EU AI Act application on August 2, 2026 for high-risk systems (Annex III). 7 technical pillars to put into production.
- Project Glasswing and Claude Mythos redefine expectations: if Anthropic can find 83% of zero-days, your defenses must be deeper.
- Defensive 2026 stack: input filtering + Constitutional AI + output filtering + sandbox + forensic logs + continuous red teaming.
- Combined compliance: EU AI Act + ISO 42001 + NIST AI RMF + SOC 2/27001 — pooled approach.
The 2026 context
AI security has become in 2025-2026 the #1 topic of application cybersecurity. Three factors converge:
Massive production adoption. More than 70% of European B2B SaaS now integrate at least one LLM in production (OpenAI, Anthropic, Mistral, Bedrock, Vertex AI). What was R&D in 2023 is business-critical in 2026.
Attack maturity. Prompt injection, jailbreak and exfiltration techniques are public, documented and automatable (Garak, PyRIT, Promptfoo). Attackers no longer have to invent — they apply recipes.
Regulatory pressure. The EU AI Act imposes precise governance and technical controls on high-risk AI systems, applicable on August 2, 2026. Compliance is not prepared in a few weeks.
The 5 threat families specific to LLMs
1. Prompt injection (direct and indirect)
The LLM doesn't distinguish developer instructions from user data. Any data entering the context can be interpreted as an instruction.
- Direct: "ignore your previous instructions...", visible and manageable by classification.
- Indirect: instructions hidden in documents, web pages, emails read by the LLM. Vector of the Slack AI flaw and Microsoft Copilot oversharing.
2. Cross-tenant data leakage and exfiltration
The model can reveal memorized training data, other users' context, environment secrets. Particularly critical in multi-tenant.
3. Guardrail jailbreak
Bypassing model protections: roleplay, encoding (base64, ROT), multi-turn. Classic jailbreaks (DAN, etc.) are now largely neutralized by modern models — newer ones exploit more subtle strategies.
4. Model manipulation and data poisoning
If you fine-tune or operate a RAG, your training and ingestion pipeline is a target.
5. Agent hijacking
An agent capable of calling tools (read files, execute code, send email) is a particularly attractive target. An injected instruction transforms the agent into a remotely controlled tool. Dominant pattern in 2026.
The typical defensive stack
For a serious B2B SaaS in 2026, securing an LLM system layers 6 defenses:
- Pre-filter classifier — detects obvious prompt injection.
- LLM with Constitutional AI — resists subtle attacks.
- Post-filter classifier — blocks PII leak, malicious links.
- Execution sandbox — for tools manipulating data.
- Forensic logs + alerting.
- Continuous red teaming with Garak/PyRIT in CI/CD.
Each layer alone is insufficient. Stacked, they produce effective defense in depth.
EU AI Act compliance — 7 technical pillars
For high-risk AI systems (Annex III), 7 obligations in production before August 2, 2026:
- Risk management system documented and operational (Article 9).
- Data quality: datasheets, lineage, biases (Article 10).
- Living technical documentation (Annex IV).
- Logging and decision traceability.
- User transparency (Article 13).
- Human oversight (Article 14).
- Robustness, accuracy, cybersecurity (Article 15).
Compliance with multiple frameworks
The combined approach EU AI Act + ISO 42001 + NIST AI RMF + ISO 27001 / SOC 2 reuses 60-70% of common controls. Single, pooled approach.