summarize

Key Insights — AI Security

A high-level summary of the core concepts across all 14 chapters.
Attack Surface
Injection, Jailbreaks & Poisoning
Chapters 1-5
expand_more
1
AI introduces entirely new attack vectors where inputs are natural language, outputs are probabilistic, and the blast radius is expanding.
  • OWASP Top 10 for LLMs: The definitive list of AI vulnerabilities, with Prompt Injection (LLM01) at the top.
  • MITRE ATLAS: The ATT&CK framework adapted for AI, mapping adversarial tactics like evasion, poisoning, and extraction.
2
The fundamental flaw of LLMs: they cannot reliably distinguish between developer instructions and user inputs.
  • Direct Injection: A user intentionally types malicious commands to override the system prompt or extract hidden instructions.
  • Indirect Injection: Malicious instructions are hidden in external data (like a webpage or email) that the LLM retrieves and processes.
3
Bypassing a model's safety alignment to elicit prohibited, harmful, or toxic outputs.
  • Techniques: Attackers use role-play (e.g., "DAN"), hypothetical scenarios, or encoded payloads (Base64) to trick the model into ignoring its safety training.
  • Many-Shot Jailbreaks: Overwhelming the model's context window with hundreds of fake "successful" malicious interactions to normalize bad behavior.
4
Corrupting the model at its source by manipulating training data or compromising dependencies.
  • Sleeper Agents: Models trained to behave normally until a specific trigger word is present, at which point they execute malicious behavior.
  • Pickle Exploits: Malicious models hosted on platforms like Hugging Face that execute arbitrary code when downloaded and deserialized.
5
Mathematical attacks that exploit the high-dimensional geometry of neural networks.
  • Adversarial Examples: Adding invisible noise to an image (e.g., a stop sign) that causes a computer vision model to misclassify it with high confidence.
The Bottom Line: Because LLMs parse instructions and data in the same stream, perfect prevention of prompt injection is currently considered mathematically impossible. Defense requires a layered approach.
Defense
Guardrails, RAG & Agents
Chapters 6-9
expand_more
6
The first line of defense: intercepting malicious inputs and sanitizing harmful outputs.
  • Input/Output Filtering: Using smaller, specialized models (like Llama Guard) to scan prompts for injections and responses for PII or toxicity.
  • Semantic Routing: Directing safe queries to the main LLM and blocking or redirecting unsafe queries before they cost compute.
7
Retrieval-Augmented Generation turns your internal documents into an attack surface.
  • RAG Poisoning: Attackers insert malicious instructions into documents (like resumes or public wikis) that they know the RAG system will ingest.
  • Access Control: Ensure the LLM only retrieves documents the current user actually has permission to view.
8
When AI can take actions (API calls, code execution), the blast radius of an attack expands exponentially.
  • Excessive Agency (LLM06): Giving an agent broad permissions (e.g., full database write access) instead of scoping tools to least privilege.
  • Human-in-the-Loop: Requiring explicit user approval before an agent executes destructive or high-stakes actions.
9
The Model Context Protocol standardizes tool use, but standardizes the attack vectors along with it.
  • Tool Poisoning: Hiding malicious instructions inside the schema definitions of MCP tools.
  • Sandboxing: Running agent code execution environments in isolated containers (like Docker or WASM) to prevent system compromise.
The Bottom Line: Defense in depth is mandatory. You must assume the LLM will eventually be compromised by a prompt injection, and build sandboxes and guardrails to contain the damage.
Risk
Privacy, Red Teaming & Compliance
Chapters 10-12
expand_more
10
Models memorize their training data, making them vulnerable to privacy leaks.
  • Membership Inference: Attacks that determine if a specific person's data was used in the training set.
  • Model Extraction: Stealing a proprietary model by querying it millions of times and using the outputs to train a clone.
11
Proactively attacking your own AI systems to find vulnerabilities before deployment.
  • Automated Red Teaming: Using specialized LLMs to generate thousands of adversarial prompts to stress-test your application (e.g., using tools like PromptFoo or Garak).
  • Continuous Evaluation: Red teaming is not a one-time audit; it must be integrated into the CI/CD pipeline as models and attacks evolve.
12
Navigating the rapidly evolving legal and regulatory landscape for AI.
  • EU AI Act: The first comprehensive legal framework classifying AI systems by risk level, with heavy fines for non-compliance.
  • NIST AI RMF: The US framework for managing AI risk, focusing on mapping, measuring, and managing potential harms.
The Bottom Line: Security testing for AI must be automated and continuous. Relying on manual penetration testing is insufficient against the scale and creativity of LLM-based attacks.
Architecture
Hardening Production Systems
Chapters 13-14
expand_more
13
Designing systems that remain secure even when the core LLM is compromised.
  • Zero-Trust AI: Never trust the output of an LLM. Treat it as untrusted user input before passing it to databases or APIs.
  • API Gateways: Centralizing rate limiting, authentication, and guardrail execution to protect backend models from unbounded consumption (LLM10).
14
Implementing robust incident response and monitoring for AI applications.
  • Observability: Logging full prompt/response pairs, tool calls, and latency to detect anomalous behavior or slow-burn data poisoning.
  • Incident Response: Having a clear playbook for when an AI goes rogue, including "kill switches" to instantly degrade to safe fallbacks.
The Bottom Line: Treat the LLM as a potentially hostile actor inside your network. Isolate it, monitor it, and strictly limit what it can execute.