Ch 14 — Production Hardening & Security Operations

Observability, drift detection, CI/CD security gates, incident response, AIID
High-level pipeline: Input → Guardrails → Model → Tools → Output Filter → Audit Log
LLM Observability in Production
Tracing, evaluation, and monitoring for non-deterministic systems
Why LLM Monitoring Is Different
Traditional application monitoring tracks latency, error rates, and throughput. LLM observability adds dimensions that don’t exist in conventional systems:

Non-deterministic outputs: The same input can produce different outputs. You need quality scoring, not just uptime checks.
Token cost tracking: Identical HTTP requests vary dramatically in cost. A 10-token prompt and a 10,000-token prompt look the same to traditional monitors.
Safety monitoring: Every request must be scanned for prompt injection, PII leakage, hallucination, and jailbreaks — in real time.
Multi-step tracing: Agent workflows span multiple LLM calls, tool invocations, and RAG retrievals. You need distributed tracing across the full chain.
Core Capabilities
Performance & cost: Track latency, error rates, token usage, and cost anomalies per model, query, and client. Identify expensive prompts and inefficient API calls.

Safety & security: Deterministic and LLM-based detectors on every request. Audit trails for regulatory compliance.

Distributed tracing: Capture complete request lifecycles across agent workflows, RAG pipelines, and tool integrations. Isolate bottlenecks and enable root-cause analysis.
Tools: Elastic LLM Monitoring, MLflow AI Gateway (unified tracing + evaluation + cost tracking), Weights & Biases, and custom OpenTelemetry instrumentation. MLflow eliminates data silos by integrating gateway, tracing, and evaluation in one system.
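The cost-tracking problem above (identical-looking HTTP requests with wildly different token costs) can be sketched as a small accumulator keyed by model and client. This is a minimal illustration, not any vendor's API; the model names and per-1K-token prices are placeholder assumptions, not real pricing.

```python
from dataclasses import dataclass, field
from collections import defaultdict

# Illustrative per-1K-token (input, output) prices -- placeholders, not real vendor pricing
PRICE_PER_1K = {"small-model": (0.0005, 0.0015), "large-model": (0.01, 0.03)}

@dataclass
class CostTracker:
    """Accumulates token usage and cost per (model, client) pair."""
    totals: dict = field(default_factory=lambda: defaultdict(lambda: {"tokens": 0, "cost": 0.0}))

    def record(self, model: str, client: str, prompt_tokens: int, completion_tokens: int) -> float:
        in_price, out_price = PRICE_PER_1K[model]
        cost = prompt_tokens / 1000 * in_price + completion_tokens / 1000 * out_price
        bucket = self.totals[(model, client)]
        bucket["tokens"] += prompt_tokens + completion_tokens
        bucket["cost"] += cost
        return cost

tracker = CostTracker()
# Two requests that look identical to an HTTP monitor, but differ ~70x in cost:
tracker.record("large-model", "team-a", prompt_tokens=10, completion_tokens=50)
tracker.record("large-model", "team-a", prompt_tokens=10_000, completion_tokens=500)
for (model, client), agg in tracker.totals.items():
    print(f"{model}/{client}: {agg['tokens']} tokens, ${agg['cost']:.4f}")
```

In production you would emit these aggregates as span attributes or metrics rather than keep them in memory, but the keying (per model, per client) is what makes cost anomalies attributable.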
Model Drift & Quality Degradation
Data drift, concept drift, and training-serving skew
Types of Drift
Data drift: Input feature distributions change over time — population shifts, seasonal variations, upstream pipeline changes. Detected with KS tests and PSI (Population Stability Index).

Concept drift: The relationship between features and target variables changes — policy changes, market conditions, user behavior shifts. Detected through performance metrics and error rate monitoring.

Training-serving skew: Production feature distributions differ from training data. Detectable from day one if training data isn’t representative.

Label drift: Target variable distribution changes from annotation errors or labeling criteria shifts.
Detection Methods
Statistical: KL Divergence, KS Test, Wasserstein Distance, PSI
Data quality: Schema validation, cardinality changes, missing values
Performance: Error rates, F1/AUC-ROC, latency degradation
Business: Conversion rates, user engagement, A/B test results
Anomaly: Isolation Forest for outlier detection
# Drift monitoring tools
# Google Vertex AI Model Monitoring
#   Built-in data drift + feature skew
# etsi-watchdog (open-source)
#   Plug-in architecture for custom drift
#   algorithms, rolling window monitoring,
#   Slack alerting

# Key metrics to track:
PSI < 0.1    → No significant drift
PSI 0.1–0.2  → Moderate drift, investigate
PSI > 0.2    → Significant drift, retrain
Not all drift requires action: Implement severity-based alerting focused on drift that impacts performance or business outcomes. A feature distribution shift that doesn’t affect predictions is noise, not signal.
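The PSI thresholds above are straightforward to compute. A minimal pure-Python sketch, assuming quantile-based binning against the training baseline; the bin count and the epsilon guard against empty bins are illustrative choices:

```python
import math
import random

def psi(expected: list, actual: list, bins: int = 10) -> float:
    """Population Stability Index between a baseline sample and a live sample.

    Bin edges come from the baseline's quantiles; a small epsilon keeps
    empty bins from blowing up the logarithm.
    """
    eps = 1e-4
    srt = sorted(expected)
    # Quantile-based bin edges from the baseline distribution
    edges = [srt[int(i * (len(srt) - 1) / bins)] for i in range(1, bins)]

    def bucket_fracs(sample):
        counts = [0] * bins
        for x in sample:
            i = sum(1 for e in edges if x > e)   # index of the bin containing x
            counts[i] += 1
        return [max(c / len(sample), eps) for c in counts]

    e_frac, a_frac = bucket_fracs(expected), bucket_fracs(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(e_frac, a_frac))

random.seed(0)
baseline = [random.gauss(0.0, 1.0) for _ in range(5000)]
same     = [random.gauss(0.0, 1.0) for _ in range(5000)]
shifted  = [random.gauss(0.8, 1.0) for _ in range(5000)]
print(f"no drift: PSI = {psi(baseline, same):.3f}")     # well under 0.1
print(f"shifted:  PSI = {psi(baseline, shifted):.3f}")  # well over 0.2
```

Running this per feature on a rolling window, and alerting only when PSI crosses 0.2 *and* a performance metric moves, implements the severity-based alerting described above.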
CI/CD Security Gates for LLM Deployments
Automated quality gates, red teaming in pipelines, progressive rollout
Security Gates in the Pipeline
Every LLM deployment should pass through automated security gates before reaching production:

1. Eval gate: Unit-style tests for LLM outputs — relevancy, hallucination, toxicity, latency scoring. Fail the build if thresholds aren’t met.
2. Red team gate: Automated vulnerability scanning (Garak, DeepTeam from Ch 11) integrated into CI/CD. Block deployment on critical findings.
3. Compliance gate: Automated reports mapping to NIST AI RMF and ISO 42001 clauses (Ch 12).
4. Cost gate: Track token usage and API spending over time. Alert on cost anomalies.
2025 Baseline Thresholds
# CI/CD quality gates (signed off by
# product, compliance, and SRE teams)
Answer Relevancy   ≥ 0.8
Context Precision  ≥ 0.7
Toxicity           ≤ 0.2
p95 Latency        < 3s

# Fail build if thresholds breached:
Faithfulness < 0.8   → BLOCK DEPLOY
Toxicity > 0.2       → BLOCK DEPLOY
Critical vuln found  → BLOCK DEPLOY
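The gate logic itself is a short script. A hedged sketch of how the thresholds above might be enforced in CI; the metric names and example eval results are hypothetical, and a real pipeline would read them from the eval tool's report:

```python
# Thresholds from the baseline above; "min" means the value must be >= threshold.
GATES = {
    "answer_relevancy":  (0.8, "min"),
    "context_precision": (0.7, "min"),
    "faithfulness":      (0.8, "min"),
    "toxicity":          (0.2, "max"),
    "p95_latency_s":     (3.0, "max"),
}

def check_gates(metrics: dict) -> list:
    """Return human-readable failures; an empty list means the build passes."""
    failures = []
    for name, (threshold, direction) in GATES.items():
        value = metrics[name]
        ok = value >= threshold if direction == "min" else value <= threshold
        if not ok:
            failures.append(f"{name}={value} breaches {direction} {threshold}")
    return failures

# Hypothetical eval results produced earlier in the pipeline:
results = {"answer_relevancy": 0.85, "context_precision": 0.72,
           "faithfulness": 0.76, "toxicity": 0.05, "p95_latency_s": 2.1}
failures = check_gates(results)
for f in failures:
    print(f"BLOCK DEPLOY: {f}")   # in CI, follow with sys.exit(1) to fail the build
```

Exiting nonzero on any failure is what turns the eval report into an actual deployment blocker.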
Progressive Rollout
Dark launch: Test with internal users first
Canary: Route 1–5% of traffic to new version
Gradual ramp: Increase traffic as metrics confirm safety
Instant rollback: Feature flags enable rollback without redeployment (LaunchDarkly AI Configs)
Tools: Promptfoo (CI/CD eval + red teaming), Deepeval (unit tests for LLM outputs), Guardrails AI (CI/CD integration), LaunchDarkly (progressive rollout with AI configs).
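The canary step above needs sticky assignment: a user should see the same version on every request, and users admitted at 1% should stay admitted as the ramp grows. One common way to get both is hashing the user id into a stable bucket; this is a minimal sketch of that idea, not any particular feature-flag product's mechanism:

```python
import hashlib

def route(user_id: str, canary_percent: float) -> str:
    """Deterministically assign a user to 'canary' or 'stable'.

    Hashing the user id keeps the assignment sticky: the same user always
    lands in the same bucket, and raising the percentage only adds users,
    never reshuffles them. Rollback = set the percentage to 0.
    """
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64   # uniform in [0, 1)
    return "canary" if bucket < canary_percent / 100 else "stable"

# Ramp: 1% -> 5% -> 25% as metrics confirm safety
users = [f"user-{i}" for i in range(10_000)]
share = sum(route(u, 5.0) == "canary" for u in users) / len(users)
print(f"canary share at 5%: {share:.1%}")
```

Because the bucket is a pure function of the user id, instant rollback is a config change (percentage to zero), with no redeployment, which is exactly the property the feature-flag approach relies on.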
AI Incident Response
CoSAI Framework v1.0 — NIST-aligned playbooks for AI-specific incidents
Why AI Needs Dedicated IR
AI incidents surged 56.4% from 2023 to 2024 (233 documented cases). AI breaches take an average of 277 days to detect — significantly longer than traditional breaches. 83% of AI incidents involve data exposure.

Traditional IR playbooks fail because AI has unique failure modes: model drift, adversarial attacks, data poisoning, bias incidents, hallucinations, jailbreaking, and context poisoning. Detection requires behavioral analysis, not traditional indicators of compromise.
A Surprising Statistic
67% of AI incidents stem from model errors rather than adversarial attacks. Organizations disproportionately focus security budgets on external threats while the majority of incidents come from internal model failures — hallucinations, bias, drift, and misconfigurations.
CoSAI AI IR Framework v1.0 (Nov 2025)
The Coalition for Secure AI released the first dedicated AI Incident Response Framework, adapted from the NIST SP 800-61 lifecycle:

Preparation: AI-specific runbooks for jailbreaking, model extraction, prompt injection, context poisoning
Detection & Analysis: Behavioral analysis and performance monitoring (not just IoCs)
Containment, Eradication, Recovery: Specialized procedures for compromised AI systems — model rollback, prompt reversion, tool isolation
Post-Incident: Build organizational knowledge to improve long-term AI security posture

Released alongside CoSAI’s “Signing ML Artifacts” framework for supply chain security.
EU AI Act Article 62: Requires reporting of serious AI incidents. Your IR plan must include regulatory notification procedures. The CoSAI framework aligns with both NIST and EU AI Act requirements.
The AI Incident Database (AIID)
1,360+ documented incidents — learning from real-world AI failures
What AIID Is
The AI Incident Database (incidentdatabase.ai) is a comprehensive tracking system documenting real-world AI incidents and failures. Over 1,360 incident IDs as of early 2026, with new incidents continuously added. Run by the Partnership on AI, it serves as the collective memory of AI failures — so the industry doesn’t repeat them.
Major 2025 Incident Trends
Deepfake fraud: The largest recurring pattern — impersonation scams using synthetic media for investment fraud across multiple countries

Security vulnerabilities: Platform regressions, data exposures, guardrail fragility. ~17% of analyzed OpenClaw AI skills exhibited malicious behavior

State-directed threats: North Korean operatives used AI-generated resumes and chatbot-assisted techniques to infiltrate Western companies

Government misuse: DOGE used unvetted ChatGPT outputs to cancel ~1,477 NEH grants ($100M+ in funding cuts)
How to Use AIID
Threat modeling: Search for incidents relevant to your AI use case. If you’re building a chatbot, study chatbot incidents. If you’re deploying in healthcare, study healthcare AI failures.

Red team scenarios: Use real incidents as inspiration for red team exercises (Ch 11). Real-world attacks are more creative than synthetic test cases.

Board reporting: Concrete incident examples make AI risk tangible for non-technical stakeholders.

Post-incident learning: After your own incidents, compare with similar AIID entries to identify patterns you may have missed.
Mandatory reading: Review AIID quarterly roundups (published regularly at incidentdatabase.ai/blog). They categorize incidents by type, severity, and affected sector. This is the closest thing to a CVE database for AI failures.
SOC Transformation for AI
Dynamic playbooks, behavioral analysis, and AI-aware detection
The SOC Challenge
Security Operations Centers built for traditional IT don’t know how to handle AI incidents. Analysts trained on network intrusion and malware don’t recognize prompt injection, model drift, or context poisoning. The detection gap is real: 277 days average detection time for AI breaches.
What Needs to Change
AI-specific detection rules: Monitor for anomalous token consumption, unusual prompt patterns, output quality degradation, and guardrail bypass attempts

Behavioral analysis: AI incidents require performance monitoring and behavioral baselines, not just IoC matching

MITRE ATLAS integration: Map AI threats to ATLAS TTPs (Ch 12) so analysts have a shared vocabulary for AI incidents

Dynamic playbooks: By mid-2027, IDC projects 85% of detection and response playbooks will be generated dynamically at alert time, adapting to specific context
AI-Specific Alert Categories
# SOC alert categories for AI systems
CRITICAL:
  - Prompt injection detected (bypass)
  - PII in model output
  - Model extraction attempt
  - Guardrail bypass confirmed
HIGH:
  - Anomalous token consumption (>3σ)
  - Output quality score drop >20%
  - Jailbreak attempt (blocked)
  - Unauthorized model access
MEDIUM:
  - Data drift PSI > 0.2
  - Latency p95 > SLA threshold
  - Rate limit exceeded
  - Shadow AI detected (AI-SPM)
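The "anomalous token consumption (>3σ)" alert above is a plain statistical baseline check. A minimal sketch, assuming a per-client history of token counts; the sample numbers are hypothetical:

```python
import statistics

def token_alerts(history: list, live: list, sigma: float = 3.0) -> list:
    """Flag live requests whose token counts sit more than `sigma`
    standard deviations above a per-client historical baseline."""
    mean = statistics.fmean(history)
    std = statistics.pstdev(history)
    threshold = mean + sigma * std
    return [(i, n) for i, n in enumerate(live) if n > threshold]

# Hypothetical per-request token counts for one client
baseline = [480, 510, 495, 520, 505, 490, 515, 500, 485, 510]
live = [505, 498, 9200, 512]   # 9200 could signal extraction or abuse
alerts = token_alerts(baseline, live)
print(alerts)   # [(2, 9200)]
```

A real detector would maintain a rolling window per client and model rather than a static list, but the behavioral-baseline principle (alert on deviation from this client's norm, not a global constant) is the same.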
Training gap: Your SOC analysts need AI security training. MITRE ATLAS, OWASP LLM Top 10, and the CoSAI IR framework should be part of analyst onboarding. An analyst who can’t recognize a prompt injection is as blind as one who can’t recognize a SQL injection.
The Complete Production Security Lifecycle
From Chapters 1–14: everything connected
The Full Course in One Pipeline
Understand the threats (Ch 1–5): AI security landscape, prompt injection, jailbreaking, data poisoning, adversarial ML

Build the defenses (Ch 6–9): Guardrails, RAG security, agent sandboxing, MCP security

Protect the data (Ch 10): Privacy, PII detection, differential privacy, GDPR compliance

Test continuously (Ch 11): Red teaming with Garak, PyRIT, DeepTeam

Govern and comply (Ch 12): EU AI Act, NIST AI RMF, ISO 42001, MITRE ATLAS

Architect for security (Ch 13): Zero trust, LLM gateways, LLM firewalls, confidential computing, AI-SPM

Operate securely (Ch 14): Observability, drift detection, CI/CD gates, incident response, SOC transformation
Production Checklist
□ LLM gateway with 8-stage security pipeline
□ Token-based rate limiting per client
□ LLM firewall (LlamaFirewall or commercial)
□ Input/output guardrails (NeMo, LLM Guard)
□ PII scanning (Presidio) on both directions
□ Tool sandboxing (WASM/containers)
□ MCP server allowlisting and description pinning
□ Distributed tracing across full request lifecycle
□ Drift monitoring with severity-based alerting
□ CI/CD security gates (eval + red team + compliance)
□ Progressive rollout with instant rollback
□ AI incident response plan (CoSAI framework)
□ SOC training on MITRE ATLAS and OWASP LLM Top 10
□ AI-SPM for shadow AI discovery
□ AIID quarterly review for threat intelligence
This is the end of the high-level journey. Every chapter’s “Under the Hood” view contains code examples, architecture diagrams, and implementation patterns. AI security is not a destination — it’s a continuous practice. The threat landscape evolves daily. Build the foundations from this course, then keep learning.
© 2026 Kiran Shirol — The AI Atlas. All rights reserved.