Ch 6 — Input Guardrails & Output Filtering — Under the Hood
NeMo Guardrails, LLM Guard, Lakera, OpenAI Moderation, canary tokens, LLM-as-Judge
Under the Hood
-
Click play or press Space to begin. Click any node for deep-dive details...
AInput Guardrails: Scanning Before the LLMOWASP LLM05:2025 — Improper Output Handling starts with input
1person
User PromptRaw untrusted
input arrives
policy
NeMo GuardrailsNVIDIA Colang
programmable rails
2shield
LLM GuardProtect AI
scanner suite
radar
Lakera GuardAPI-based
injection detection
3arrow_downward Model-level: OpenAI Moderation & Instruction Hierarchy
BModel-Level Safety ControlsOpenAI Moderation API, Instruction Hierarchy (Apr 2024)
gavel
Moderation APIOpenAI content
classifier
4priority_high
Instruction HierarchySystem > user > tool
priority enforcement
smart_toy
LLM ProcessesConstrained by
hierarchy rules
5arrow_downward Output side: filtering, PII detection, canary tokens
COutput Filtering & PII DetectionGuardrails AI, canary tokens (OWASP LLM07), PII scrubbing
output
Raw LLM OutputUnfiltered model
response
6badge
PII ScannerDetect & redact
sensitive data
key
Canary TokensDetect system
prompt leakage
7arrow_downward LLM-as-Judge: second model evaluates output quality
DLLM-as-Judge & Guardrails AISecond-model evaluation, RAIL spec, structured output validation
balance
LLM-as-JudgeSecond model
evaluates safety
8rule
Guardrails AIRAIL spec
output validation
check_circle
Validated OutputPasses all
safety checks
9arrow_downward Layered defense: combining all guardrail layers
ELayered Defense ArchitectureCombining input, model, and output guardrails
compare_arrows
Tool ComparisonNeMo vs LLM Guard
vs Lakera vs OMAI
10layers
Defense StackFull guardrail
architecture