Input Validation Layers
1. Length limits: Cap input length to prevent context window abuse and cost spikes. Typical: 2,000–4,000 characters for chat
2. Language detection: Reject or route inputs in unsupported languages
3. Content classification: Detect and block toxic, violent, sexual, or self-harm content before it reaches the model
4. Topic restriction: Ensure inputs are within the system’s intended scope. A customer service bot shouldn’t answer medical questions
5. Rate limiting: Prevent abuse by limiting requests per user per minute
Implementation Approaches
• Rule-based: Regex patterns, keyword blocklists, length checks. Fast (sub-ms), free, but brittle. Good for obvious cases
• Classifier-based: Trained models that classify inputs as safe/unsafe. More robust than rules, ~10ms latency. OpenAI Moderation API, Perspective API
• LLM-based: Ask a fast model (GPT-4o-mini) to classify the input. Most flexible but adds 200–500ms latency and cost
Key insight: Layer your input guardrails from cheapest to most expensive. Run rule-based checks first (free, instant), then classifiers (cheap, fast), and invoke LLM-based checks only for cases the earlier layers flag as ambiguous. This keeps latency low: the cheap layers typically catch the large majority of issues, so the expensive checks run rarely.
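The cheapest-to-most-expensive cascade can be sketched like this: each layer returns safe, unsafe, or uncertain, and only uncertain inputs fall through to the next, pricier layer. The classifier and LLM layers here are placeholders standing in for real calls (e.g., a moderation API or a fast model):

```python
from typing import Literal

Verdict = Literal["safe", "unsafe", "uncertain"]

def rule_layer(text: str) -> Verdict:
    """Free, instant: decide obvious cases, defer everything else."""
    if len(text) > 4000:
        return "unsafe"
    if "ignore previous instructions" in text.lower():  # hypothetical rule
        return "unsafe"
    return "uncertain"

def classifier_layer(text: str) -> Verdict:
    """~10 ms: stand-in for a trained moderation classifier."""
    return "uncertain"  # placeholder; a real model maps scores to a verdict

def llm_layer(text: str) -> Verdict:
    """200-500 ms: stand-in for an LLM judge, reached only for ambiguous inputs."""
    return "safe"  # placeholder

def validate(text: str) -> bool:
    """Run layers cheapest-first; the first decisive verdict wins."""
    for layer in (rule_layer, classifier_layer, llm_layer):
        verdict = layer(text)
        if verdict != "uncertain":
            return verdict == "safe"
    return False  # fail closed if every layer is uncertain
```

The ordering means most requests never pay the classifier or LLM cost, and the decisive-verdict short-circuit is what keeps median latency near the rule-based floor.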