What It Is
An LLM gateway sits between applications and LLM providers, enforcing security controls at a single chokepoint. It mirrors provider endpoints (e.g., OpenAI’s /v1/chat/completions), so applications redirect traffic by changing only the base URL. All requests and responses pass through the gateway’s security pipeline.
8-Stage Security Pipeline
1. Authentication: Per-client API keys with 256-bit entropy
2. Rate limiting: Token-based (not request-based) with sliding windows
3. Model allowlist: Restrict which models clients can access
4. Prompt injection detection: 20+ regex patterns with cumulative risk scoring
5. PII scanning: SSN, credit cards, emails, phones — redact or block
6. Response scanning: Same injection/PII checks on LLM output
7. Provider routing: Load balance across OpenAI, Bedrock, etc.
8. Audit logging: Structured JSON with latency, correlation IDs
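Stage 4 above can be sketched in a few lines. This is an illustrative toy, not a production ruleset: real gateways use 20+ patterns, and the specific regexes, weights, and threshold here are hypothetical examples chosen for the sketch.

```python
import re

# Hypothetical injection patterns with risk weights (a real gateway
# would ship 20+ patterns tuned against known attack corpora).
INJECTION_PATTERNS = [
    (re.compile(r"ignore (all |any )?(previous|prior) instructions", re.I), 0.8),
    (re.compile(r"you are now (DAN|an? unrestricted)", re.I), 0.7),
    (re.compile(r"system prompt", re.I), 0.3),
    (re.compile(r"base64|rot13", re.I), 0.2),
]

RISK_THRESHOLD = 0.9  # assumed cutoff for this sketch

def injection_risk(prompt: str) -> float:
    """Cumulative score: sum the weight of every pattern that matches."""
    return sum(weight for pattern, weight in INJECTION_PATTERNS
               if pattern.search(prompt))

def is_blocked(prompt: str) -> bool:
    return injection_risk(prompt) >= RISK_THRESHOLD
```

Cumulative scoring is the key design choice: no single weak signal ("system prompt" alone scores 0.3) blocks a request, but several together push the score past the threshold.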
# LLM Gateway: conceptual architecture
# Application code: only the base URL changes
from openai import OpenAI

client = OpenAI(
    base_url="https://llm-gateway.internal",
    api_key="client-specific-key",
)
# Gateway pipeline (transparent to app):
# ┌─────────────────────────────┐
# │ 1. Authenticate client key │
# │ 2. Check token rate limit │
# │ 3. Verify model allowlist │
# │ 4. Scan for prompt injection│
# │ 5. Scan/redact PII │
# │ 6. Forward to provider │
# │ 7. Scan response │
# │ 8. Log everything │
# └─────────────────────────────┘
Streaming challenge: Streaming responses complicate security. Full-response PII scanning needs the complete output, which conflicts with real-time token delivery. Advanced gateways buffer selectively or use streaming-compatible scanners.
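One selective-buffering approach can be sketched as follows: release text as it streams, but hold back a small tail so PII that spans a chunk boundary (e.g., an SSN split across two tokens) is still caught. The single SSN pattern and the hold-back size are illustrative assumptions; a real scanner would cover every PII class from stage 5.

```python
import re

SSN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")  # one pattern for the sketch
HOLDBACK = 16  # must exceed the longest PII pattern scanned for

def stream_redact(chunks):
    """Yield redacted output from a stream of text chunks."""
    buf = ""
    for chunk in chunks:
        buf += chunk
        buf = SSN.sub("[REDACTED]", buf)
        if len(buf) > HOLDBACK:
            # Safe to release everything except the tail: any partial
            # match still forming must lie inside the held-back region.
            yield buf[:-HOLDBACK]
            buf = buf[-HOLDBACK:]
    yield SSN.sub("[REDACTED]", buf)
```

The trade-off is latency: the client sees each token at most one hold-back behind real time, instead of waiting for the full response.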