The Scenario
Your product team receives 200+ feedback emails weekly. You need to extract structured data from each one and load it into a PostgreSQL table for your analytics dashboard. The table has columns for sentiment, topic, urgency, summary, and suggested_action.
Production Prompt
Extract structured feedback data from this customer email. Return ONLY valid JSON.

Schema:
{
  "sentiment": "positive" | "negative" | "neutral",
  "topic": "bug" | "feature_request" | "ux" | "performance" | "billing" | "other",
  "urgency": "low" | "medium" | "high",
  "summary": string (max 50 words),
  "suggested_action": "fix_bug" | "add_to_roadmap" | "improve_docs" | "escalate" | "no_action"
}
Rules:
- Use ONLY the enum values listed above
- If unsure about a field, use the most conservative option
- urgency=high only if the user mentions data loss, security, or a blocking issue
Email:
"The new dashboard is gorgeous, love the dark mode! But the export to CSV button has been broken for 3 days now and our weekly client report is due Friday. We really need this fixed ASAP."
Model Output
{
  "sentiment": "negative",
  "topic": "bug",
  "urgency": "high",
  "summary": "CSV export broken for 3 days, blocking weekly client report due Friday. User likes new dashboard and dark mode.",
  "suggested_action": "fix_bug"
}
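Before validation, the raw model response has to be parsed, and models sometimes wrap JSON in markdown fences or add commentary around it. A minimal parsing sketch (the function name and fallback strategy are my own, not from the article):

```python
import json
import re

def parse_model_output(raw: str) -> dict:
    """Extract the first JSON object from a model response.

    Handles two common failure modes: markdown code fences
    and stray text before/after the JSON body.
    """
    # Strip markdown fences like ```json ... ``` if present
    fenced = re.search(r"```(?:json)?\s*(.*?)```", raw, re.DOTALL)
    if fenced:
        raw = fenced.group(1)
    # Fall back to the outermost {...} span in the text
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    return json.loads(raw[start : end + 1])
```

If parsing fails, a typical pipeline logs the raw response and retries the request rather than guessing at field values.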
Why This Works at Scale
1. Enum constraints: Every field has a fixed set of valid values. Your code can validate with a simple if value in allowed_values check.
2. Decision rules in the prompt: “urgency=high only if...” prevents over-escalation. Without this rule, the model tends to mark any email that says “ASAP” as high.
3. Summary length cap: “max 50 words” keeps the summary concise and consistent across all emails.
4. Conservative default: “If unsure, use the most conservative option” prevents false positives in your triage system.
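The enum and length checks described above can be sketched as a small validator. The field names and allowed values come from the prompt's schema; the function itself is my illustration:

```python
# Allowed values, mirroring the enums in the prompt's schema
ALLOWED = {
    "sentiment": {"positive", "negative", "neutral"},
    "topic": {"bug", "feature_request", "ux", "performance", "billing", "other"},
    "urgency": {"low", "medium", "high"},
    "suggested_action": {"fix_bug", "add_to_roadmap", "improve_docs",
                         "escalate", "no_action"},
}

def validate_record(record: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the record is clean."""
    errors = []
    for field, allowed in ALLOWED.items():
        value = record.get(field)
        if value not in allowed:
            errors.append(f"{field}: {value!r} not in {sorted(allowed)}")
    summary = record.get("summary", "")
    if not isinstance(summary, str) or len(summary.split()) > 50:
        errors.append("summary: missing or longer than 50 words")
    return errors
```

Records that fail validation can be routed to a retry queue or a human-review bucket instead of silently corrupting the dashboard.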
Key insight: For ETL pipelines, your prompt is essentially a data transformation specification. Treat it like a database schema: define every field, constrain every value, and specify the edge-case behavior. The more constraints, the more reliable the extraction.
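The load step of that specification can be sketched as well. This assumes a table named feedback and a psycopg-style driver that uses %s placeholders; both names are illustrative, not from the article:

```python
# A sketch of the load step, assuming a hypothetical "feedback" table and a
# psycopg-style PostgreSQL driver that uses %s placeholders.
FIELDS = ("sentiment", "topic", "urgency", "summary", "suggested_action")

def build_insert(record: dict) -> tuple[str, tuple]:
    """Build a parameterized INSERT so values are never interpolated into SQL."""
    sql = (
        f"INSERT INTO feedback ({', '.join(FIELDS)}) "
        f"VALUES ({', '.join(['%s'] * len(FIELDS))})"
    )
    params = tuple(record[f] for f in FIELDS)
    return sql, params

# Usage with a live connection would look roughly like:
#   with conn.cursor() as cur:
#       cur.execute(*build_insert(record))
```

Parameterized queries matter here because the values come from model output, which is untrusted input just like user input.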