Ollama JSON Mode
import json

import ollama

response = ollama.chat(
    model='qwen2.5:7b',
    messages=[{
        'role': 'user',
        'content': '''Extract from this email:
- sender_name
- subject
- urgency (low/medium/high)
- action_required (true/false)
Email: "Hi team, the production
server is down. Need immediate
fix. - Sarah"
Return JSON only.'''
    }],
    format='json'  # constrains the model to emit valid JSON
)

data = json.loads(response['message']['content'])
# {"sender_name": "Sarah",
#  "subject": "production server down",
#  "urgency": "high",
#  "action_required": true}
Grammar-Constrained Output
For even more reliable structured output, llama.cpp supports GBNF grammars — formal grammar rules that constrain the model’s output to valid JSON, specific schemas, or any defined format. The model physically cannot produce invalid output.
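To make this concrete, here is a minimal illustrative GBNF sketch (the field name and allowed values are hypothetical, matching the email example above) that restricts output to a single-field JSON object:

```
root    ::= "{" ws "\"urgency\":" ws urgency ws "}"
urgency ::= "\"low\"" | "\"medium\"" | "\"high\""
ws      ::= [ \t\n]*
```

With this grammar loaded, the sampler simply never selects a token that would leave the grammar's language, so malformed JSON is impossible by construction.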
Ollama exposes the same idea through its format parameter: instead of the string 'json', pass a full JSON schema, and the output is constrained to match that exact structure.
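A minimal sketch of schema-constrained extraction, assuming a running Ollama server with qwen2.5:7b pulled. The schema dict is standard JSON Schema; the parse_email_json helper is a hypothetical name added here for post-hoc sanity checking:

```python
import json

# Standard JSON Schema for the fields we want back; this dict is what
# you would pass to ollama.chat via the `format` parameter.
EMAIL_SCHEMA = {
    "type": "object",
    "properties": {
        "sender_name": {"type": "string"},
        "subject": {"type": "string"},
        "urgency": {"type": "string", "enum": ["low", "medium", "high"]},
        "action_required": {"type": "boolean"},
    },
    "required": ["sender_name", "subject", "urgency", "action_required"],
}

def parse_email_json(raw: str) -> dict:
    """Parse the model's reply and sanity-check it against the schema."""
    data = json.loads(raw)
    missing = [k for k in EMAIL_SCHEMA["required"] if k not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if data["urgency"] not in EMAIL_SCHEMA["properties"]["urgency"]["enum"]:
        raise ValueError(f"bad urgency value: {data['urgency']}")
    return data

# With an Ollama server available, the call looks like:
# response = ollama.chat(model='qwen2.5:7b', messages=[...],
#                        format=EMAIL_SCHEMA)
# data = parse_email_json(response['message']['content'])

# Offline demo with a canned reply:
reply = ('{"sender_name": "Sarah", "subject": "production server down", '
         '"urgency": "high", "action_required": true}')
print(parse_email_json(reply)["urgency"])  # high
```

The validation step is cheap insurance: even with schema-constrained decoding, checking the parsed dict before it enters a pipeline keeps failures loud and local.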
Key insight: Structured output is what turns a chatbot into a data pipeline. Extract entities from emails, classify tickets, parse invoices — all locally, all returning clean JSON. Combined with RAG, you can build complete document processing systems that run entirely on your hardware.