Ch 11 — Multi-Turn & Conversation Design

Managing context across turns, preventing drift, and steering conversations toward useful outcomes
How Multi-Turn Conversations Actually Work
Every turn is re-sent to the model — there is no “memory,” only a growing context window
The Illusion of Memory
When you chat with an LLM, it feels like a conversation with memory. But here’s what actually happens on every turn:

1. Your entire conversation history (system prompt + all previous messages) is sent to the model
2. The model generates a response based on the full context
3. That response is appended to the history
4. On the next turn, the whole thing is sent again

The model has no persistent memory. It re-reads the entire conversation from scratch every time. This has profound implications for how you design multi-turn interactions.
What This Means in Practice
Context grows linearly: Each turn adds ~100–500 tokens. A 20-turn conversation can be 5,000–10,000 tokens of history. That’s expensive and eventually hits the context window limit.

Earlier turns fade: Due to the “lost in the middle” effect (Ch 10), the model pays less attention to turns 5–15 in a 20-turn conversation. Early decisions get “forgotten.”

No session persistence: Close the tab, lose the conversation. The model doesn’t remember you from yesterday.
API View
messages = [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "Turn 1"},
    {"role": "assistant", "content": "..."},
    {"role": "user", "content": "Turn 2"},
    {"role": "assistant", "content": "..."},
    # ALL of this is sent every turn
]
Key insight: Multi-turn conversations are not chats — they’re growing documents. Every turn adds to the document. The model reads the entire document every time. Design your turns like you’re writing a collaborative document, not having a casual conversation.
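The four steps above can be sketched as a minimal client loop. This is a sketch, not any particular SDK: `call_model` is a hypothetical stand-in for a real API call, and here it just reports how much history it received.

```python
def call_model(messages):
    # Placeholder for a real API call (OpenAI, Anthropic, etc.).
    # Here it just reports how much history it was given.
    return f"(model saw {len(messages)} messages)"

def chat_turn(history, user_input):
    """One turn: append the user message, send the WHOLE history,
    append the reply. Nothing persists outside `history`."""
    history.append({"role": "user", "content": user_input})
    reply = call_model(history)  # full history sent every turn
    history.append({"role": "assistant", "content": reply})
    return reply

history = [{"role": "system", "content": "You are a helpful assistant."}]
chat_turn(history, "Turn 1")  # model sees 2 messages
chat_turn(history, "Turn 2")  # model sees 4 messages
```

Note that the "memory" lives entirely in the `history` list the caller maintains; drop the list and the model knows nothing.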
Context Drift: When Conversations Go Off the Rails
A 5-turn conversation where the model gradually loses track of the original requirements
The Drifting Conversation
Turn 1 (User): "Design a REST API for a todo app with users, projects, and tasks."
Turn 1 (Model): [Designs 8 endpoints, good structure]

Turn 2 (User): "Add authentication."
Turn 2 (Model): [Adds JWT auth, modifies endpoints]

Turn 3 (User): "What about rate limiting?"
Turn 3 (Model): [Discusses rate limiting in general, forgets the specific API design]

Turn 4 (User): "And caching?"
Turn 4 (Model): [Generic caching advice, completely lost the original API context]

Turn 5 (User): "OK, show me the final API spec."
Turn 5 (Model): [Produces a spec that's missing the auth from Turn 2 and doesn't include rate limiting or caching properly]
Why This Happens
1. Vague follow-ups: “What about rate limiting?” doesn’t reference the API. The model treats it as a standalone question.

2. No anchoring: There’s no system prompt or summary reminding the model what they’re building.

3. Topic hopping: Each turn introduces a new topic without connecting it to the previous work.

4. No checkpoints: Nobody said “here’s what we have so far” to consolidate progress.
The pattern: Context drift is the #1 multi-turn failure mode. The conversation starts focused, but each vague follow-up loosens the model’s grip on the original task. By turn 5, the model is responding to the latest message, not the overall project.
Anchoring: Keep the Model on Track
A system prompt that persists the task context across every turn
The Anchoring System Prompt
System prompt:

You are helping design a REST API for a todo application. The app has three entities: Users, Projects, and Tasks.

Current requirements:
- RESTful endpoints for CRUD on all entities
- JWT authentication
- Role-based access (admin, member)
- Tasks belong to Projects, Projects belong to Users

Design constraints:
- JSON:API format
- Pagination on all list endpoints
- Rate limiting headers in responses

When the user asks about a new feature, integrate it into the existing API design. Always maintain consistency with previous decisions.
Anchored Follow-Ups
Turn 2 (Anchored): "Add rate limiting to the API we designed. Specifically: 100 req/min for authenticated users, 20 req/min for unauthenticated. Show me how this changes the response headers."

Turn 3 (Anchored): "Now add Redis caching for the GET /projects and GET /tasks list endpoints. TTL of 60 seconds, invalidate on POST/PUT/DELETE. How does this integrate with the rate limiting from the previous turn?"
What Changed
1. System prompt anchors the task: The model is reminded of the project on every turn.

2. Explicit references: “the API we designed,” “from the previous turn” — these tie each turn to the ongoing work.

3. Specific requests: “Show me how this changes the response headers” instead of “what about rate limiting?”
Key insight: The system prompt is your anchor. It persists across every turn and reminds the model of the overall task, current state, and constraints. Update it as the project evolves. Think of it as the “project brief” that every team member reads before contributing.
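One way to keep that "project brief" current is to regenerate the system prompt from a structured record of the project state before each turn. A minimal sketch, assuming you track requirements and constraints as plain lists (the function and field names are illustrative, not from any API):

```python
def build_anchor(task, requirements, constraints):
    """Rebuild the anchoring system prompt from the current project
    state, so decisions made in earlier turns appear in the anchor."""
    lines = [f"You are helping with: {task}", "", "Current requirements:"]
    lines += [f"- {r}" for r in requirements]
    lines += ["", "Design constraints:"]
    lines += [f"- {c}" for c in constraints]
    lines += ["", "Integrate new features into the existing design. "
                  "Always maintain consistency with previous decisions."]
    return "\n".join(lines)

requirements = ["RESTful CRUD for Users, Projects, Tasks", "JWT authentication"]
constraints = ["JSON:API format", "Pagination on list endpoints"]

# When a turn produces a new decision, record it and rebuild the anchor:
requirements.append("Rate limiting: 100/min auth, 20/min unauth")
system_prompt = build_anchor("a REST API for a todo application",
                             requirements, constraints)
```

The point of the indirection is that "update the system prompt as the project evolves" becomes a one-line append rather than hand-editing a long prompt string.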
Periodic Summaries: Compress Without Losing
“So far we’ve decided X, Y, Z” — the most powerful multi-turn technique
The Summary Checkpoint
Every 3–5 turns, insert a summary message that consolidates all decisions so far. This serves three purposes:

1. Reinforces context: Puts key decisions at the end of the history (recency bias helps)
2. Corrects drift: If the model went off track, the summary steers it back
3. Saves tokens: You can trim earlier turns and keep just the summary
Summary Message Pattern
User (Turn 6 — Checkpoint): "Before we continue, let me summarize what we've decided so far:
1. API Design: 8 RESTful endpoints covering Users, Projects, Tasks CRUD
2. Auth: JWT with refresh tokens, role-based (admin/member)
3. Rate Limiting: 100/min auth, 20/min unauth, headers in response
4. Caching: Redis, 60s TTL on list endpoints, invalidate on writes
5. Format: JSON:API with pagination
Does this match your understanding? If anything is wrong, correct it. Then let's move on to error handling."
Why “Does This Match?” Matters
Asking the model to confirm the summary does two things:

1. Catches errors: If your summary misrepresents something, the model will correct it.

2. Locks in decisions: The model’s confirmation becomes part of the history, reinforcing those decisions for all future turns.
Automated Summary (For APIs)
# Every N turns, ask the model to summarize,
# then use that summary to replace older messages
if len(messages) > 10:
    summary = get_summary(messages[1:-2])
    messages = [
        messages[0],  # system prompt
        {"role": "user", "content": f"Summary so far: {summary}"},
        messages[-2],  # last user msg
        messages[-1],  # last assistant msg
    ]
Key insight: Summaries are the multi-turn equivalent of saving your work. They compress the conversation into a checkpoint that the model can build on. Without summaries, long conversations degrade. With them, you can maintain quality across 20+ turns.
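A runnable version of the automated-summary pattern, with a stub in place of the real model call (`get_summary` would normally be another API request asking the model to summarize):

```python
def get_summary(msgs):
    # Stub: a real implementation would ask the model to summarize.
    return f"{len(msgs)} earlier messages consolidated"

def compress_history(messages, max_len=10):
    """Once the history is long, replace everything between the system
    prompt and the last exchange with a single summary message."""
    if len(messages) <= max_len:
        return messages
    summary = get_summary(messages[1:-2])
    return [
        messages[0],  # system prompt stays verbatim
        {"role": "user", "content": f"Summary so far: {summary}"},
        messages[-2],  # last user message
        messages[-1],  # last assistant message
    ]

history = [{"role": "system", "content": "anchor"}]
for i in range(1, 8):  # 7 turns = 14 messages plus the system prompt
    history.append({"role": "user", "content": f"Turn {i}"})
    history.append({"role": "assistant", "content": f"Reply {i}"})
history = compress_history(history)  # 15 messages shrink to 4
```

The system prompt is deliberately excluded from the summary input and kept verbatim, since it is the anchor from the previous section.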
Domain Example: Iterative Database Schema Design
Building a schema through conversation — steering, correcting, and building on previous turns
Well-Structured Conversation
System: "You are a database architect helping design a PostgreSQL schema for an e-commerce platform."

Turn 1: "Design the core tables: users, products, orders, order_items. Include PKs, FKs, and common indexes."

Turn 2: "Good. Now add: (1) product categories with a tree structure, (2) product variants (size, color). Build on the schema from Turn 1."

Turn 3: "I see you used an adjacency list for categories. I'd prefer a materialized path approach for faster tree queries. Please revise just the categories table, keeping everything else the same."

Turn 4 — Checkpoint: "Here's our schema so far:
- users (id, email, name, created_at)
- products (id, name, category_path, ...)
- product_variants (id, product_id, ...)
- orders (id, user_id, status, ...)
- order_items (id, order_id, variant_id)
- categories (id, name, path)
Correct? Now add inventory tracking."
Techniques Used
Turn 1: Clear initial scope with specific deliverables (PKs, FKs, indexes)

Turn 2: “Build on the schema from Turn 1” — explicit reference prevents the model from starting over

Turn 3: Course correction — “I’d prefer X instead of Y. Revise just Z, keeping everything else.” Specific, surgical change.

Turn 4: Summary checkpoint — lists the current state, asks for confirmation, then introduces the next task.
Casual Chat Style
“Add categories.” “Now variants.” “Actually change the categories.” “Now add inventory.”

Model loses track by turn 4. Final schema is inconsistent.
Collaborative Document Style
Explicit references, surgical corrections, periodic summaries. Model maintains consistency across 10+ turns. Final schema is coherent.
Key insight: Iterative design conversations work best when you treat each turn as an edit to a shared document. Reference previous work explicitly, make corrections surgically (“change just X, keep Y”), and checkpoint regularly.
Course Correction: When the Model Goes Wrong
How to redirect without starting over — surgical corrections that preserve good work
Bad Correction
❌ "That's wrong. Do it again."

The model doesn't know WHAT was wrong. It might change things that were correct and keep things that were wrong.
Good Correction
✓ "Two issues with your response:
1. The orders table is missing a `shipping_address_id` FK. Add it.
2. You used VARCHAR(255) for email but our convention is VARCHAR(320) per RFC 5321. Update all email fields.
Everything else looks good. Keep the rest unchanged."
Correction Patterns
The Surgical Fix: “Change X to Y. Keep everything else the same.”

The Additive Fix: “Good, but also add Z to what you have.”

The Reframe: “I like the structure, but I need it to use [different approach]. Rewrite using [approach] while keeping the same entities and relationships.”

The Rollback: “Let’s go back to the version from Turn 3. From there, instead of [what you did in Turn 4], do [alternative].”
Key insight: Vague corrections (“that’s wrong, try again”) cause the model to change things randomly. Specific corrections (“change X, keep Y”) preserve good work and fix only what’s broken. Always tell the model what to keep, not just what to change.
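If your application generates corrections programmatically, the surgical-fix pattern can be templated so every correction lists the specific fixes and then explicitly says what to keep. A small sketch (the helper is hypothetical):

```python
def surgical_correction(issues,
                        keep="Everything else looks good. "
                             "Keep the rest unchanged."):
    """Format a correction that enumerates specific fixes, then
    explicitly tells the model what NOT to touch."""
    numbered = [f"{i}. {issue}" for i, issue in enumerate(issues, 1)]
    return (f"{len(issues)} issue(s) with your response:\n"
            + "\n".join(numbered)
            + f"\n\n{keep}")

msg = surgical_correction([
    "The orders table is missing a shipping_address_id FK. Add it.",
    "Use VARCHAR(320) for email fields per RFC 5321.",
])
```

Forcing every correction through this template guarantees the "what to keep" clause is never forgotten.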
Memory Management: Handling Long Conversations
What to do when the conversation exceeds the context window
The Context Window Problem
Even with 128K-token context windows (GPT-4o, Claude 3.5), long conversations eventually hit limits. And even before the hard limit, quality degrades as the context grows:

5–10 turns: Full quality, no management needed
10–20 turns: Start using summaries, trim old turns
20–50 turns: Aggressive summarization required
50+ turns: Consider starting a new conversation with a summary seed
Strategy 1: Sliding Window + Summary
# Keep: system prompt + summary of old turns + last N turns
messages = [
    system_prompt,
    {"role": "system", "content": "Conversation summary: "
        "[auto-generated summary of turns 1 through 15]"},
    # Last 5 turns (10 messages)
    *recent_messages,
]
Strategy 2: Fresh Start with Seed
When the conversation is too long, start fresh with context:

New conversation, Turn 1: "I've been working on a database schema for an e-commerce platform. Here's where we are:
[paste the latest complete schema]
Decisions made:
- Materialized path for categories
- JSONB for product attributes
- Soft deletes on all tables
Now I need to add: payment processing tables (transactions, refunds, payment methods)."
Strategy 3: External State
For production applications, store conversation state externally:

Key decisions in a database
Current artifacts (schema, code, docs) in files
Conversation summary updated after each turn

Inject the relevant state into each new API call. The model doesn’t need the full conversation — it needs the current state and the latest request.
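A sketch of Strategy 3, with a plain dict standing in for the external database and file storage: each API call is built from the stored state plus the latest request, so the payload size is constant no matter how long the project runs. All names here are illustrative.

```python
state = {
    "task": "e-commerce database schema",
    "decisions": ["Materialized path for categories",
                  "Soft deletes on all tables"],
    # In production this would be a file or DB row, not an inline string.
    "artifact": "CREATE TABLE users (...);",
}

def build_request(state, user_request):
    """Stateless call: current state + latest request, no transcript."""
    context = (f"Project: {state['task']}\n"
               "Decisions so far:\n"
               + "\n".join(f"- {d}" for d in state["decisions"])
               + f"\n\nCurrent artifact:\n{state['artifact']}")
    return [
        {"role": "system", "content": context},
        {"role": "user", "content": user_request},
    ]

messages = build_request(state, "Add inventory tracking tables.")
# Always exactly 2 messages, however many turns the project has taken
```

After each response, you would update `state["decisions"]` and `state["artifact"]` rather than appending to a transcript.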
Key insight: The best multi-turn strategy is to minimize the need for multi-turn. Extract artifacts (code, schemas, documents) from the conversation into external storage. Feed them back as context when needed. The conversation is the process; the artifacts are the product.
The Multi-Turn Playbook
Rules for conversations that stay on track from turn 1 to turn 20
The 7 Rules
1. Anchor with a system prompt: define the task, constraints, and current state upfront.
2. Reference previous turns explicitly: "Building on the schema from Turn 2," not "add more stuff."
3. Checkpoint every 3–5 turns: "Here's what we have so far: ..."
4. Correct surgically: "Change X, keep everything else," not "that's wrong, try again."
5. Extract artifacts: save code/schemas/docs outside the conversation as you go.
6. Manage the context window: summarize old turns, trim history, or start fresh with a seed.
7. One topic per turn: don't ask three questions in one message. Each turn = one clear request.
Turn Quality Checklist
Before sending each message, verify:

Does it reference the current state? (“Building on...”, “Given our decision about...”)

Is the request specific? (“Add X to Y” not “what about X?”)

Does it specify what to keep? (“Keep the auth endpoints unchanged”)

Is it one topic? (Split multi-part requests into separate turns)
Key insight: Multi-turn conversations are a skill, not a feature. The model doesn’t manage the conversation — you do. Anchor, reference, checkpoint, correct, extract. Treat every turn as an edit to a shared document, and your conversations will produce consistent, high-quality results across any number of turns.