Ch 1 — How Prompts Actually Work

What happens inside the LLM when you hit send — and why vague prompts get vague answers
Foundations: Prompt → Tokens → Probability → Temperature → Vague vs Clear → Mental Model → Takeaway
You Type Words. The Model Sees Numbers.
The first thing to understand about prompts
The Reality
When you type “Explain Kubernetes”, the model doesn’t see two English words. It sees a sequence of token IDs — numbers like [849, 11452, 77621]. The model has never “read” anything. It has learned statistical patterns: “after these token IDs, what token IDs typically come next?” Your prompt is the starting pattern. Everything the model generates is a continuation of that pattern.
Key insight: The model doesn’t understand your intent. It predicts what text typically follows your text. This single fact explains 90% of prompt engineering: if you want specific output, you need to set up a specific pattern for the model to continue.
What the Model Actually Sees
# You type:
"Explain Kubernetes"

# Model sees (GPT-4 tokenizer):
[849, 11452, 77621]
# "Ex" + "plain" + " Kubernetes"

# The model's job:
# "Given tokens [849, 11452, 77621],
#  what token is most likely next?"

# It doesn't "know" what Kubernetes is.
# It knows what text patterns typically
# follow "Explain Kubernetes" in its
# training data.
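To make "text in, token IDs out" concrete, here is a toy tokenizer. A real tokenizer (such as GPT-4's BPE) learns its vocabulary from data; this hard-coded table and its IDs are purely illustrative, not real GPT-4 token IDs.

```python
# Toy tokenizer: map known text pieces to made-up integer IDs.
TOY_VOCAB = {
    "Ex": 849,
    "plain": 11452,
    " Kubernetes": 77621,
}

def toy_tokenize(text: str) -> list[int]:
    """Greedily match the longest known piece at each position."""
    ids = []
    while text:
        for piece in sorted(TOY_VOCAB, key=len, reverse=True):
            if text.startswith(piece):
                ids.append(TOY_VOCAB[piece])
                text = text[len(piece):]
                break
        else:
            raise ValueError(f"no token matches: {text!r}")
    return ids

print(toy_tokenize("Explain Kubernetes"))  # [849, 11452, 77621]
```

The model only ever sees the list of integers on the last line; the strings exist solely on your side of the tokenizer.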
Every Response Is a Probability Roll
The model picks from thousands of possible next tokens
How Generation Works
After processing your prompt, the model produces a probability distribution over its entire vocabulary (~100K tokens). A vague prompt like “Tell me about dogs” creates a flat distribution — many possible continuations are roughly equally likely. The model might talk about breeds, history, biology, or training. A specific prompt creates a peaked distribution — fewer continuations are likely, and they’re all relevant.
Key insight: A vague prompt is like asking “tell me something” to a room of 100 people. You’ll get 100 different answers. A specific prompt is like asking “what’s the capital of France?” — everyone says “Paris.” Your prompt’s specificity directly controls the randomness of the output.
Probability in Action
# Vague prompt: "Tell me about dogs"
# Next token probabilities (spread out):
#   "Dogs"   8%
#   "There"  6%
#   "The"    5%
#   "A"      4%
#   "Man's"  3%
# → Many directions, unpredictable

# Specific prompt: "List the 3 largest
# dog breeds by weight, with average
# weight in kg:"
# Next token probabilities (peaked):
#   "1"    45%
#   "\n"   20%
#   "Here" 12%
# → Narrow range, predictable output
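"Flat" versus "peaked" can be quantified with Shannon entropy: more bits means a more spread-out, less predictable distribution. The two distributions below are hypothetical stand-ins for the vague and specific prompts, not real model outputs.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: higher = more spread out, less predictable."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token distributions, each summing to 1:
vague    = [1 / 50] * 50                           # 50 equally likely continuations
specific = [0.45, 0.20, 0.12] + [0.23 / 23] * 23   # a few tokens dominate

print(round(entropy(vague), 2))     # 5.64 bits (log2 of 50)
print(round(entropy(specific), 2))  # far fewer bits: much more predictable
```

The specific prompt doesn't make the model smarter; it just collapses the distribution so that fewer continuations carry almost all the probability mass.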
Temperature: The Randomness Dial
Why the same prompt gives different answers each time
What Temperature Does
Temperature controls how much randomness the model uses when picking from the probability distribution. At temperature 0, it always picks the most likely token (deterministic, repetitive). At temperature 1, it samples proportionally (creative, varied). At temperature 2, even unlikely tokens get a chance (chaotic, often nonsensical). Most APIs default to 0.7-1.0.
Key insight: Temperature and prompt specificity work together. A vague prompt + high temperature = chaos. A specific prompt + low temperature = predictable, focused output. For factual tasks (data extraction, code), use temperature 0-0.3. For creative tasks (brainstorming, writing), use 0.7-1.0.
Temperature Effects
# Same prompt at different temperatures:
# "Name a color"

# Temperature 0 (always most likely):
"Blue"  "Blue"  "Blue"  "Blue"

# Temperature 0.7 (balanced):
"Blue"  "Red"  "Green"  "Blue"

# Temperature 1.5 (very random):
"Cerulean"  "Mauve"  "Burnt sienna"

# Practical settings:
# Code generation:  T = 0
# Data extraction:  T = 0
# General chat:     T = 0.7
# Creative writing: T = 0.9-1.0
# Brainstorming:    T = 1.0-1.2
The Vague vs Clear Test
See the difference with a real example
Do This Instead
Let’s say you need help understanding a technology for work. Here’s what most people type vs what actually works.
Superficial Prompt
Prompt: “Explain Kubernetes”

Output: “Kubernetes is an open-source container orchestration platform developed by Google. It automates the deployment, scaling, and management of containerized applications...”

Generic Wikipedia-style answer. Not useful for your specific situation.
Deliberate Prompt
Prompt: “Explain Kubernetes to a backend developer who knows Docker but has never used orchestration. Focus on why they’d need it for a microservices app with 10 services, not what it is.”

Output: “You know how to run containers with Docker. Now imagine you have 10 services, each in its own container. When one crashes at 3am, who restarts it? When traffic spikes, who spins up more copies? That’s what Kubernetes does...”

Targeted, relevant, immediately useful.
Why the Deliberate Prompt Works
Understanding the mechanics behind the improvement
Breaking It Down
The deliberate prompt works because every phrase constrains the probability space:

“backend developer who knows Docker” — eliminates beginner-level explanations
“never used orchestration” — sets the right starting point
“why they’d need it” — forces motivation-first, not definition-first
“microservices app with 10 services” — gives a concrete scenario
“not what it is” — explicitly blocks the Wikipedia pattern

Each constraint removes thousands of possible continuations, leaving only the useful ones.
The Constraint Effect
# "Explain Kubernetes" # Possible directions: ~50 # (history, architecture, comparison, # tutorial, definition, use cases...) # + "to a backend developer" # Possible directions: ~20 # (eliminates beginner content) # + "who knows Docker" # Possible directions: ~10 # (can use Docker as reference point) # + "focus on why, not what" # Possible directions: ~3 # (motivation, pain points, benefits) # + "10 microservices" # Possible directions: ~1-2 # (concrete scenario, specific advice) # More constraints = fewer directions # = more useful output
Real Example: Debugging a Problem
The difference between “help me fix this” and actually getting help
What Most People Do
Prompt: “My Python code isn’t working. I get a KeyError.”

Output: “A KeyError in Python occurs when you try to access a dictionary key that doesn’t exist. You can fix this by using .get() method or checking if the key exists first...”

Generic advice. Doesn’t address YOUR specific bug.
What Actually Works
Prompt: “I have a Flask API endpoint that processes user registrations. When I POST to /api/register with {"email": "test@example.com"}, I get:
KeyError: 'name' at line 15.
Here’s the route handler: [paste code].
The request body should have email and name, but the frontend sometimes omits name. What’s the root cause and the safest fix?”

Output: Precise diagnosis + defensive code with request.json.get('name', '') + input validation suggestion.

Specific problem, specific fix.
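The "safest fix" pattern can be shown outside any web framework: treat every externally supplied field as optional and validate before use. `register()` here is a hypothetical stand-in for the Flask route handler, taking the parsed JSON body as a dict.

```python
def register(payload: dict) -> tuple[int, dict]:
    """Hypothetical registration handler: status code + response body."""
    email = payload.get("email", "").strip()
    name = payload.get("name", "").strip()   # .get() instead of payload["name"]
    if not email:
        return 400, {"error": "email is required"}
    if not name:
        return 400, {"error": "name is required"}
    return 201, {"email": email, "name": name}

# Frontend omitted "name": no KeyError, just a clear 400 response.
print(register({"email": "test@example.com"}))
# → (400, {'error': 'name is required'})
```

The root cause (indexing with `payload["name"]` on an untrusted body) and the fix (`.get()` plus explicit validation) mirror what the deliberate prompt asked for.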
The Mental Model: Prompts Are Probability Constraints
The one idea that changes how you prompt forever
Remember This
Every word in your prompt is a constraint on the output distribution. The model isn’t “thinking” about your question — it’s continuing a pattern. Your job as a prompt engineer is to set up a pattern that can only be continued in useful ways. More constraints = less randomness = more useful output. This is the foundation everything else in this course builds on.
Key insight: Stop thinking of prompts as “questions to an AI.” Start thinking of them as “the beginning of a document that the model will complete.” If your prompt reads like the start of a generic blog post, you’ll get a generic blog post. If it reads like the start of a specific technical analysis, you’ll get a specific technical analysis. You write the first paragraph; the model writes the rest.
Quick Reference
# The Prompt Engineering Mental Model:

# 1. The model predicts next tokens
#    (it doesn't "understand" you)
# 2. Your prompt sets the pattern
#    (beginning of a document)
# 3. Specificity = constraint
#    (each detail narrows possibilities)
# 4. Temperature = randomness
#    (low for facts, high for creativity)
# 5. Vague in = vague out
#    (always, without exception)

# From now on, before sending a prompt,
# ask yourself:
# "If I saw this as the start of a
#  document, what would I expect next?"
# If the answer is "anything", your
# prompt needs more constraints.