Ch 1 — How Prompts Actually Work

What happens inside the LLM when you hit send — and why vague prompts get vague answers
Foundations: Prompt → Tokens → Probability → Temperature → Vague vs Clear → Mental Model → Takeaway
You Type Words. The Model Sees Numbers.
The first thing to understand about prompts
The Reality
When you type “Explain Kubernetes”, the model doesn’t see two English words. It sees a sequence of token IDs — numbers like [849, 11452, 77621]. The model has never “read” anything. It has learned statistical patterns: “after these token IDs, what token IDs typically come next?” Your prompt is the starting pattern. Everything the model generates is a continuation of that pattern.
Key insight: The model doesn’t understand your intent. It predicts what text typically follows your text. This single fact explains 90% of prompt engineering: if you want specific output, you need to set up a specific pattern for the model to continue.
What the Model Actually Sees
# You type:
"Explain Kubernetes"

# Model sees (GPT-4 tokenizer):
[849, 11452, 77621]
# "Ex" + "plain" + " Kubernetes"

# The model's job:
# "Given tokens [849, 11452, 77621],
#  what token is most likely next?"

# It doesn't "know" what Kubernetes is.
# It knows what text patterns typically
# follow "Explain Kubernetes" in its
# training data.
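To make "text in, token IDs out" concrete, here is a toy tokenizer. A real tokenizer (such as GPT-4's BPE) learns its vocabulary from data; this hard-coded table and its IDs are purely illustrative, not real GPT-4 token IDs.

```python
# Toy tokenizer: map known text pieces to made-up integer IDs.
TOY_VOCAB = {
    "Ex": 849,
    "plain": 11452,
    " Kubernetes": 77621,
}

def toy_tokenize(text: str) -> list[int]:
    """Greedily match the longest known piece at each position."""
    ids = []
    while text:
        for piece in sorted(TOY_VOCAB, key=len, reverse=True):
            if text.startswith(piece):
                ids.append(TOY_VOCAB[piece])
                text = text[len(piece):]
                break
        else:
            raise ValueError(f"no token matches: {text!r}")
    return ids

print(toy_tokenize("Explain Kubernetes"))  # [849, 11452, 77621]
```

The model only ever sees the list of integers on the last line; the strings exist solely on your side of the tokenizer.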
Every Response Is a Probability Roll
The model picks from thousands of possible next tokens
How Generation Works
After processing your prompt, the model produces a probability distribution over its entire vocabulary (~100K tokens). A vague prompt like “Tell me about dogs” creates a flat distribution — many possible continuations are roughly equally likely. The model might talk about breeds, history, biology, or training. A specific prompt creates a peaked distribution — fewer continuations are likely, and they’re all relevant.
Key insight: A vague prompt is like asking “tell me something” to a room of 100 people. You’ll get 100 different answers. A specific prompt is like asking “what’s the capital of France?” — everyone says “Paris.” Your prompt’s specificity directly controls the randomness of the output.
Probability in Action
# Vague prompt: "Tell me about dogs"
# Next token probabilities (spread out):
#   "Dogs"   8%
#   "There"  6%
#   "The"    5%
#   "A"      4%
#   "Man's"  3%
# → Many directions, unpredictable

# Specific prompt: "List the 3 largest
# dog breeds by weight, with average
# weight in kg:"
# Next token probabilities (peaked):
#   "1"    45%
#   "\n"   20%
#   "Here" 12%
# → Narrow range, predictable output
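"Flat" versus "peaked" can be quantified with Shannon entropy: more bits means a more spread-out, less predictable distribution. The two distributions below are hypothetical stand-ins for the vague and specific prompts, not real model outputs.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: higher = more spread out, less predictable."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical next-token distributions, each summing to 1:
vague    = [1 / 50] * 50                           # 50 equally likely continuations
specific = [0.45, 0.20, 0.12] + [0.23 / 23] * 23   # a few tokens dominate

print(round(entropy(vague), 2))     # 5.64 bits (log2 of 50)
print(round(entropy(specific), 2))  # far fewer bits: much more predictable
```

The specific prompt doesn't make the model smarter; it just collapses the distribution so that fewer continuations carry almost all the probability mass.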
Temperature: The Randomness Dial
Why the same prompt gives different answers each time
What Temperature Does
Temperature controls how much randomness the model uses when picking from the probability distribution. At temperature 0, it always picks the most likely token (deterministic, repetitive). At temperature 1, it samples proportionally (creative, varied). At temperature 2, even unlikely tokens get a chance (chaotic, often nonsensical). Most APIs default to 0.7-1.0.
Key insight: Temperature and prompt specificity work together. A vague prompt + high temperature = chaos. A specific prompt + low temperature = predictable, focused output. For factual tasks (data extraction, code), use temperature 0-0.3. For creative tasks (brainstorming, writing), use 0.7-1.0.
Temperature Effects
# Same prompt at different temperatures:
# "Name a color"

# Temperature 0 (always most likely):
"Blue"  "Blue"  "Blue"  "Blue"

# Temperature 0.7 (balanced):
"Blue"  "Red"  "Green"  "Blue"

# Temperature 1.5 (very random):
"Cerulean"  "Mauve"  "Burnt sienna"

# Practical settings:
# Code generation:  T = 0
# Data extraction:  T = 0
# General chat:     T = 0.7
# Creative writing: T = 0.9-1.0
# Brainstorming:    T = 1.0-1.2
The Vague vs Clear Test
See the difference with a real example
Do This Instead
Let’s say you need help understanding a technology for work. Here’s what most people type vs what actually works.
Superficial Prompt
Prompt: “Explain Kubernetes”

Output: “Kubernetes is an open-source container orchestration platform developed by Google. It automates the deployment, scaling, and management of containerized applications...”

Generic Wikipedia-style answer. Not useful for your specific situation.
Deliberate Prompt
Prompt: “Explain Kubernetes to a backend developer who knows Docker but has never used orchestration. Focus on why they’d need it for a microservices app with 10 services, not what it is.”

Output: “You know how to run containers with Docker. Now imagine you have 10 services, each in its own container. When one crashes at 3am, who restarts it? When traffic spikes, who spins up more copies? That’s what Kubernetes does...”

Targeted, relevant, immediately useful.
Why the Deliberate Prompt Works
Understanding the mechanics behind the improvement
Breaking It Down
The deliberate prompt works because every phrase constrains the probability space:

“backend developer who knows Docker” — eliminates beginner-level explanations
“never used orchestration” — sets the right starting point
“why they’d need it” — forces motivation-first, not definition-first
“microservices app with 10 services” — gives a concrete scenario
“not what it is” — explicitly blocks the Wikipedia pattern

Each constraint removes thousands of possible continuations, leaving only the useful ones.
The Constraint Effect
# "Explain Kubernetes" # Possible directions: ~50 # (history, architecture, comparison, # tutorial, definition, use cases...) # + "to a backend developer" # Possible directions: ~20 # (eliminates beginner content) # + "who knows Docker" # Possible directions: ~10 # (can use Docker as reference point) # + "focus on why, not what" # Possible directions: ~3 # (motivation, pain points, benefits) # + "10 microservices" # Possible directions: ~1-2 # (concrete scenario, specific advice) # More constraints = fewer directions # = more useful output
Real Example: Debugging a Problem
The difference between “help me fix this” and actually getting help
What Most People Do
Prompt: “My Python code isn’t working. I get a KeyError.”

Output: “A KeyError in Python occurs when you try to access a dictionary key that doesn’t exist. You can fix this by using .get() method or checking if the key exists first...”

Generic advice. Doesn’t address YOUR specific bug.
What Actually Works
Prompt: “I have a Flask API endpoint that processes user registrations. When I POST to /api/register with {"email": "test@example.com"}, I get:
KeyError: 'name' at line 15.
Here’s the route handler: [paste code].
The request body should have email and name, but the frontend sometimes omits name. What’s the root cause and the safest fix?”

Output: Precise diagnosis + defensive code with request.json.get('name', '') + input validation suggestion.

Specific problem, specific fix.
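The "safest fix" pattern can be shown outside any web framework: treat every externally supplied field as optional and validate before use. `register()` here is a hypothetical stand-in for the Flask route handler, taking the parsed JSON body as a dict.

```python
def register(payload: dict) -> tuple[int, dict]:
    """Hypothetical registration handler: status code + response body."""
    email = payload.get("email", "").strip()
    name = payload.get("name", "").strip()   # .get() instead of payload["name"]
    if not email:
        return 400, {"error": "email is required"}
    if not name:
        return 400, {"error": "name is required"}
    return 201, {"email": email, "name": name}

# Frontend omitted "name": no KeyError, just a clear 400 response.
print(register({"email": "test@example.com"}))
# → (400, {'error': 'name is required'})
```

The root cause (indexing with `payload["name"]` on an untrusted body) and the fix (`.get()` plus explicit validation) mirror what the deliberate prompt asked for.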
The Mental Model: Prompts Are Probability Constraints
The one idea that changes how you prompt forever
Remember This
Every word in your prompt is a constraint on the output distribution. The model isn’t “thinking” about your question — it’s continuing a pattern. Your job as a prompt engineer is to set up a pattern that can only be continued in useful ways. More constraints = less randomness = more useful output. This is the foundation everything else in this course builds on.
Key insight: Stop thinking of prompts as “questions to an AI.” Start thinking of them as “the beginning of a document that the model will complete.” If your prompt reads like the start of a generic blog post, you’ll get a generic blog post. If it reads like the start of a specific technical analysis, you’ll get a specific technical analysis. You write the first paragraph; the model writes the rest.
Quick Reference
# The Prompt Engineering Mental Model:

# 1. The model predicts next tokens
#    (it doesn't "understand" you)
# 2. Your prompt sets the pattern
#    (beginning of a document)
# 3. Specificity = constraint
#    (each detail narrows possibilities)
# 4. Temperature = randomness
#    (low for facts, high for creativity)
# 5. Vague in = vague out
#    (always, without exception)

# From now on, before sending a prompt,
# ask yourself:
# "If I saw this as the start of a
#  document, what would I expect next?"
# If the answer is "anything", your
# prompt needs more constraints.