Ch 1 — What Is Context Engineering?

The paradigm shift from prompt engineering to context engineering
The Prompt Engineering Era
2022–mid 2025: crafting the perfect instruction
What It Was
Prompt engineering was the practice of manually crafting specific text instructions for individual LLM interactions. Techniques like chain-of-thought, few-shot examples, and role-based system prompts dominated the field from 2022 through mid-2025. The focus was entirely on how you phrase the question to the model.
The Limitation
In production systems, the user prompt is a tiny fraction of what the model actually sees. 80–90% of the context window is filled by retrieved documents, conversation history, tool definitions, and system instructions. Optimizing only the prompt is like tuning the radio while ignoring the engine.
Key insight: Prompt engineering addresses only one of eight components that enter an LLM’s context window. The other seven — system prompt, history, RAG docs, tool schemas, few-shot examples, memory, and metadata — are where production quality is won or lost.
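As a concrete illustration, assembling those eight components can be treated as a priority-ordered packing problem under a token budget. A minimal sketch: the priority order, token counts, and budget below are hypothetical, not a prescribed scheme.

```python
# Sketch of runtime context assembly: pack components in priority order
# until the token budget runs out. All names and numbers are illustrative.

PRIORITY = {  # lower number = packed first; the user prompt is one of eight inputs
    "system_prompt": 0, "tool_schemas": 1, "user_prompt": 2,
    "few_shot": 3, "memory": 4, "rag_docs": 5,
    "history": 6, "metadata": 7,
}

def assemble_context(components, budget=8000):
    """Pack (name, text, tokens) tuples into the window until the budget is spent."""
    window, used = [], 0
    for name, text, tokens in sorted(components, key=lambda c: PRIORITY[c[0]]):
        if used + tokens <= budget:
            window.append((name, text))
            used += tokens
    return window, used

components = [
    ("system_prompt", "You are a support agent.", 200),
    ("user_prompt", "Why was I charged twice?", 30),
    ("rag_docs", "[billing policy excerpt]", 3000),
    ("history", "[last 5 turns]", 2500),
    ("metadata", "[user tier: premium]", 50),
]
window, used = assemble_context(components, budget=4000)
names = [n for n, _ in window]  # history is dropped: it would blow the budget
```

Note how the user prompt accounts for 30 of 3,280 packed tokens here, mirroring the 80–90% figure above.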
The Naming Moment
Mid-2025: Karpathy and Lütke reframe the discipline
Andrej Karpathy
In mid-2025, former Tesla and OpenAI researcher Andrej Karpathy publicly described context engineering as “the delicate art and science of filling the context window with just the right information for the next step.” He argued that the real skill isn’t writing prompts — it’s curating the entire information environment the model receives.
Tobi Lütke
Shopify CEO Tobi Lütke independently endorsed the same shift, calling context engineering a “core skill” for anyone building AI products. The convergence of these two voices — one from deep research, one from enterprise product leadership — signaled that the industry was moving beyond prompt craft.
Key insight: When both the research community and the business community independently arrive at the same conclusion, it usually signals a genuine paradigm shift rather than a passing trend.
Foundational Publications
Manus and Anthropic lay the groundwork
Manus (July 2025)
Manus published lessons from rebuilding their agent framework four times. Key findings: don’t dynamically add or remove tools mid-iteration (it invalidates the KV-cache), keep recent tool calls in raw format to preserve the model’s “rhythm,” and never compress away error traces.
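The "don't add or remove tools mid-iteration" rule is often satisfied by keeping the serialized tool list stable and masking availability per step instead. A minimal sketch, with illustrative tool names, states, and masking mechanism:

```python
# Sketch: keep tool definitions stable (preserving the KV-cache prefix)
# and mask which tools are callable per step, instead of editing the list.
# Tool names and the state->mask mapping are illustrative.

TOOLS = [  # serialized once at the front of context; never reordered or edited mid-run
    {"name": "browser_open"},
    {"name": "shell_exec"},
    {"name": "file_write"},
]

def allowed_tools(state):
    """Return names from the *same* tool list every step; only the mask changes."""
    mask = {
        "browsing": {"browser_open"},
        "editing": {"shell_exec", "file_write"},
    }[state]
    return [t["name"] for t in TOOLS if t["name"] in mask]

step_tools = allowed_tools("browsing")
```

Because the tool definitions at the front of the context never change, the KV-cache prefix stays valid across iterations; only which tools the model may call is constrained at each step.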
Anthropic (September 2025)
Anthropic followed with their guide on effective context engineering for agents. Their core principle: find “the smallest possible set of high-signal tokens that maximise the likelihood of desired outcomes.” The guide covered system instructions, tool definitions, MCP resources, retrieved documents, and conversation history.
Why it matters: These two publications became the de facto reference material for the field. The patterns they described — progressive disclosure, compression, routing — were adopted across platforms within months.
The Formal Definition
What context engineering actually means
Definition
Context engineering is the practice of deciding what information an AI model sees, when it sees it, and how it is structured — at runtime. It covers everything that enters the context window: system instructions, user prompts, conversation history, retrieved documents, tool definitions, few-shot examples, memory stores, and metadata.
The connection: Prompt engineering tells the model how to talk. Context engineering controls what it sees when it talks. The distinction matters because performance gains in 2026 come from dynamic context selection, compression, and memory management — not from clever prompt wording.
Prompt vs Context
Prompt Engineering
Manual, one-off instruction craft. Focuses on the user prompt. Static per interaction. Doesn’t scale to multi-turn agent systems.
Context Engineering
Automated, systematic infrastructure. Manages all eight context components. Dynamic at runtime. Designed for production agent pipelines.
Why LLMs Need Context Engineering
Finite attention budgets and the lost-in-the-middle problem
Finite Attention
LLMs have a finite attention budget. Every token in the context window competes for attention. As context grows, precision drops, reasoning weakens, and the model starts missing information it should catch. Research calls this the “lost in the middle” problem — models show U-shaped performance curves, favoring content at the beginning and end while struggling with information in the middle.
Real-World Degradation
Despite advertised context windows of 128K to 2M+ tokens, real-world performance degrades 30–40% before hitting the technical limit. Systematic context management can prevent 30% of this information loss. The paradox: more context often means worse answers, because irrelevant tokens dilute the model’s attention on what actually matters.
Example: A refund-policy question that dumps 50 pages of documents from 2018 to 2026 into the context will confuse the model with contradictory policies. Adding more documents makes the response worse, not better. This is a context problem, not a prompt problem.
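That failure mode can be addressed before the model is ever called, by filtering stale documents out of the candidate set. A minimal sketch, with hypothetical document fields, IDs, and dates:

```python
# Sketch: keep only the policies in force at the effective date, newest
# first, and cap the count. Field names and dates are invented.
from datetime import date

def select_policy_docs(docs, effective=date(2026, 1, 1), max_docs=3):
    """Drop policies not in force at `effective`; return newest first."""
    current = [d for d in docs
               if d["valid_from"] <= effective
               and (d["valid_to"] is None or d["valid_to"] >= effective)]
    current.sort(key=lambda d: d["valid_from"], reverse=True)
    return current[:max_docs]

docs = [
    {"id": "refund-2018", "valid_from": date(2018, 1, 1), "valid_to": date(2021, 12, 31)},
    {"id": "refund-2022", "valid_from": date(2022, 1, 1), "valid_to": date(2025, 6, 30)},
    {"id": "refund-2025", "valid_from": date(2025, 7, 1), "valid_to": None},
]
selected = select_policy_docs(docs)
ids = [d["id"] for d in selected]  # only the policy currently in force survives
```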
The Five Core Pillars
The building blocks of context engineering
Pillar 1 — Retrieval
Optimizing document selection through chunking strategy, embedding choice, and reranking. Evolved from fixed RAG pipelines to agent-controlled retrieval loops (Agentic RAG) that can reformulate queries and iterate until confident.
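An agent-controlled retrieval loop of the kind described can be sketched as follows; `search`, `grade`, and `reformulate` are placeholders for a real retriever, relevance judge, and query rewriter:

```python
# Sketch of Agentic RAG: retrieve, judge relevance, reformulate the
# query, and repeat until confident or out of attempts. The 0.8
# confidence threshold is arbitrary.

def agentic_retrieve(question, search, grade, reformulate, max_rounds=3):
    query = question
    for _ in range(max_rounds):
        docs = search(query)
        if grade(question, docs) >= 0.8:  # confident enough to answer
            return docs
        query = reformulate(question, docs)  # try a different phrasing
    return docs  # best effort after max_rounds

# Demo with trivial fakes: the retriever echoes the query, the grader
# only accepts documents mentioning "billing".
docs_found = agentic_retrieve(
    "refund",
    search=lambda q: [q],
    grade=lambda q, d: 1.0 if "billing" in d[0] else 0.0,
    reformulate=lambda q, d: q + " billing",
)
```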
Pillar 2 — Memory
Persisting information across conversations through short-term buffers (working memory for immediate reasoning) and long-term stores (vector-based archival memory). Enables agents to accumulate knowledge across sessions.
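A two-tier memory can be sketched as a bounded working buffer plus an append-only archive. Real systems recall by embedding similarity; the keyword overlap here is a stand-in:

```python
# Sketch of two-tier memory: a fixed-size working buffer of recent turns
# plus an append-only long-term store searched by naive word overlap.
from collections import deque

class Memory:
    def __init__(self, buffer_size=5):
        self.working = deque(maxlen=buffer_size)  # recent turns, kept raw
        self.archive = []                         # everything, across sessions

    def add(self, turn):
        self.working.append(turn)
        self.archive.append(turn)

    def recall(self, query, k=2):
        """Return the k archived turns sharing the most words with the query."""
        q = set(query.lower().split())
        scored = sorted(self.archive,
                        key=lambda t: len(q & set(t.lower().split())),
                        reverse=True)
        return scored[:k]

# Demo with invented turns: the old preference is evicted from the
# working buffer but still recoverable from the archive.
mem = Memory(buffer_size=2)
for turn in ["user prefers the color blue",
             "weather in Berlin is rainy",
             "order 4512 has shipped"]:
    mem.add(turn)
recalled = mem.recall("favorite color", k=1)
```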
Pillar 3 — State Management
Tracking agent workflow progress with explicit state machines. Prevents agents from losing track of multi-step tasks and enables recovery from failures without restarting entire workflows.
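An explicit workflow state machine can be as simple as an enumerated transition table; the states and transitions below are illustrative:

```python
# Sketch of an explicit state machine: legal transitions are enumerated,
# so an agent can't silently skip or lose a step. States are invented.

TRANSITIONS = {
    "start":    {"retrieve"},
    "retrieve": {"draft", "retrieve"},  # retrieval may loop
    "draft":    {"review"},
    "review":   {"done", "draft"},      # reviewer can bounce back to drafting
}

def advance(state, next_state):
    """Move to next_state, rejecting anything the table doesn't allow."""
    if next_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition {state} -> {next_state}")
    return next_state

state = advance("start", "retrieve")
```

Because the current state is explicit and recorded, a crashed run can resume from its last state rather than restarting the whole workflow.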
Pillar 4 — Context Compression
Fitting useful information into fixed windows through summarization and selective inclusion. Sliding window hybrids keep recent turns raw while compressing older context. Companies report 60–80% cost reduction.
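The sliding-window hybrid can be sketched in a few lines: recent turns stay raw, older turns collapse into a summary. Here `summarize` stands in for an LLM summarization call:

```python
# Sketch of sliding-window compression: keep the last `keep_raw` turns
# verbatim, replace everything older with a single summary entry.

def compress_history(turns, keep_raw=4, summarize=None):
    if len(turns) <= keep_raw:
        return turns
    old, recent = turns[:-keep_raw], turns[-keep_raw:]
    # Placeholder summarizer; a real one would call an LLM.
    summary = (summarize or (lambda ts: f"[summary of {len(ts)} earlier turns]"))(old)
    return [summary] + recent

compressed = compress_history([f"turn {i}" for i in range(7)], keep_raw=4)
```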
Pillar 5 — Information Routing
Directing appropriate context to different model calls in multi-agent systems. A billing question doesn’t need the onboarding knowledge base. Routing classifies the query and selects the right context source before anything enters the window.
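A routing layer can be sketched as a classifier plus a route table. The keyword rules below stand in for a real intent classifier, and the source names are hypothetical:

```python
# Sketch of information routing: classify the query, then load only the
# matching context source instead of everything at once.

ROUTES = {
    "billing":    ["billing_kb"],
    "onboarding": ["onboarding_kb"],
    "other":      ["general_faq"],
}

def route(query):
    q = query.lower()
    if any(w in q for w in ("refund", "charge", "invoice")):
        return "billing"
    if any(w in q for w in ("setup", "install", "getting started")):
        return "onboarding"
    return "other"

def context_sources(query):
    """Only the routed source enters the context window."""
    return ROUTES[route(query)]

sources = context_sources("Why was I charged twice?")
```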
Key insight: These five pillars are not alternatives — they layer together. Progressive disclosure defines what can enter the window, routing and compression manage what stays during execution, and retrieval brings in external knowledge on demand.
Research Validation
Academic evidence for context engineering’s impact
Stanford / SambaNova / UC Berkeley
Joint research demonstrated that context editing delivered a 10.6% performance improvement on agentic tasks with 86.9% lower latency compared to fine-tuning. This means you can often get better results by engineering the context than by retraining the model itself.
Why it matters: Fine-tuning is expensive, slow, and requires specialized infrastructure. Context engineering is fast, iterative, and can be deployed immediately. The research shows it’s also more effective for many agentic use cases.
Enterprise Impact
Companies implementing effective context management report 35–60% accuracy improvements in enterprise AI systems. A fintech startup reduced document analysis costs from $30,600 to $4,100 monthly (87% reduction) through token budgeting — extracting relevant sections via RAG, compressing context, and caching system prompts.
Cost Economics
At GPT-4o pricing of $2.50 per million input tokens, a 128K context window costs $0.32 per request. Running 10,000 requests daily means $96,000/month. Context engineering’s cost reduction techniques (caching, compression, selective inclusion) are not optional at scale — they’re survival.
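The arithmetic in that paragraph is worth making explicit:

```python
# Cost arithmetic from the paragraph above: a full 128K window at
# $2.50 per million input tokens, 10,000 requests/day, 30-day month.
PRICE_PER_M_INPUT = 2.50    # USD per million input tokens
WINDOW = 128_000            # tokens per request
REQUESTS_PER_DAY = 10_000

cost_per_request = WINDOW / 1_000_000 * PRICE_PER_M_INPUT   # $0.32
cost_per_month = cost_per_request * REQUESTS_PER_DAY * 30   # $96,000
```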
The State of the Art in 2026
From niche concern to core discipline
Industry Adoption
Context engineering has gone from a niche concern to the core discipline of AI engineering in under a year. The patterns described by Manus and Anthropic in mid-2025 have been adopted across every major platform. Agent Skills (markdown files with YAML frontmatter) were released by Anthropic in December 2025 and adopted by OpenAI, Google, GitHub, and Cursor within weeks.
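For readers who haven't seen one, a skill file of the kind described is a markdown body under a YAML frontmatter header. The field values below are invented for illustration:

```markdown
---
name: billing-refunds
description: How to answer refund and chargeback questions using the current billing policy.
---

# Billing refunds

When the user asks about refunds, load only the policy currently in
force and cite its effective date in the answer.
```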
Key insight: Context engineering is not a replacement for prompt engineering — it’s the next evolution. Both remain relevant, but the shift is from manual, one-off prompt crafting to systematic, scalable infrastructure for consistent AI performance.
What’s Next
The field is converging on layered architectures where progressive disclosure and tool management define what can enter the context window, routing and compression manage what stays during execution, retrieval brings in external knowledge on demand, and evaluation measures whether any of it is working. MCP (Model Context Protocol), now governed by the Agentic AI Foundation under the Linux Foundation, has become the standard for connecting agents to external tools.
The Relationship to Harness Engineering
Context engineering is one pillar of the broader discipline of harness engineering — the design of complete systems (constraints, feedback loops, documentation, linting) that make AI agents reliable. Context engineering controls what the model sees; harness engineering controls the entire environment the agent operates in.