Ch 13 — LangChain, LlamaIndex, and DSPy

Framework strategy for orchestration, retrieval, and optimization in open-model applications
Pipeline: Compose → Retrieve → Act → Optimize → Deliver
Framework Roles at a Glance
These frameworks overlap, but each has a practical center of gravity.
LangChain
Strong for orchestration, tool calling, and composable agent workflows. Keep interfaces explicit so failures can be traced quickly.
LlamaIndex and DSPy
LlamaIndex excels in retrieval pipelines; DSPy excels in systematic prompt/program optimization. Measure impact with shared eval and observability metrics.
Overlap Reality
All three frameworks can solve overlapping problems, but forcing one tool to do everything often increases complexity without improving reliability. Introduce complexity only when a clear requirement justifies it.
Key Point: Use complementary strengths instead of forcing one framework everywhere.
LangChain in Practice
LangChain shines in multi-step workflows with external tools.
Common Pattern
Define chains or agents, connect tools, and instrument traces for debugging and iteration. Review framework boundaries regularly as the product evolves.
Operational Need
Guardrails and observability are essential as workflow complexity grows.
Failure Mode
Without explicit state and error handling, agent workflows become hard to debug and expensive to operate under real traffic variability.
Key Point: LangChain is powerful when paired with disciplined tracing.
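The chain-with-tracing pattern can be sketched without any framework dependency. This is a minimal, framework-agnostic illustration; the step names and trace fields are assumptions for this example, not LangChain's API.

```python
import time
from typing import Any, Callable

def run_chain(steps: list[tuple[str, Callable[[Any], Any]]], payload: Any):
    """Run named steps in order, recording a trace entry per step.

    Explicit error state: a failing step is recorded with its name and
    the exception, so the failure can be traced to one step quickly.
    """
    trace = []
    for name, fn in steps:
        start = time.perf_counter()
        try:
            payload = fn(payload)
        except Exception as exc:
            trace.append({"step": name, "ok": False, "error": repr(exc)})
            return None, trace
        trace.append({"step": name, "ok": True,
                      "ms": round((time.perf_counter() - start) * 1000, 2)})
    return payload, trace

# Hypothetical two-step workflow: normalize the query, then route it.
steps = [
    ("normalize", lambda q: q.strip().lower()),
    ("route", lambda q: {"query": q, "tool": "search" if "?" in q else "answer"}),
]
result, trace = run_chain(steps, "  What is RAG?  ")
```

The point of the sketch is the trace contract, not the steps themselves: every step, pass or fail, leaves a record that debugging can start from.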
LlamaIndex in Practice
LlamaIndex is retrieval-first and document-workflow oriented.
Common Pattern
Build ingestion, indexing, and retrieval layers with configurable strategies and vector backends.
Operational Need
Evaluation of retrieval quality is critical before tuning generation behavior.
Retrieval Failure Mode
Weak chunking or indexing decisions can dominate downstream quality, even when model choice and prompting are strong.
Key Point: Strong retrieval quality reduces downstream hallucination and cost.
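To make the chunking decision and the evaluation step concrete, here is a toy sketch: a fixed-size overlapping chunker and a precision@k check. The chunker and scorer are illustrative stand-ins, not LlamaIndex's API; real pipelines typically chunk by sentences or tokens.

```python
def chunk(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Fixed-size character chunks with overlap between neighbors."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved chunks that are actually relevant."""
    return sum(1 for c in retrieved[:k] if c in relevant) / k

chunks = chunk("a" * 100)                                   # 3 overlapping chunks
p = precision_at_k(["d1", "d2", "d3"], {"d1", "d3"}, k=2)   # 1 of top 2 relevant
```

Measuring precision@k against a small labeled set before touching prompts is exactly the "evaluate retrieval first" discipline the section calls for.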
DSPy in Practice
DSPy treats prompts and reasoning programs as optimizable components.
Common Pattern
Define task modules, compile against examples, and optimize for measurable objective functions.
Operational Need
You need representative training/eval examples to realize DSPy benefits.
Eval Design
Define measurable objectives up front, then optimize against stable datasets. Optimization without clear targets tends to overfit style rather than task quality.
Key Point: DSPy is strongest when you can quantify quality goals.
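The compile-against-examples idea reduces, in miniature, to scoring candidate programs on a fixed eval set and keeping the best one. The variants, examples, and scorer below are hypothetical, not DSPy's API; DSPy's optimizers search a much richer space.

```python
def score(program, examples) -> float:
    """Objective function: exact-match accuracy over labeled examples."""
    return sum(program(x) == y for x, y in examples) / len(examples)

def compile_best(variants: dict, examples) -> str:
    """Return the name of the variant scoring highest on the eval set."""
    return max(variants, key=lambda name: score(variants[name], examples))

# Toy task: answer simple addition questions.
examples = [("2+2", "4"), ("3+3", "6")]
variants = {
    "echo": lambda x: x,                                      # repeats the question
    "arith": lambda x: str(sum(int(t) for t in x.split("+"))),  # computes the sum
}
best = compile_best(variants, examples)
```

Note that the objective is defined before any optimization happens, which is the eval-design discipline the section insists on: without it, `compile_best` would have nothing stable to maximize.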
How to Combine Them
Hybrid stacks are common in production applications.
Integration Pattern
Use LlamaIndex for retrieval, LangChain for orchestration, and DSPy for optimizing critical reasoning steps.
Governance
Keep interfaces explicit so framework boundaries remain maintainable and failures can be traced quickly.
Integration Boundary
Assign clear ownership for the retrieval, orchestration, and optimization layers so that failure ownership is never ambiguous during a production incident.
Key Point: Composition works best with clear ownership boundaries.
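Explicit boundaries can be expressed as interfaces that each layer implements without knowing about the others. The protocol and class names below are illustrative assumptions; in practice the retriever might wrap LlamaIndex and the orchestrator LangChain.

```python
from typing import Protocol

class Retriever(Protocol):
    def retrieve(self, query: str) -> list[str]: ...

class Orchestrator(Protocol):
    def run(self, query: str, context: list[str]) -> str: ...

def answer(query: str, retriever: Retriever, orchestrator: Orchestrator) -> str:
    """The only cross-layer contract: retrieve context, then orchestrate."""
    return orchestrator.run(query, retriever.retrieve(query))

# In-memory stand-ins for each layer, swappable behind the protocols:
class KeywordRetriever:
    def __init__(self, docs): self.docs = docs
    def retrieve(self, query):
        return [d for d in self.docs if any(w in d for w in query.split())]

class TemplateOrchestrator:
    def run(self, query, context):
        return f"{query} -> {len(context)} docs"
```

Because each layer is owned through its protocol, an incident can be attributed to the retriever or the orchestrator by inspecting only that layer's inputs and outputs.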
Evaluation and Observability
Framework abstraction never replaces measurement.
Eval Stack
Track retrieval precision, tool-call accuracy, response quality, and latency/cost metrics per route.
Feedback Loop
Use eval data to refine prompts, retrieval settings, and model routing rules continuously.
Observability Contract
Capture traces and quality signals in a shared schema across frameworks so cross-layer debugging remains consistent and fast.
Key Point: Without metrics, framework choice is mostly guesswork.
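A shared trace schema can be as small as one record type that every layer emits, plus an aggregator. The field names and layer labels here are assumptions for illustration, not a standard schema.

```python
from dataclasses import dataclass, field

@dataclass
class TraceEvent:
    layer: str            # e.g. "retrieval" | "orchestration" | "optimization"
    name: str             # step or tool name within the layer
    ok: bool
    latency_ms: float
    meta: dict = field(default_factory=dict)

def summarize(events: list[TraceEvent]) -> dict:
    """Per-layer success count and total latency, for cross-layer debugging."""
    out: dict = {}
    for e in events:
        s = out.setdefault(e.layer, {"n": 0, "ok": 0, "latency_ms": 0.0})
        s["n"] += 1
        s["ok"] += int(e.ok)
        s["latency_ms"] += e.latency_ms
    return out

events = [
    TraceEvent("retrieval", "vector_search", True, 12.0),
    TraceEvent("retrieval", "rerank", False, 5.0),
    TraceEvent("orchestration", "agent_run", True, 30.0),
]
summary = summarize(events)
```

Because every framework writes the same record type, a latency regression or failure spike can be localized to one layer from a single summary.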
Decision Heuristic
Pick a starting point based on immediate product need.
If You Need
Complex tool orchestration: start with LangChain. Retrieval-heavy knowledge apps: start with LlamaIndex. Programmatic optimization: bring in DSPy.
Then
Add the other frameworks only when requirements clearly justify extra complexity.
Adoption Sequence
Start with one framework that solves the immediate bottleneck, stabilize evaluation, then compose additional frameworks incrementally.
Key Point: Start narrow, integrate gradually, and keep architecture legible.
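The heuristic above fits in a single lookup. The need labels are made up for this sketch; the fallback mirrors the section's advice to stay narrow by default.

```python
def starting_framework(need: str) -> str:
    """Map the dominant product need to a starting framework."""
    return {
        "tool_orchestration": "LangChain",
        "retrieval_heavy": "LlamaIndex",
        "programmatic_optimization": "DSPy",
    }.get(need, "start with the simplest stack that fits")
```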