Ch 14 — The AI Landscape Today
MoE internals, test-time compute, agent architectures, RAG pipelines, and the MCP protocol
Under the Hood
A. Modern Model Architectures

S1. Mixture of Experts (MoE): sparse routing, expert selection, DeepSeek/Llama 4/Mistral
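The sparse routing idea behind S1 fits in a few lines: a toy top-k gate (here top-2) scores every expert per token, then runs only the k selected experts and mixes their outputs by renormalized gate weights. This is a minimal NumPy sketch with random weights standing in for a trained router and experts; the name `moe_route` and all dimensions are illustrative, not any particular model's API.

```python
import numpy as np

def moe_route(x, gate_w, expert_ws, k=2):
    """Sparse MoE layer: route each token through its top-k experts.

    x:         (tokens, d_model) activations
    gate_w:    (d_model, n_experts) router weights
    expert_ws: list of (d_model, d_model) expert weight matrices
    """
    logits = x @ gate_w                             # router score per expert
    topk = np.argsort(logits, axis=-1)[:, -k:]      # indices of the k best experts
    sel = np.take_along_axis(logits, topk, axis=-1)
    sel = np.exp(sel - sel.max(-1, keepdims=True))  # softmax over selected only
    weights = sel / sel.sum(-1, keepdims=True)
    out = np.zeros_like(x)
    for t in range(x.shape[0]):                     # each token runs just k experts
        for j in range(k):
            e = topk[t, j]
            out[t] += weights[t, j] * (x[t] @ expert_ws[e])
    return out, topk

rng = np.random.default_rng(0)
d, n_exp = 8, 4
x = rng.standard_normal((3, d))
out, chosen = moe_route(x, rng.standard_normal((d, n_exp)),
                        [rng.standard_normal((d, d)) for _ in range(n_exp)])
```

The key property: compute per token scales with k, not with the total expert count, which is how MoE models grow parameters without growing per-token FLOPs.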
S2. Multimodal Architecture: vision encoders, cross-attention, early vs. late fusion
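The early-vs-late fusion distinction from S2 is easiest to see in shapes. A sketch under toy assumptions (random projections instead of trained weights, single-head attention): late fusion lets text queries cross-attend over vision-encoder outputs, while early fusion simply concatenates patch and text tokens into one sequence for a shared transformer.

```python
import numpy as np

def cross_attention(text_h, image_h, wq, wk, wv):
    """Late fusion: text tokens query the vision encoder's outputs."""
    q, k, v = text_h @ wq, image_h @ wk, image_h @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores = np.exp(scores - scores.max(-1, keepdims=True))
    attn = scores / scores.sum(-1, keepdims=True)     # rows sum to 1
    return attn @ v                                   # one fused vector per text token

rng = np.random.default_rng(1)
d = 16
text_h = rng.standard_normal((5, d))    # 5 text-token states
image_h = rng.standard_normal((9, d))   # 9 patch embeddings from a vision encoder
fused = cross_attention(text_h, image_h,
                        *(rng.standard_normal((d, d)) for _ in range(3)))

# Early fusion, by contrast, merges modalities before any attention:
early = np.concatenate([image_h, text_h], axis=0)     # (14, d) joint sequence
```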
B. Test-Time Compute & Reasoning

S3. Chain-of-Thought & Reasoning Tokens: how o1/o3/R1 think before answering
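Mechanically, the "think before answering" of S3 means the model emits reasoning tokens that are separated from the user-visible answer. A small helper shows the split; the `<think>...</think>` delimiter matches DeepSeek-R1's public output format, while other reasoning models use different (often hidden) delimiters.

```python
def split_reasoning(output: str, marker: str = "</think>"):
    """Split a reasoning model's raw output into its chain-of-thought
    and the final, user-visible answer."""
    if marker in output:
        reasoning, answer = output.split(marker, 1)
        return reasoning.replace("<think>", "").strip(), answer.strip()
    return "", output.strip()

raw = "<think>17 * 24 = 17*20 + 17*4 = 340 + 68 = 408</think>The answer is 408."
cot, answer = split_reasoning(raw)
```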
S4. Compute-Optimal Inference: scaling laws for inference, adaptive compute
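One concrete way to spend extra test-time compute, relevant to S4, is best-of-N with a majority vote (self-consistency): draw several independent reasoning samples and return the most common final answer. In this sketch `sample_fn` is a stand-in for one model call; the scripted sample list is made up for determinism.

```python
from collections import Counter
from itertools import cycle

def self_consistency(sample_fn, n_samples):
    """Trade inference compute for accuracy: sample N answers,
    return the majority answer."""
    answers = [sample_fn() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Deterministic stand-in for a stochastic model whose samples disagree.
samples = cycle(["408", "406", "408", "408", "410"])
best = self_consistency(lambda: next(samples), n_samples=5)
```

Adaptive-compute schemes go one step further and choose N (or the reasoning budget) per query instead of fixing it in advance.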
C. Agent Architectures

S5. ReAct & Agent Loops: the Reason-Act-Observe pattern, tool calling, memory
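The Reason-Act-Observe pattern of S5 reduces to a short loop: the model emits a thought plus a tool call, the runtime executes the tool and feeds the observation back, and the loop ends when the model emits a final answer. Everything here is a hypothetical sketch: the JSON step format, the `calculator` tool, and the scripted `llm` all stand in for a real chat-model integration.

```python
import json

# Hypothetical tool registry; a restricted eval plays the calculator.
TOOLS = {"calculator": lambda expr: str(eval(expr, {"__builtins__": {}}))}

def react_loop(llm, task, max_steps=5):
    """Reason-Act-Observe: alternate model steps and tool observations
    until the model returns a final answer."""
    transcript = [task]
    for _ in range(max_steps):
        step = json.loads(llm(transcript))      # model sees the full transcript
        if "final" in step:
            return step["final"]
        observation = TOOLS[step["action"]](step["input"])
        transcript.append(f"Observation: {observation}")   # acts as working memory
    return None

# Scripted stand-in for the model: one tool call, then a final answer.
script = iter([
    '{"thought": "need arithmetic", "action": "calculator", "input": "17 * 24"}',
    '{"final": "17 * 24 = 408"}',
])
result = react_loop(lambda transcript: next(script), "What is 17 * 24?")
```

The `max_steps` cap matters in practice: without it, a confused model can loop on tool calls indefinitely.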
S6. MCP Protocol Internals: JSON-RPC, tools/resources/prompts, transport
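Concretely for S6: every MCP message is a JSON-RPC 2.0 frame, and over the stdio transport those frames are newline-delimited JSON. The `tools/call` method name and request shape follow the MCP specification; the `get_weather` tool and its arguments are made up for illustration.

```python
import json

# A tools/call request as an MCP client would frame it (JSON-RPC 2.0).
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "get_weather", "arguments": {"city": "Oslo"}},
}

# stdio transport: one JSON object per line on the server's stdin.
wire = json.dumps(request) + "\n"
decoded = json.loads(wire)
```

`tools/list`, `resources/read`, and `prompts/get` follow the same framing; only `method` and `params` change.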
D. RAG & Context Engineering

S7. RAG Pipeline Architecture: embedding, chunking, retrieval, reranking, generation
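The S7 stages compose into a retrieve-then-generate flow. A toy sketch: bag-of-words overlap stands in for an embedding model and vector index, and prompt assembly stands in for the LLM call; a real pipeline would swap in an embedder, an ANN index, a cross-encoder reranker, and a generator at the marked points.

```python
from collections import Counter

def embed(text):
    # Stand-in embedding: bag-of-words counts (real systems use a model).
    return Counter(text.lower().split())

def similarity(a, b):
    return sum((a & b).values())        # token overlap as a toy score

def rag_answer(query, docs, top_k=2):
    q = embed(query)
    # Retrieve (and, in a real system, rerank): score chunks, keep top_k.
    ranked = sorted(docs, key=lambda d: similarity(q, embed(d)), reverse=True)
    context = ranked[:top_k]
    # Generate: stitch retrieved context into the prompt (LLM call omitted).
    return "\n".join(["Context:"] + context + ["Question: " + query])

docs = [
    "HNSW builds a layered graph for approximate nearest neighbor search.",
    "The capital of France is Paris.",
    "Rerankers rescore retrieved chunks with a cross-encoder.",
]
prompt = rag_answer("What is the capital of France?", docs, top_k=1)
```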
S8. Vector Databases & Embeddings: ANN search, HNSW, cosine similarity, hybrid search
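For S8, the scoring function is the easy part: an exact cosine-similarity scan over normalized vectors. Vector databases exist to avoid this O(n) scan, using ANN structures such as HNSW, but they approximate exactly this computation. A NumPy sketch with made-up dimensions:

```python
import numpy as np

def cosine_top_k(query, index, k=3):
    """Exact nearest-neighbor search by cosine similarity
    (the brute-force baseline that ANN indexes approximate)."""
    index_n = index / np.linalg.norm(index, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = index_n @ q                     # cosine of query vs. every row
    top = np.argsort(scores)[::-1][:k]       # highest-similarity ids first
    return top, scores[top]

rng = np.random.default_rng(2)
index = rng.standard_normal((100, 32))               # 100 stored embeddings
query = index[7] + 0.01 * rng.standard_normal(32)    # near-duplicate of row 7
ids, scores = cosine_top_k(query, index, k=3)
```

Hybrid search combines these dense scores with a lexical score such as BM25 before ranking.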
E. Inference & Deployment

S9. Inference Optimizations: KV cache, continuous batching, speculative decoding, quantization
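The KV cache named in S9 is the simplest of these optimizations to show: during decoding, each new token's key and value are appended to a cache, so attention only ever projects the newest token instead of recomputing K and V for the whole prefix. A single-head NumPy sketch with random stand-in projections:

```python
import numpy as np

def attend(q, K, V):
    """One query attending over all cached keys/values (single head)."""
    s = (K @ q) / np.sqrt(q.shape[-1])
    w = np.exp(s - s.max())
    w /= w.sum()
    return w @ V

rng = np.random.default_rng(3)
d, steps = 16, 6
K_cache = np.empty((0, d))
V_cache = np.empty((0, d))
outputs = []
for _ in range(steps):
    # Per decode step: project only the newest token (random stand-ins
    # here), append to the cache, attend over everything cached so far.
    k_new, v_new, q = rng.standard_normal((3, d))
    K_cache = np.vstack([K_cache, k_new])    # append, never recompute
    V_cache = np.vstack([V_cache, v_new])
    outputs.append(attend(q, K_cache, V_cache))
```

This turns per-step attention cost from quadratic-in-prefix recomputation into a single append plus one matrix-vector product, at the price of the cache's memory footprint, which is what continuous batching and quantization then manage.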
S10. The Full Picture: connecting all 14 chapters — the complete AI stack