Ch 14 — The AI Landscape Today

Reasoning models, agentic AI, open source, and where it’s all heading
The Frontier Model Landscape
GPT-4o, Claude, Gemini, and the race for capability
Frontier Models (2025–2026):

OpenAI:
- GPT-4o — multimodal (text, image, audio, video)
- o1 / o3 — reasoning models (test-time compute)
- o3-mini — efficient reasoning for production
- o4-mini — latest small reasoning model

Anthropic:
- Claude 3.5 Sonnet — best coding model
- Claude 3.7 Sonnet — extended thinking mode
- Claude 4 Opus — frontier reasoning

Google:
- Gemini 2.0 Flash — fast daily tasks
- Gemini 2.5 Pro — thinking mode, 1M context
- Gemini 3 Pro — dynamic reasoning effort

Meta:
- Llama 4 Scout / Maverick — open-weight MoE
- Llama 4 Behemoth — frontier open model
The Paradigm Shift
The era of “just make it bigger” is over. The new frontier: test-time compute scaling (reasoning models that think longer), multimodal fusion (text + image + audio + video natively), and agentic capabilities (models that take actions, not just generate text). Enterprise adoption hit 88% in 2025.
The quality gap collapsed. Open-weight models (Llama 4, DeepSeek, Mistral) now match proprietary models on most benchmarks. The choice is no longer “best model” but “best model for my constraints” — latency, cost, privacy, sovereignty, and reasoning depth all matter more than raw capability.
Reasoning Models
Test-time compute — thinking longer instead of training bigger
The Test-Time Compute Revolution
Traditional scaling: train a bigger model (more parameters, more data). Reasoning models scale at inference: given a hard problem, the model “thinks” for longer, generating a chain of reasoning tokens before answering. A smaller model thinking for 60 seconds can outperform a larger model answering instantly.
OpenAI o1 / o3:
- RL-trained to produce chain-of-thought
- o3 scored 45.1% on ARC-AGI, 96.7% on AIME math competition
- Hidden reasoning tokens (not shown to user)

DeepSeek-R1 (Jan 2025):
- Open-weight, MIT license
- Pure RL — no supervised fine-tuning
- Matched o1 performance
- Proved reasoning emerges from RL alone
- Trained at a fraction of Western lab costs

Claude 3.7 Sonnet “Extended Thinking”:
- Developer-controlled thinking budget
- Adjustable: quick response vs. deep reasoning
- Hybrid approach for production use
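A toy illustration of the test-time compute idea, assuming nothing about any real model: `solve_once` is a stand-in for sampling one chain of thought, and spending more inference compute means drawing many chains and majority-voting over their final answers (self-consistency).

```python
import random
from collections import Counter

def solve_once(problem, rng):
    # Stand-in for one sampled chain of thought: a noisy solver
    # that reaches the right answer 70% of the time.
    correct = problem["answer"]
    return correct if rng.random() < 0.7 else correct + rng.choice([-1, 1])

def solve_with_test_time_compute(problem, n_samples=51, seed=0):
    # More inference compute = more sampled chains; a majority vote
    # over final answers is far more reliable than a single sample.
    rng = random.Random(seed)
    answers = [solve_once(problem, rng) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

problem = {"question": "17 * 3 = ?", "answer": 51}
print(solve_with_test_time_compute(problem))
```

Real reasoning models go further: RL trains them to produce one long, coherent chain rather than independent samples, but the economics are the same, accuracy bought with inference tokens rather than training FLOPs.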
Why This Matters
Inference is projected to account for two-thirds of AI compute by 2026. The shift from training-time to test-time scaling changes the economics: instead of spending $100M+ on training, you spend on inference when the user needs it. This also democratizes access: open-weight reasoning models can run locally on consumer hardware.
The DeepSeek shock: DeepSeek-R1 showed that frontier reasoning can emerge from pure RL on a modest budget. It triggered a re-evaluation of the assumption that only well-funded Western labs can build frontier models. Open-weight reasoning models are now competitive with proprietary ones.
Agentic AI
From chatbots to autonomous task-completing agents
What Are AI Agents?
AI agents go beyond chat: they plan, use tools, take actions, and iterate until a task is complete. An agent can browse the web, write and run code, query databases, send emails, and manage files — all autonomously. The LLM is the “brain”; tools are the “hands.”
Agent Loop:
1. Receive task from user
2. Plan: break into subtasks
3. Act: call tools (search, code, API)
4. Observe: read tool results
5. Reflect: is the task complete?
6. Repeat steps 3–5 until done

Key Components:
- LLM: reasoning engine
- Tools: web search, code exec, APIs
- Memory: conversation + long-term store
- Planning: task decomposition
- Guardrails: safety constraints
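The loop above can be sketched in a few lines. The planner and tools here are scripted toys (all names hypothetical), but the control flow is the one real agent frameworks implement:

```python
def run_agent(task, planner, tools, max_steps=8):
    # Plan -> act -> observe -> reflect, until the planner says "finish".
    history = [("task", task)]
    for _ in range(max_steps):
        step = planner(history)                  # plan/reflect (the LLM's job)
        if step["action"] == "finish":
            return step["result"]
        observation = tools[step["action"]](**step["args"])  # act
        history.append((step["action"], observation))        # observe
    raise RuntimeError("step budget exhausted")  # guardrail

def toy_planner(history):
    # Scripted stand-in for the LLM: search once, then finish.
    if not any(kind == "search" for kind, _ in history):
        return {"action": "search", "args": {"query": "MCP"}}
    return {"action": "finish", "result": history[-1][1]}

tools = {"search": lambda query: f"results for {query!r}"}
print(run_agent("look up MCP", toy_planner, tools))
```

In production the planner is an LLM call, the tools are real APIs, and the step budget is one of several guardrails, but the loop itself rarely gets more complicated than this.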
MCP (Model Context Protocol):
- Open standard by Anthropic (2024)
- “USB-C for AI” — universal tool interface
- JSON-RPC 2.0 protocol
- 97M monthly SDK downloads (Feb 2026)
- Adopted by OpenAI, Google, Microsoft

Agent Frameworks:
- LangGraph, CrewAI, AutoGen, Semantic Kernel
- Production use: finance, logistics, compliance

Products:
- OpenAI Operator — autonomous browser agent
- Google Deep Research — multi-hour research
- Anthropic Claude Computer Use — desktop control
- Cursor Agent — autonomous coding
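Because MCP is plain JSON-RPC 2.0, a tool invocation is just a small JSON message. The sketch below shows the general shape; the tool name and its arguments are made up, and the MCP specification is the normative source for the exact fields:

```python
import json

# Illustrative MCP-style tool call over JSON-RPC 2.0.
# "web_search" and its arguments are hypothetical.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "web_search", "arguments": {"query": "test-time compute"}},
}
wire = json.dumps(request)
print(wire)
```

The “USB-C” analogy comes from this simplicity: any client that can serialize this message can drive any MCP server, regardless of which model or framework sits on either side.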
2025 was the year of agents. The shift from “AI that answers questions” to “AI that completes tasks” is the biggest practical change since ChatGPT. MCP standardized tool integration. Agent frameworks matured. The next frontier: multi-agent systems where specialized agents collaborate.
AI for Code
Copilots, agents, and the transformation of software engineering
Evolution of AI Coding:

2021: GitHub Copilot — autocomplete
- Single-line and function completion
- Codex (GPT-3 fine-tuned on code)

2023: Chat-based coding
- “Explain this code” / “Fix this bug”
- GPT-4, Claude for code generation

2024: Agentic coding
- Cursor Agent, Copilot Workspace
- Multi-file edits, codebase understanding
- Context-aware: reads files, runs tests

2025–26: Autonomous coding agents
- Full feature implementation from spec
- Background agents (Cursor, Devin, Codex)
- MCP for tool integration (git, DB, APIs)
- Agents that plan, code, test, debug, deploy
How AI Coding Agents Work
Context engineering is the key innovation. Modern coding agents dynamically discover relevant files, read documentation, search codebases semantically, and load only what’s needed. They use tools (terminal, file system, linters, tests) to verify their work. The LLM reasons; the tools ground it in reality.
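A toy version of that discovery step, with keyword overlap standing in for semantic search. Real agents use embeddings and tool calls, and the file contents here are invented, but the rank-then-pack-a-budget idea is the same:

```python
def select_context(task, files, budget_chars=2000):
    # Rank files by word overlap with the task, then pack the best
    # ones into a fixed character budget (the context window).
    words = set(task.lower().split())
    ranked = sorted(files.items(),
                    key=lambda kv: -len(words & set(kv[1].lower().split())))
    picked, used = [], 0
    for path, text in ranked:
        if used + len(text) <= budget_chars:
            picked.append(path)
            used += len(text)
    return picked

repo = {
    "auth.py": "handles login and password checks",
    "billing.py": "creates invoices and charges cards",
    "README.md": "project overview and setup notes",
}
print(select_context("fix the login password bug", repo))
```

The budget is the whole point: a coding agent cannot load the repository wholesale, so everything hinges on ranking well and spending the context window on the files that matter.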
The impact: AI doesn’t replace developers — it amplifies them. Junior tasks (boilerplate, tests, documentation) are increasingly automated. Senior skills (architecture, design, code review, understanding requirements) become more valuable. The role shifts from “writing code” to “directing AI to write code.”
Open Source & Small Models
Llama, Mistral, DeepSeek, and AI on every device
Open-Weight Ecosystem:

Meta Llama:
- Llama 3.1 405B — first open frontier model
- Llama 4 Scout (17B active, 109B total MoE)
- Llama 4 Maverick (17B active, 400B total)
- 10M context window, multimodal

Mistral AI:
- Mistral Large 3 (675B MoE, 41B active)
- Mistral Small 4 (119B MoE, reasoning)
- Ministral 3B/8B/14B — edge deployment
- Apache 2.0 license

DeepSeek:
- DeepSeek-V3 (671B MoE, general)
- DeepSeek-R1 (reasoning, MIT license)
- Trained at ~$5.6M (vs $100M+ for GPT-4)

Others:
- Qwen 2.5 (Alibaba), Gemma 2 (Google)
- Phi-4 (Microsoft), Command R+ (Cohere)
Small Models & Edge AI
Not everything needs a 400B model. Small language models (1B–14B parameters) run on phones, laptops, and robots. Quantization (e.g., INT4) shrinks models roughly 4x relative to FP16. Apple Intelligence runs on-device. Mistral’s Ministral line targets NVIDIA Jetson-class edge hardware. The future: AI everywhere, not just in the cloud.
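The core of INT4 quantization fits in a few lines. This is a plain symmetric scheme in pure Python for illustration; production kernels pack two 4-bit values per byte and use per-group scales, but the round-to-a-grid idea is identical:

```python
def quantize_int4(weights):
    # Symmetric INT4: map floats onto integers in [-8, 7] via one scale.
    scale = max(abs(w) for w in weights) / 7.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.12, -0.5, 0.33, 0.7]
q, scale = quantize_int4(w)
restored = dequantize(q, scale)
# Each restored value lands within scale/2 of the original.
print(q, max(abs(a - b) for a, b in zip(w, restored)))
```

Storing 4 bits per weight instead of 16 is where the ~4x size reduction comes from; the price is the rounding error visible above, which in practice is small enough that quality barely moves for most models.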
Why open matters: Open-weight models enable sovereignty (run in your country), privacy (data never leaves your server), customization (fine-tune for your domain), and cost control (no per-token API fees). The gap between open and proprietary models shrinks every quarter. For many use cases, open models are already the better choice.
Embodied AI & Robotics
Humanoid robots, self-driving, and AI in the physical world
Humanoid Robots:
- Tesla Optimus — mass production by late 2026; Gen-3: 125 lbs, 16-core AI chip, 24 hr battery; goal: 1M units/year
- Figure 02 — OpenAI-powered manipulation
- Boston Dynamics Atlas — electric, commercial
- 1X NEO — household tasks

Self-Driving:
- Waymo — 150K+ paid rides/week (2025)
- Tesla FSD — supervised, expanding
- End-to-end neural networks replacing hand-coded rules

Foundation Models for Robotics:
- RT-2 (Google) — vision-language-action
- π₀ (Physical Intelligence) — general robot policy
- Trained on diverse robot data + internet data
- Transfer across robot morphologies
The LLM-Robot Connection
LLMs provide the reasoning layer for robots. The robot perceives the world (vision), the LLM plans actions in natural language, and a low-level controller executes. This decouples high-level reasoning from physical control — the same LLM that writes code can instruct a robot to “pick up the red cup and place it on the shelf.”
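In code, the decoupling looks like this. Every name here is hypothetical and the LLM is replaced by a scripted function, but the perceive / plan-in-language / execute split is the pattern:

```python
def robot_step(scene_description, llm, skills):
    # Perception produces text; the LLM plans by naming one skill;
    # a low-level controller (the skill function) handles the physics.
    prompt = f"Scene: {scene_description}. Pick one skill from {sorted(skills)}."
    chosen = llm(prompt)
    return skills[chosen]()

skills = {
    "pick_red_cup": lambda: "gripper closed on red cup",
    "place_on_shelf": lambda: "cup placed on shelf",
}

def toy_llm(prompt):
    # Scripted stand-in for the planner LLM.
    return "pick_red_cup" if "red cup on table" in prompt else "place_on_shelf"

print(robot_step("red cup on table", toy_llm, skills))
```

Constraining the LLM to a fixed skill vocabulary is the design choice that makes this safe: the model reasons freely in language, but it can only ever trigger actions the controller already knows how to execute.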
Sim-to-real at scale: Training robots in simulation (millions of episodes in hours) then transferring to real hardware is now standard. Domain randomization, digital twins, and foundation models for robotics are converging to make general-purpose robots feasible within this decade.
The AI Stack
Infrastructure, tooling, and the modern AI development workflow
The Modern AI Stack:

Hardware:
- NVIDIA H100/H200/B200 GPUs
- Google TPU v5p/v6
- Custom chips: Groq (LPU), Cerebras (WSE)
- Apple Neural Engine, Qualcomm NPU

Training:
- PyTorch (dominant), JAX (Google)
- DeepSpeed, FSDP, Megatron-LM
- Weights & Biases, MLflow (tracking)

Inference:
- vLLM, TensorRT-LLM, llama.cpp
- Ollama (local), Together AI (cloud)
- PagedAttention, speculative decoding

Orchestration:
- LangChain/LangGraph, LlamaIndex
- MCP (tool integration)
- Vector DBs: Pinecone, Weaviate, Chroma

Evaluation:
- LM-Eval, HELM, Chatbot Arena (LMSYS)
- Human preference (Elo ratings)
RAG: Retrieval-Augmented Generation
LLMs have knowledge cutoffs and hallucinate. RAG fixes this: embed your documents into a vector database, retrieve relevant chunks at query time, and include them in the prompt. The model generates answers grounded in your data. RAG is the most common production pattern for enterprise AI.
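A minimal end-to-end sketch of the pattern, with word overlap standing in for embedding similarity and an in-memory list standing in for the vector database (the documents are invented):

```python
def retrieve(query, docs, k=2):
    # Toy retriever: score each doc by word overlap with the query.
    q = set(query.lower().split())
    return sorted(docs, key=lambda d: -len(q & set(d.lower().split())))[:k]

def build_prompt(query, docs):
    # Ground the model: retrieved chunks go into the prompt as context.
    context = "\n".join(retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Our refund policy allows returns within 30 days.",
    "The office is closed on public holidays.",
    "Refunds are issued to the original payment method.",
]
print(build_prompt("what is the refund policy", docs))
```

A production system swaps in an embedding model, chunking, and a vector store, but the contract is unchanged: retrieve first, then generate, so the model answers from your data instead of its training-time memory.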
Context engineering > prompt engineering. The new skill is not writing better prompts — it’s building systems that provide the right context at the right time. Dynamic context discovery, semantic search, tool results, and memory management are what make AI agents effective.
Where It’s All Heading
The next 3–5 years and the course in perspective
Near-Term (2026–2027):
- Reasoning models become default
- Agents handle multi-step workflows
- AI coding agents write 50%+ of code
- Humanoid robots enter factories
- On-device AI becomes standard

Medium-Term (2027–2029):
- Multi-agent systems collaborate
- AI scientists accelerate research
- Personalized AI tutors for everyone
- Autonomous vehicles widespread
- AI regulation matures globally

Open Questions:
- AGI timeline? Estimates range 2027–2040+
- Economic impact? Massive disruption likely
- Alignment? Unsolved for superhuman systems
- Energy? AI compute demand vs climate goals
- Governance? Global coordination needed
This Course in Perspective
You’ve traveled from the McCulloch-Pitts neuron (1943) to reasoning agents (2025). From perceptrons to transformers, from Q-tables to RLHF, from pixel classifiers to multimodal world models. The core ideas — gradient descent, attention, self-supervised learning, reinforcement learning — remain the foundation of everything being built today.
The most important skill: AI is moving faster than any technology in history. What matters is not memorizing today’s models but understanding the principles behind them. Architectures change; fundamentals don’t. You now have the foundation to understand whatever comes next.