The Cascade Effect
The CPU-GPU performance gap doesn’t just affect training speed. It cascades through every infrastructure decision:
GPU scarcity → GPU pricing: H100s cost $25,000–$40,000 each. Demand far exceeds supply. This drives the entire cloud GPU market.
GPU power → cooling crisis: An H100 draws 700W; a B200 draws 1,000W. An 8-GPU server therefore needs 5,600–8,000W for the GPUs alone, before CPUs, memory, and networking. This is why data centers are moving to liquid cooling.
GPU memory limits → distributed training: A 70B model doesn't fit on one GPU — its FP16 weights alone are 140GB, well beyond a single GPU's HBM. Training it takes 4–8 GPUs connected by fast interconnects. This drives NVLink, InfiniBand, and cluster networking.
GPU efficiency → cost optimization: At $3–4/hr per GPU, keeping GPUs idle is burning money. This drives scheduling, orchestration, and utilization optimization.
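The numbers behind these cascades are easy to check on the back of an envelope. A minimal sketch, assuming an 80GB H100 and an illustrative $3.50/GPU-hr rate (both figures are assumptions for this example, not quotes):

```python
# Back-of-envelope arithmetic for the cascade effects above.
PARAMS = 70e9            # 70B-parameter model
BYTES_PER_PARAM = 2      # FP16/BF16 weights
H100_HBM_GB = 80         # assumed per-GPU HBM capacity (H100 SXM class)

weights_gb = PARAMS * BYTES_PER_PARAM / 1e9
gpus_for_weights = -(-weights_gb // H100_HBM_GB)  # ceiling division

print(f"FP16 weights: {weights_gb:.0f} GB")                   # 140 GB
print(f"GPUs just to hold the weights: {gpus_for_weights:.0f}")  # 2
# Training also needs gradients, optimizer states, and activations —
# several times the weight memory — which is why 4-8 GPUs in practice.

# Idle cost: an 8-GPU node at an assumed $3.50/GPU-hr
idle_cost_per_day = 8 * 3.50 * 24
print(f"Idle 8-GPU node: ${idle_cost_per_day:,.0f}/day")      # $672/day
```

Even holding the weights forces multi-GPU setups, and an idle node burns hundreds of dollars a day — which is exactly why scheduling and utilization tooling exist.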
Course Roadmap
What we'll cover next:
Ch 2: GPU architecture deep dive
(CUDA cores, Tensor Cores, SMs)
Ch 3: The accelerator landscape
(NVIDIA, AMD, Google TPU, AWS)
Ch 4: Memory — the real bottleneck
(HBM, bandwidth, KV cache)
Ch 5: Interconnects — how GPUs talk
(NVLink, InfiniBand, PCIe)
Ch 6: Network topologies
(fat-tree, rail-optimized)
Ch 7: Distributed training
(data/tensor/pipeline parallelism)
Ch 8–14: Training clusters, inference,
storage, power, cloud, and more
Key insight: Every chapter in this course exists because of the fundamental truth we covered here: AI needs parallel compute that CPUs can’t provide. GPUs fill that gap, but they bring their own constraints — power, cooling, memory, networking, cost — and the entire field of AI infrastructure exists to manage those constraints.