AI Infrastructure
GPUs, accelerators, training clusters, inference serving, networking, cooling — the hardware and systems that power AI
Co-Created by Kiran Shirol and Claude
Topics
GPUs & Accelerators
Networking
Training Clusters
Inference Serving
Power & Cooling
Cloud & Orchestration
14 chapters · 5 sections
Section 1
Foundation — Why Special Hardware?
CPUs vs GPUs, architecture deep dive, and the accelerator landscape.
1. Why CPUs Aren’t Enough
Sequential vs parallel, matrix multiplication, and the 1,000× throughput gap (a back-of-envelope sketch follows this section’s chapter list).
2. Inside a GPU — Architecture That Powers AI
SMs, CUDA cores, Tensor Cores, HBM, and precision formats (FP32 to FP8).
3. The Accelerator Zoo
H100, B200, MI300X, TPU Trillium, Trainium — specs and trade-offs.
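Chapter 1’s “1,000× throughput gap” is back-of-envelope arithmetic you can reproduce. Here is a minimal sketch, assuming a 32-core AVX-512 server CPU and NVIDIA’s published H100 FP8 Tensor Core peak; the exact ratio shifts with precision, clock speed, and baseline, so treat the inputs as illustrative.

```python
# Peak-throughput comparison behind the "1,000x gap" (illustrative inputs).
cores, fma_ports, fp32_lanes, ghz = 32, 2, 16, 2.0
cpu_peak = cores * fma_ports * fp32_lanes * 2 * ghz * 1e9  # FMA = 2 FLOPs

gpu_peak = 3958e12  # H100 SXM FP8 Tensor Core peak with sparsity (NVIDIA datasheet)

print(f"CPU FP32 peak: {cpu_peak / 1e12:8.1f} TFLOPS")
print(f"GPU FP8 peak:  {gpu_peak / 1e12:8.1f} TFLOPS")
print(f"gap:           {gpu_peak / cpu_peak:8.0f}x")
```

Dense FP8 halves the GPU figure, and real kernels reach only a fraction of peak on both chips, which is why this is a throughput gap rather than an exact constant.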
Section 2
Core Techniques — Connecting the Dots
Memory bottlenecks, interconnects, and network topologies for AI clusters.
4. Memory — The Real Bottleneck
HBM vs GDDR, KV cache explosion, and a practical GPU sizing guide (a KV-cache sizing sketch follows this section’s chapter list).
5. Interconnects — How GPUs Talk
NVLink, NVSwitch, PCIe, InfiniBand vs RoCEv2, and the bandwidth hierarchy.
6. Network Topologies for AI Clusters
Fat-tree, leaf-spine, rail-optimized, DGX SuperPOD, and TPU pods.
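Chapter 4’s “KV cache explosion” follows from a closed-form byte count. Here is a minimal sizing sketch, assuming Llama-3-70B-style shapes (80 layers, 8 grouped-query KV heads, head dimension 128) and an FP16 cache; the batch and sequence lengths are illustrative, not the chapter’s worked example.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # K and V are each cached once per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Llama-3-70B-style shapes: 80 layers, 8 KV heads (GQA), head_dim 128, FP16 cache.
gib = kv_cache_bytes(80, 8, 128, seq_len=8192, batch=16) / 2**30
print(f"KV cache: {gib:.1f} GiB")  # -> 40.0 GiB, on top of ~140 GB of FP16 weights
```

At that rate the cache, not the weights, decides how many concurrent requests fit on a GPU, which is the sizing question the chapter works through.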
Section 3
Training at Scale
Distributed training parallelism and the anatomy of an AI factory.
7. Distributed Training — Splitting the Work
Data, tensor, pipeline parallelism, FSDP, DeepSpeed, and 3D parallelism (a minimal data-parallel sketch follows this section’s chapter list).
8. Training Clusters — Anatomy of an AI Factory
DGX SuperPOD, GB200 NVL, Meta’s 16K-GPU Llama 3 training, and real costs.
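The baseline behind chapter 7’s fancier schemes is plain data parallelism: every rank holds a full model replica and gradients are averaged with an all-reduce each step. Here is a minimal PyTorch DistributedDataParallel sketch, assuming a torchrun launch; the linear layer and synthetic loss are stand-ins, not the chapter’s training setup.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE; NCCL is the usual GPU backend.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda()   # stand-in for a real model
    model = DDP(model, device_ids=[local_rank])  # all-reduces grads on backward()

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(10):
        x = torch.randn(32, 4096, device="cuda")
        loss = model(x).square().mean()          # dummy loss on synthetic data
        opt.zero_grad()
        loss.backward()                          # gradient sync happens here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch: torchrun --nproc_per_node=8 this_script.py
```

FSDP, tensor, and pipeline parallelism change what each rank stores and communicates, but the launch, initialize, and sync skeleton stays the same.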
Section 4
Serving and Storing
Inference infrastructure, storage pipelines, and the energy question.
9. Inference Infrastructure — Serving Models
Batching, KV cache, vLLM vs TensorRT-LLM, GPU sharing, and latency trade-offs.
10. Storage and Data Pipelines
Lustre, GPFS, GPUDirect Storage, checkpoint architecture, and pipeline sizing.
11. Power, Cooling, and the Energy Question
1,000W/GPU reality, PUE, liquid vs immersion cooling, and the $3.4M/year math (worked through after this section’s chapter list).
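Chapter 11’s “$3.4M/year math” is a short multiplication once a few inputs are fixed. Here is a minimal sketch; the fleet size, PUE, and electricity rate below are assumptions chosen to land near that headline figure, not the chapter’s actual inputs.

```python
# Annual electricity cost for a GPU fleet (all inputs are assumptions).
gpus        = 3000   # assumed fleet size
kw_per_gpu  = 1.0    # ~1,000 W per next-gen GPU, per the chapter title
pue         = 1.3    # facility overhead: cooling, power conversion, etc.
usd_per_kwh = 0.10   # assumed industrial electricity rate

kwh_per_year = gpus * kw_per_gpu * pue * 24 * 365
print(f"{kwh_per_year / 1e6:.1f} GWh/year -> "
      f"${kwh_per_year * usd_per_kwh / 1e6:.2f}M/year")
# -> 34.2 GWh/year -> $3.42M/year
```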
Section 5
Strategy — Making It Work
Cloud vs on-prem, orchestration, and the future of AI infrastructure.
12. Cloud vs On-Prem — Where to Run AI
AWS vs Azure vs GCP, specialized clouds, on-prem economics, and break-even (a break-even sketch follows this section’s chapter list).
13. Orchestration and Scheduling
Kubernetes GPU scheduling, Slurm, Ray, multi-tenancy, and setup patterns.
14. The Future of AI Infrastructure
Co-Packaged Optics, chiplets, wafer-scale, photonic computing, and sovereign AI.
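Chapter 12’s break-even comes down to comparing an hourly rental rate against amortized capital plus operating cost per GPU-hour. Here is a minimal sketch; every price below is an assumption for illustration, not a quoted rate.

```python
# Cloud-vs-on-prem break-even (all prices are illustrative assumptions).
cloud_per_gpu_hr  = 4.00    # assumed on-demand H100-class rental rate
capex_per_gpu     = 30000   # assumed server + networking cost per GPU
onprem_per_gpu_hr = 0.50    # assumed power, cooling, and staff per GPU-hour

hours = capex_per_gpu / (cloud_per_gpu_hr - onprem_per_gpu_hr)
print(f"break-even at ~{hours:,.0f} GPU-hours "
      f"(~{hours / (24 * 365):.1f} years at 100% utilization)")
# -> ~8,571 GPU-hours (~1.0 years at full utilization)
```

Utilization is the swing variable: at 30% utilization the same hardware takes roughly three times as long to pay off.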
Explore Related Courses
Small Models & Local AI
Quantization, Ollama & Edge
How LLMs Work
Transformers & Attention
Fine-Tuning
Adapting Models to Your Data