AI Infrastructure
GPUs, accelerators, training clusters, inference serving, networking, cooling — the hardware and systems that power AI
Co-Created by Kiran Shirol and Claude
Topics
GPUs & Accelerators
Networking
Training Clusters
Inference Serving
Power & Cooling
Cloud & Orchestration
14 chapters · 5 sections
Section 1
Foundation — Why Special Hardware?
CPUs vs GPUs, architecture deep dive, and the accelerator landscape.
1. Why CPUs Aren’t Enough
Sequential vs parallel, matrix multiplication, and the 1,000× throughput gap (a back-of-envelope sketch follows this section’s chapter list).
2. Inside a GPU — Architecture That Powers AI
SMs, CUDA cores, Tensor Cores, HBM, and precision formats (FP32 to FP8).
3. The Accelerator Zoo
H100, B200, MI300X, TPU Trillium, Trainium — specs and trade-offs.
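Chapter 1’s “1,000× throughput gap” is back-of-envelope arithmetic you can reproduce. Here is a minimal sketch, assuming a 32-core AVX-512 server CPU and NVIDIA’s published H100 FP8 Tensor Core peak; the exact ratio shifts with precision, clock speed, and baseline, so treat the inputs as illustrative.

```python
# Peak-throughput comparison behind the "1,000x gap" (illustrative inputs).
cores, fma_ports, fp32_lanes, ghz = 32, 2, 16, 2.0
cpu_peak = cores * fma_ports * fp32_lanes * 2 * ghz * 1e9  # FMA = 2 FLOPs

gpu_peak = 3958e12  # H100 SXM FP8 Tensor Core peak with sparsity (NVIDIA datasheet)

print(f"CPU FP32 peak: {cpu_peak / 1e12:8.1f} TFLOPS")
print(f"GPU FP8 peak:  {gpu_peak / 1e12:8.1f} TFLOPS")
print(f"gap:           {gpu_peak / cpu_peak:8.0f}x")
```

Dense FP8 halves the GPU figure, and real kernels reach only a fraction of peak on both chips, which is why this is a throughput gap rather than an exact constant.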
Section 2
Core Techniques — Connecting the Dots
Memory bottlenecks, interconnects, and network topologies for AI clusters.
4. Memory — The Real Bottleneck
HBM vs GDDR, KV cache explosion, and a practical GPU sizing guide (a KV-cache sizing sketch follows this section’s chapter list).
5. Interconnects — How GPUs Talk
NVLink, NVSwitch, PCIe, InfiniBand vs RoCEv2, and the bandwidth hierarchy.
6. Network Topologies for AI Clusters
Fat-tree, leaf-spine, rail-optimized, DGX SuperPOD, and TPU pods.
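Chapter 4’s “KV cache explosion” follows from a closed-form byte count. Here is a minimal sizing sketch, assuming Llama-3-70B-style shapes (80 layers, 8 grouped-query KV heads, head dimension 128) and an FP16 cache; the batch and sequence lengths are illustrative, not the chapter’s worked example.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch, bytes_per_elem=2):
    # K and V are each cached once per layer, per KV head, per token.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

# Llama-3-70B-style shapes: 80 layers, 8 KV heads (GQA), head_dim 128, FP16 cache.
gib = kv_cache_bytes(80, 8, 128, seq_len=8192, batch=16) / 2**30
print(f"KV cache: {gib:.1f} GiB")  # -> 40.0 GiB, on top of ~140 GB of FP16 weights
```

At that rate the cache, not the weights, decides how many concurrent requests fit on a GPU, which is the sizing question the chapter works through.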
Section 3
Training at Scale
Distributed training parallelism and the anatomy of an AI factory.
7. Distributed Training — Splitting the Work
Data, tensor, pipeline parallelism, FSDP, DeepSpeed, and 3D parallelism (a minimal data-parallel sketch follows this section’s chapter list).
8. Training Clusters — Anatomy of an AI Factory
DGX SuperPOD, GB200 NVL, Meta’s 16K-GPU Llama 3 training, and real costs.
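The baseline behind chapter 7’s fancier schemes is plain data parallelism: every rank holds a full model replica and gradients are averaged with an all-reduce each step. Here is a minimal PyTorch DistributedDataParallel sketch, assuming a torchrun launch; the linear layer and synthetic loss are stand-ins, not the chapter’s training setup.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE; NCCL is the usual GPU backend.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda()   # stand-in for a real model
    model = DDP(model, device_ids=[local_rank])  # all-reduces grads on backward()

    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
    for _ in range(10):
        x = torch.randn(32, 4096, device="cuda")
        loss = model(x).square().mean()          # dummy loss on synthetic data
        opt.zero_grad()
        loss.backward()                          # gradient sync happens here
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch: torchrun --nproc_per_node=8 this_script.py
```

FSDP, tensor, and pipeline parallelism change what each rank stores and communicates, but the launch, initialize, and sync skeleton stays the same.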
Section 4
Serving and Storing
Inference infrastructure, storage pipelines, and the energy question.
9. Inference Infrastructure — Serving Models
Batching, KV cache, vLLM vs TensorRT-LLM, GPU sharing, and latency trade-offs.
10. Storage and Data Pipelines
Lustre, GPFS, GPUDirect Storage, checkpoint architecture, and pipeline sizing.
11. Power, Cooling, and the Energy Question
1,000W/GPU reality, PUE, liquid vs immersion cooling, and the $3.4M/year math (worked through after this section’s chapter list).
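Chapter 11’s “$3.4M/year math” is a short multiplication once a few inputs are fixed. Here is a minimal sketch; the fleet size, PUE, and electricity rate below are assumptions chosen to land near that headline figure, not the chapter’s actual inputs.

```python
# Annual electricity cost for a GPU fleet (all inputs are assumptions).
gpus        = 3000   # assumed fleet size
kw_per_gpu  = 1.0    # ~1,000 W per next-gen GPU, per the chapter title
pue         = 1.3    # facility overhead: cooling, power conversion, etc.
usd_per_kwh = 0.10   # assumed industrial electricity rate

kwh_per_year = gpus * kw_per_gpu * pue * 24 * 365
print(f"{kwh_per_year / 1e6:.1f} GWh/year -> "
      f"${kwh_per_year * usd_per_kwh / 1e6:.2f}M/year")
# -> 34.2 GWh/year -> $3.42M/year
```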
Section 5
Strategy — Making It Work
Cloud vs on-prem, orchestration, and the future of AI infrastructure.
12. Cloud vs On-Prem — Where to Run AI
AWS vs Azure vs GCP, specialized clouds, on-prem economics, and break-even (a break-even sketch follows this section’s chapter list).
13. Orchestration and Scheduling
Kubernetes GPU scheduling, Slurm, Ray, multi-tenancy, and setup patterns.
14. The Future of AI Infrastructure
Co-Packaged Optics, chiplets, wafer-scale, photonic computing, and sovereign AI.
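Chapter 12’s break-even comes down to comparing an hourly rental rate against amortized capital plus operating cost per GPU-hour. Here is a minimal sketch; every price below is an assumption for illustration, not a quoted rate.

```python
# Cloud-vs-on-prem break-even (all prices are illustrative assumptions).
cloud_per_gpu_hr  = 4.00    # assumed on-demand H100-class rental rate
capex_per_gpu     = 30000   # assumed server + networking cost per GPU
onprem_per_gpu_hr = 0.50    # assumed power, cooling, and staff per GPU-hour

hours = capex_per_gpu / (cloud_per_gpu_hr - onprem_per_gpu_hr)
print(f"break-even at ~{hours:,.0f} GPU-hours "
      f"(~{hours / (24 * 365):.1f} years at 100% utilization)")
# -> ~8,571 GPU-hours (~1.0 years at full utilization)
```

Utilization is the swing variable: at 30% utilization the same hardware takes roughly three times as long to pay off.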
Explore Related Courses
Small Models & Local AI
Quantization, Ollama & Edge
How LLMs Work
Transformers & Attention
Fine-Tuning
Adapting Models to Your Data