Rack-Scale Computing
The GB200 NVL72 represents a paradigm shift from server-scale to rack-scale computing:
72 Blackwell GPUs + 36 Grace CPUs in a single liquid-cooled rack. All 72 GPUs are connected by 5th-gen NVLink through 9 NVSwitch trays, creating one massive NVLink domain.
Key difference from DGX H100: In DGX H100, the NVLink domain is 8 GPUs (one server). Cross-server communication uses InfiniBand. In GB200 NVL72, the NVLink domain is 72 GPUs (one rack). This means tensor parallelism can span the entire rack at NVLink speeds.
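As a rough illustration, here is a minimal sketch (a hypothetical helper, not any framework's actual API) of how the NVLink domain size caps the tensor-parallel degree and pushes the remaining GPUs into data parallelism over the cluster fabric:

```python
# Hypothetical sketch: keep tensor parallelism (TP) inside the NVLink domain,
# spill the remaining GPUs into data parallelism (DP) over the cluster fabric.
# The function name and numbers are illustrative, not a real library API.

def plan_parallelism(total_gpus: int, nvlink_domain: int) -> dict:
    """Cap TP at the NVLink domain size; use DP for the rest."""
    tp = min(total_gpus, nvlink_domain)   # TP traffic stays on NVLink
    dp = max(1, total_gpus // tp)         # DP traffic crosses the fabric
    return {"tensor_parallel": tp, "data_parallel": dp}

# DGX H100: TP is capped at 8, so a 64-GPU job needs DP=8 over InfiniBand.
print(plan_parallelism(64, nvlink_domain=8))    # {'tensor_parallel': 8, 'data_parallel': 8}

# GB200 NVL72: the whole rack is one domain, so TP can span all 64 GPUs.
print(plan_parallelism(64, nvlink_domain=72))   # {'tensor_parallel': 64, 'data_parallel': 1}
```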
Unified memory: 72 × 192 GB = 13.8 TB of GPU memory accessible at NVLink bandwidth. A 405B model in FP16 (810 GB) fits entirely within one rack’s NVLink domain — no InfiniBand needed for model weights.
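A quick back-of-the-envelope check of that claim, counting FP16 weights only and ignoring KV cache, activations, and optimizer state:

```python
# Weights-only footprint check for a 405B-parameter model in FP16.
BYTES_PER_PARAM_FP16 = 2
params = 405e9

weight_bytes = params * BYTES_PER_PARAM_FP16   # ~0.81 TB of weights
nvl72_hbm_bytes = 72 * 192e9                   # 72 GPUs x 192 GB = ~13.8 TB

print(f"weights:  {weight_bytes / 1e12:.2f} TB")
print(f"rack HBM: {nvl72_hbm_bytes / 1e12:.2f} TB")
print("fits in one NVLink domain:", weight_bytes < nvl72_hbm_bytes)  # True
```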
Performance: NVIDIA claims up to 30x faster real-time LLM inference and 4x faster training per GPU compared to the H100.
GB200 NVL72 Specs
GB200 NVL72 Rack:
GPUs: 72× B200
CPUs: 36× Grace (ARM)
GPU Memory: 13.8 TB HBM3e
NVLink BW: 130 TB/s total
FP4 Total: ~720 PFLOPS
Cooling: Liquid (required)
Power: ~120 kW per rack
Weight: ~1,400 kg
DGX H100 vs GB200 NVL72:

                  DGX H100    GB200 NVL72
GPUs per unit     8           72
NVLink domain     8 GPUs      72 GPUs
GPU memory        640 GB      13.8 TB
NVLink BW         7.2 TB/s    130 TB/s
Cooling           Air         Liquid
Power             10.2 kW     ~120 kW
The NVLink domain going from 8 to 72 GPUs is the biggest architectural change: it eliminates the InfiniBand bottleneck for most models.
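To see why the larger domain matters for collectives, here is a rough sketch comparing nominal per-GPU link rates. The bandwidth figures (1.8 TB/s for 5th-gen NVLink, 900 GB/s for H100's NVLink, 50 GB/s for a 400 Gb/s InfiniBand NIC) are published link rates rather than measured collective throughput, and the 10 GB payload is an arbitrary assumption:

```python
# Nominal per-GPU link rates (GB/s); not achieved collective throughput.
NVLINK_B200_GBPS = 1800   # 5th-gen NVLink, per GPU
NVLINK_H100_GBPS = 900    # 4th-gen NVLink, per GPU
IB_NDR_GBPS      = 50     # 400 Gb/s InfiniBand NIC

payload_gb = 10           # hypothetical per-GPU collective payload

for name, bw in [("NVLink (B200)", NVLINK_B200_GBPS),
                 ("NVLink (H100)", NVLINK_H100_GBPS),
                 ("InfiniBand NDR", IB_NDR_GBPS)]:
    print(f"{name:>15}: {payload_gb / bw * 1000:6.1f} ms to move {payload_gb} GB")
```

Under these assumptions the same payload takes roughly 5.6 ms over Blackwell NVLink versus 200 ms over a single InfiniBand NIC, which is why keeping tensor-parallel traffic inside the NVLink domain matters.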
Key insight: The GB200 NVL72 changes the economics of large model training. By putting 72 GPUs in one NVLink domain, it eliminates the InfiniBand bottleneck for models up to roughly 7 trillion parameters (about the largest FP16 model whose weights fit in the rack's 13.8 TB of HBM). This means simpler parallelism strategies, lower networking costs, and higher GPU utilization. The trade-off: liquid cooling is mandatory, and the rack costs $2–3 million.