SuperPOD Design
The NVIDIA DGX SuperPOD is the reference architecture for building large AI training clusters. It’s what companies like Meta, Microsoft, and Oracle use:
Building block — DGX H100: 8 H100 GPUs connected by NVSwitch (900 GB/s per GPU). Each GPU has a 400G InfiniBand NIC.
Scalable Unit (SU): 32 DGX H100 systems (256 GPUs) connected by a rail-optimized InfiniBand fabric: 8 leaf switches (one per rail) plus spine switches. See the sizing sketch after this list.
SuperPOD: Multiple Scalable Units connected by a spine/super-spine fabric. Scales to 16 SUs = 4,096 GPUs per SuperPOD.
GB200 NVL72 variant: Each rack has 72 GPUs connected by NVLink (130 TB/s of aggregate bandwidth). Racks connect via 400G/800G InfiniBand or Ethernet. The NVLink domain is much larger (72 GPUs vs. 8), reducing the need for cross-rack communication.
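To make the rail-optimized wiring concrete, here is a minimal sizing sketch. The node and rail counts come from the text above; the 64-port leaf-switch radix is an assumption (a Quantum-2-class NDR switch), not a figure from this section.

```python
# Back-of-envelope sizing for one rail-optimized Scalable Unit.
# Node/rail counts are from the text; the 64-port switch radix is
# an assumption (Quantum-2-class 400G switch).

NODES_PER_SU = 32   # DGX H100 systems per Scalable Unit
RAILS = 8           # one NIC per GPU; NIC i of every node forms rail i
SWITCH_PORTS = 64   # assumed 400G ports per leaf switch

# In a rail-optimized fabric, NIC i of every node connects to leaf i,
# so each leaf sees exactly one link from each of the 32 nodes.
downlinks_per_leaf = NODES_PER_SU
uplinks_per_leaf = SWITCH_PORTS - downlinks_per_leaf  # left for spines

print(f"leaf switches: {RAILS}")
print(f"downlinks per leaf: {downlinks_per_leaf}")
print(f"uplinks per leaf:   {uplinks_per_leaf} (1:1 non-blocking)")
```

With 32 of 64 ports used as downlinks, each leaf has 32 ports left for spine uplinks, which is what makes a non-blocking 1:1 fabric possible at the SU level.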
SuperPOD Numbers
DGX H100 SuperPOD:
Per DGX node (8 GPUs):
NVLink bandwidth: 900 GB/s per GPU
InfiniBand: 8× 400G NICs
Power: ~10.2 kW
Per Scalable Unit (256 GPUs):
DGX nodes: 32
Leaf switches: 8 (rail-optimized)
Spine switches: 8
Power: ~330 kW
Full SuperPOD (4,096 GPUs):
Scalable Units: 16
Total switches: 256+ (leaf, spine, and super-spine)
Total cables: thousands
Power: ~5.3 MW
Cost: ~$150-200M
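These totals compose directly from the per-node figures, as a quick sanity check shows. Only the node power is from the list above; switch, storage, and management power are deliberately left out here.

```python
# Sanity-check the SuperPOD totals from the per-node figures above.
GPUS_PER_NODE = 8
NODES_PER_SU = 32
SUS_PER_POD = 16
NODE_POWER_KW = 10.2  # per DGX H100 system, from the spec list

nodes = NODES_PER_SU * SUS_PER_POD            # 512 DGX systems
gpus = nodes * GPUS_PER_NODE                  # 4,096 GPUs
node_power_mw = nodes * NODE_POWER_KW / 1000  # ~5.22 MW

print(f"DGX nodes: {nodes}, GPUs: {gpus}")
print(f"Node power alone: {node_power_mw:.2f} MW")
# Switches, storage, and management overhead account for the gap
# between ~5.22 MW of node power and the ~5.3 MW quoted above.
```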
The networking infrastructure (switches, cables, optics) can cost 15-25% of the total cluster cost. Not a rounding error.
Key insight: A DGX SuperPOD is not just GPUs — it’s a carefully engineered system where compute, networking, storage, power, and cooling are all designed together. The networking alone (switches, cables, optics, NICs) can cost $20–50 million for a 4,096-GPU cluster. This is why “just buy more GPUs” is never the full answer.
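As a quick check that the two cost claims are consistent (all inputs are the ranges quoted above, nothing new):

```python
# Networking share of total cluster cost, using the ranges in the text.
total_cost = (150e6, 200e6)  # ~$150-200M for a 4,096-GPU cluster
net_share = (0.15, 0.25)     # networking at 15-25% of the total

low = total_cost[0] * net_share[0]    # $22.5M
high = total_cost[1] * net_share[1]   # $50.0M
print(f"networking: ${low / 1e6:.1f}M to ${high / 1e6:.1f}M")
```

The output, roughly $22M to $50M, lines up with the $20-50 million networking figure above.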