PCIe in AI Systems
PCI Express (PCIe) is the standard interface that connects GPUs to the CPU and to each other in consumer and entry-level server systems. Every GPU has a PCIe connection.
PCIe 4.0 x16: 32 GB/s per direction, 64 GB/s bidirectional. Used in older servers and consumer PCs.
PCIe 5.0 x16: 64 GB/s per direction, 128 GB/s bidirectional. Current standard in modern servers.
PCIe 6.0 x16: 128 GB/s per direction, 256 GB/s bidirectional. Emerging in 2025–2026.
PCIe is fine for single-GPU workloads (loading model weights from CPU memory to GPU) but becomes a severe bottleneck for multi-GPU communication. Even at PCIe 6.0's 128 GB/s per direction, transferring a 70B model's gradients (280 GB in FP32) takes over 2 seconds — an eternity when training steps should take milliseconds.
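The "over 2 seconds" figure is simple division: payload size over per-direction bandwidth. A quick back-of-envelope sketch across the generations listed below (bandwidth values are the per-direction figures from this document; the NVLink number is included for contrast):

```python
# 70B parameters * 4 bytes (FP32) = 280 GB of gradients.
PAYLOAD_GB = 70e9 * 4 / 1e9  # 280.0

# Per-direction bandwidths in GB/s.
links = {
    "PCIe 4.0 x16": 32,
    "PCIe 5.0 x16": 64,
    "PCIe 6.0 x16": 128,
    "NVLink 5.0": 900,
}

for name, gbps in links.items():
    # Naive one-shot transfer time, ignoring protocol overhead.
    print(f"{name}: {PAYLOAD_GB / gbps:.2f} s")
# PCIe 4.0 x16: 8.75 s
# PCIe 5.0 x16: 4.38 s
# PCIe 6.0 x16: 2.19 s
# NVLink 5.0: 0.31 s
```

Even the optimistic single-transfer view puts PCIe seconds behind NVLink; real collective operations add further overhead on top of this.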
PCIe Generation Comparison
PCIe Bandwidth (x16 slot):
Gen 3.0: 16 GB/s per direction
Gen 4.0: 32 GB/s per direction
Gen 5.0: 64 GB/s per direction
Gen 6.0: 128 GB/s per direction
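These x16 figures are rounded marketing numbers; they follow from the per-lane signaling rate, which doubles each generation. A sketch of the arithmetic (Gens 3–5 use 128b/130b encoding; Gen 6 actually switches to PAM4 signaling with FLIT-based encoding, so applying 128b/130b to it is only an approximation):

```python
LANES = 16
ENCODING = 128 / 130  # 128b/130b line-code efficiency (Gens 3-5)

# Per-lane signaling rate in gigatransfers/s, doubling each generation.
for gen, gt_per_s in [("3.0", 8), ("4.0", 16), ("5.0", 32), ("6.0", 64)]:
    # One bit per transfer per lane; divide by 8 to get bytes.
    gbps = gt_per_s * ENCODING * LANES / 8
    print(f"PCIe {gen} x16: ~{gbps:.1f} GB/s per direction")
# PCIe 3.0 x16: ~15.8 GB/s per direction
# PCIe 4.0 x16: ~31.5 GB/s per direction
# PCIe 5.0 x16: ~63.0 GB/s per direction
# PCIe 6.0 x16: ~126.2 GB/s per direction
```

The computed values round to the 16/32/64/128 GB/s figures quoted above.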
Compare to NVLink 5.0:
NVLink 5.0: 900 GB/s per direction
PCIe 5.0 x16: 64 GB/s per direction
Ratio: ~14x faster
PCIe use cases in AI:
✓ CPU ↔ GPU data transfer
✓ Single-GPU inference
✓ Loading model weights
✓ Storage I/O (GPUDirect)
✗ Multi-GPU training sync
✗ Tensor parallelism
✗ High-performance inference
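The crossed-out multi-GPU items follow from a simple cost model: a ring all-reduce moves roughly 2·(N−1)/N of the gradient payload across each GPU's link, so gradient-sync time is bounded directly by link bandwidth. A minimal sketch under illustrative assumptions (8 GPUs, the 280 GB FP32 gradient payload from above, per-direction bandwidths from this document):

```python
def allreduce_seconds(payload_gb, link_gb_per_s, n_gpus):
    # Ring all-reduce traffic per link: ~2*(N-1)/N of the payload.
    return 2 * (n_gpus - 1) / n_gpus * payload_gb / link_gb_per_s

GRADS_GB = 280  # 70B params in FP32

for name, bw in [("PCIe 5.0 x16", 64), ("NVLink 5.0", 900)]:
    t = allreduce_seconds(GRADS_GB, bw, n_gpus=8)
    print(f"{name}: ~{t:.2f} s per gradient all-reduce")
# PCIe 5.0 x16: ~7.66 s per gradient all-reduce
# NVLink 5.0: ~0.54 s per gradient all-reduce
```

Seconds per synchronization step over PCIe versus half a second over NVLink is the gap that makes PCIe unworkable for training sync and tensor parallelism.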
PCIe is the "last resort" for GPU-to-GPU communication; NVLink is always preferred when available.
Key insight: PCIe is like a country road connecting two cities. It works, but it’s slow for heavy traffic. For multi-GPU AI workloads, you need a highway (NVLink) or a railroad (InfiniBand). PCIe remains important for CPU-GPU communication and storage I/O, but it’s never the right choice for GPU-to-GPU data exchange at scale.