Expansion Ratio
The FFN projects up to a larger dimension and back down. For Llama 3.1 8B, hidden_size is 4,096 but intermediate_size is 14,336, a ratio of 3.5×. In standard FFNs (without SwiGLU), the ratio is typically 4× (which would give 16,384). SwiGLU takes 2/3 of that (≈10,922) to keep parameter count comparable after adding a third weight matrix; Llama 3 then scales it by a 1.3× multiplier and rounds up to a multiple of 1,024, yielding 14,336. The expansion gives the model a wider "workspace" in which to transform representations.
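The arithmetic can be reproduced in a few lines. This is a sketch following the hidden-dimension computation in Meta's Llama reference code; the 1.3 multiplier and the 1,024 rounding granularity are the values used for Llama 3, stated here as assumptions:

```python
def ffn_hidden_dim(dim, multiple_of=1024, ffn_dim_multiplier=1.3):
    hidden = 4 * dim                       # standard 4x FFN expansion
    hidden = int(2 * hidden / 3)           # SwiGLU: keep 2/3 of it
    hidden = int(ffn_dim_multiplier * hidden)  # Llama 3 capacity multiplier
    # round up to the nearest multiple_of for GPU-friendly shapes
    return multiple_of * ((hidden + multiple_of - 1) // multiple_of)

print(ffn_hidden_dim(4096))  # 14336
```

With dim=4096: 16,384 → 10,922 → 14,198 → rounded up to 14,336, exactly the configured intermediate_size.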
Dimension Flow
// Data flow through the MLP:
Input: [seq_len, 4096]
// ↓ gate_proj: × [14336, 4096].T, then SiLU
Gate path: [seq_len, 14336] // expand + activate
// ↓ up_proj: × [14336, 4096].T
Up path: [seq_len, 14336] // expand
// ↓ element-wise multiply
Combined: [seq_len, 14336]
// ↓ down_proj: × [4096, 14336].T
Output: [seq_len, 4096] // compress
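The flow above can be sketched in NumPy. This is a minimal illustration, not Llama's actual implementation; toy dimensions (64 and 224, the same 3.5× ratio) stand in for 4,096 and 14,336 to keep it cheap to run:

```python
import numpy as np

def silu(x):
    # SiLU (swish) activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

def swiglu_mlp(x, w_gate, w_up, w_down):
    # x:      [seq_len, hidden]
    # w_gate: [intermediate, hidden]  (weights stored [out, in], PyTorch-style)
    # w_up:   [intermediate, hidden]
    # w_down: [hidden, intermediate]
    gate = silu(x @ w_gate.T)      # expand + activate: [seq_len, intermediate]
    up = x @ w_up.T                # expand:            [seq_len, intermediate]
    return (gate * up) @ w_down.T  # gate, then compress back to [seq_len, hidden]

# Toy sizes standing in for hidden=4096, intermediate=14336
hidden, intermediate, seq_len = 64, 224, 5
rng = np.random.default_rng(0)
x = rng.standard_normal((seq_len, hidden))
w_gate = rng.standard_normal((intermediate, hidden))
w_up = rng.standard_normal((intermediate, hidden))
w_down = rng.standard_normal((hidden, intermediate))

y = swiglu_mlp(x, w_gate, w_up, w_down)
print(y.shape)  # (5, 64): output has the same shape as the input
```

Note that gate_proj and up_proj both read the same input; only their element-wise product is fed to down_proj.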
Why it matters: The 14,336-dimensional intermediate space is where the model does its "thinking" for each token — pattern matching, factual recall, reasoning. A wider intermediate dimension means more capacity for learned knowledge.