What You Now Know
Ch 2: An LLM file = metadata + tokenizer + weights (99.98% of the bytes)
Ch 2: Three formats: Safetensors (safe, fast), GGUF (self-contained), PyTorch (legacy, risky)
Ch 3: Embedding = [vocab, hidden] lookup table, token IDs → vectors
Ch 4: Attention = Q, K, V, O projections with GQA shrinking K/V
Ch 5: FFN = SwiGLU gate/up/down, ~65% of all parameters
Ch 6: Special tensors: RMSNorm, RoPE (computed), lm_head, MoE
Ch 7: Tokenizer = BPE vocab + merges + chat template contract
Ch 8: config.json is the DNA; KV cache grows with sequence length
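The Ch 8 memory math fits in one formula: each token stores a K and a V vector per layer, per KV head. A minimal sketch, using illustrative GQA numbers (32 layers, 8 KV heads, head dim 128, fp16 — Llama-3-8B-like, assumed here for the example):

```python
def kv_cache_bytes_per_token(num_layers, num_kv_heads, head_dim, dtype_bytes=2):
    # 2x for K and V; each is [num_kv_heads, head_dim] per layer
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes

# 2 * 32 * 8 * 128 * 2 bytes = 131072 bytes = 128 KB per token
print(kv_cache_bytes_per_token(32, 8, 128) // 1024, "KB/token")
```

Note how GQA earns its keep: with 32 query heads but only 8 KV heads, the cache is a quarter of what full multi-head attention would need.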
The Complete File Map
// Every file in an LLM download:
config.json // DNA
generation_config.json // Defaults
tokenizer.json // Dictionary
tokenizer_config.json // Chat template
model.safetensors.index.json // Shard map
model-0000N-of-0000M.safetensors // Weight shards
// Not in the file but in your GPU:
KV cache // 128 KB/token
Activations // Temporary
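Every file in that map is inspectable with nothing but the standard library. A safetensors file, for instance, opens with an 8-byte little-endian length followed by a JSON header describing every tensor — a minimal sketch (tensor names in the example are placeholders):

```python
import json
import struct

def read_safetensors_header(path):
    # Layout: [8-byte LE u64 header length][JSON header][raw tensor data]
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    # Maps each tensor name to {"dtype", "shape", "data_offsets"}
    return header

# for name, info in read_safetensors_header("model-00001-of-00004.safetensors").items():
#     print(name, info["dtype"], info["shape"])
```

No framework import, no pickle, no code execution — which is exactly why Ch 2 called the format safe.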
Key insight: You can now open any LLM file, read its config, inspect its tensors, estimate its memory footprint, verify its tokenizer compatibility, and understand exactly what every byte is doing. You've completed the anatomy course.
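As a closing exercise, the whole course compresses into one function: read a config.json and reconstruct the parameter count from the Ch 3–6 anatomy. A sketch using Hugging Face's config field names, with illustrative Llama-3-8B-style values (assumed for the example, not taken from any particular download):

```python
def estimate_params(cfg):
    h = cfg["hidden_size"]
    layers = cfg["num_hidden_layers"]
    inter = cfg["intermediate_size"]
    vocab = cfg["vocab_size"]
    head_dim = h // cfg["num_attention_heads"]
    kv_heads = cfg["num_key_value_heads"]

    embed = vocab * h                                   # Ch 3: lookup table
    attn = 2 * h * h + 2 * h * kv_heads * head_dim      # Ch 4: Q, O + shrunken K, V
    ffn = 3 * h * inter                                 # Ch 5: gate/up/down
    norms = 2 * h                                       # Ch 6: two RMSNorms per layer
    lm_head = 0 if cfg.get("tie_word_embeddings") else vocab * h

    return embed + layers * (attn + ffn + norms) + h + lm_head

cfg = dict(hidden_size=4096, num_hidden_layers=32, intermediate_size=14336,
           vocab_size=128256, num_attention_heads=32, num_key_value_heads=8)
n = estimate_params(cfg)
print(f"~{n / 1e9:.2f}B params, ~{n * 2 / 2**30:.1f} GiB in fp16")
```

With those values the estimate lands at roughly 8B parameters and about 15 GiB in fp16 — the back-of-envelope check you can now run on any model before downloading it.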