The Key Fields
// From a typical LLM config.json
"hidden_size": 4096, // Width of each layer
"num_hidden_layers": 32, // Depth (number of layers)
"num_attention_heads": 32, // Query heads
"num_key_value_heads": 8, // KV heads (GQA)
"vocab_size": 128256, // Tokenizer vocabulary
"max_position_embeddings": 131072, // Max context
"rope_theta": 500000.0, // Rotary embedding base
"rms_norm_eps": 1e-05 // Normalization
What Each Field Tells You
hidden_size and num_hidden_layers = the model’s capacity. Parameter count grows roughly as hidden_size² × num_hidden_layers: wider layers add parameters quadratically (each weight matrix is hidden_size on both sides), deeper stacks add them linearly.
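A back-of-the-envelope parameter count from these fields, sketched for a Llama-style decoder. Note the MLP width (intermediate_size = 14336) is an assumption, since it is not in the excerpt above:

```python
# Rough parameter count for a Llama-style decoder block stack.
hidden = 4096
layers = 32
heads = 32
kv_heads = 8
vocab = 128256
intermediate = 14336          # assumed MLP width, not in the excerpt
head_dim = hidden // heads    # 128

# Per layer: attention (Q and O full-width; K and V at GQA width)
# plus a gated MLP (gate, up, down projections).
attn = 2 * hidden * hidden + 2 * hidden * (kv_heads * head_dim)
mlp = 3 * hidden * intermediate
per_layer = attn + mlp

embeddings = 2 * vocab * hidden   # input + output embeddings (untied)
total = layers * per_layer + embeddings
print(f"~{total / 1e9:.1f}B parameters")  # ~8.0B
```

Norm weights and biases are omitted; they contribute well under a percent of the total.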
num_attention_heads vs num_key_value_heads = attention type (equal = MHA, smaller = GQA, 1 = MQA).
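The comparison is a one-liner. A small sketch using the values from the excerpt:

```python
heads = 32
kv_heads = 8

# Classify the attention variant from the head counts.
if kv_heads == heads:
    kind = "MHA"
elif kv_heads == 1:
    kind = "MQA"
else:
    kind = "GQA"

sharing = heads // kv_heads   # query heads per KV head
print(f"{kind}: {sharing} query heads share each KV head,")
print(f"shrinking the KV cache {sharing}x versus MHA")
```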
max_position_embeddings = the maximum context length. 131072 = 128K tokens.
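That maximum is also a memory budget. A rough sketch of the KV cache for one sequence at full context, assuming fp16 storage and head_dim = hidden_size / num_attention_heads:

```python
# KV cache size at max context for one sequence, fp16 (2 bytes/value).
layers = 32
kv_heads = 8
head_dim = 4096 // 32         # hidden_size / num_attention_heads = 128
seq_len = 131072
bytes_per_value = 2

# Factor of 2 for storing both K and V.
cache = 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value
print(f"{cache / 2**30:.0f} GiB")  # 16 GiB
```

This is why GQA matters at 128K: with 32 KV heads instead of 8, the same cache would be 64 GiB.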
rope_theta = the rotary position encoding base. Higher values (500K+) indicate models trained for long context. Lower values (the original default of 10K) suggest shorter effective context.
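To see why a bigger base helps, note that RoPE assigns dimension pair 2i a wavelength of 2π·theta^(2i/d); the slowest-rotating pair should complete well under a full turn over the context so distant positions stay distinguishable. A sketch, assuming head_dim = 128 as in the excerpt:

```python
import math

theta = 500000.0
head_dim = 128                # assumed: hidden_size / num_attention_heads

# Longest RoPE wavelength: the slowest-rotating dimension pair.
longest = 2 * math.pi * theta ** ((head_dim - 2) / head_dim)
print(f"longest wavelength ~ {longest / 1e6:.1f}M positions")
```

At theta = 500000 the longest wavelength is a few million positions, comfortably beyond the 131072-token context; at theta = 10000 it is only about 50K, which is why small-base models degrade past a few thousand tokens.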
Key insight: You don’t need to understand the math behind every field. The config.json is the model’s blueprint — scan it for the key numbers (layers, heads, context length, vocab size) and move on.