model.layers.15.self_attn.k_proj.weight tells you: model → layer 15 → self-attention → key projection. You can reconstruct the entire architecture just from reading tensor names.

Double hidden_size from 4096 to 8192 and every weight tensor doubles in width. You can calculate total parameters without opening any weight file.

GGUF magic = GGUF. PK or pickle header = PyTorch. Works even on misnamed files.

rope_theta=10,000 → ~4K context. rope_theta=500,000 → 128K context. Higher theta = slower frequency decay = the model can distinguish positions across longer sequences. Computed at runtime, not stored as weights.
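A minimal sketch of the name-reading trick: given nothing but a list of tensor names (the sample names and the `sketch_architecture` helper here are illustrative assumptions, not from any specific checkpoint), a regex over the `model.layers.N.<module>` pattern recovers the layer count and the per-layer module layout.

```python
import re

# Hypothetical tensor names, as you'd see in a checkpoint's state_dict
# keys or a safetensors header -- chosen for illustration.
names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.self_attn.k_proj.weight",
    "model.layers.15.self_attn.k_proj.weight",
    "model.layers.31.mlp.down_proj.weight",
    "lm_head.weight",
]

def sketch_architecture(tensor_names):
    """Recover layer count and per-layer modules from names alone."""
    layers, modules = set(), set()
    for name in tensor_names:
        m = re.match(r"model\.layers\.(\d+)\.([\w.]+)\.weight", name)
        if m:
            layers.add(int(m.group(1)))   # layer index
            modules.add(m.group(2))       # e.g. "self_attn.k_proj"
    return {"num_layers": max(layers) + 1, "modules": sorted(modules)}

arch = sketch_architecture(names)
print(arch["num_layers"])   # highest layer index seen is 31, so 32 layers
print(arch["modules"])
```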
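Counting parameters from config values alone can be sketched like this, assuming a Llama-style architecture (no grouped-query attention, untied embeddings, gate/up/down MLP) -- the formula is that assumption, not something read from a weight file. Plugging in Llama-2-7B's published config values reproduces its well-known 6.74B count.

```python
def count_params(hidden, layers, intermediate, vocab):
    """Llama-style parameter count from config values alone.
    Assumes no GQA and untied embeddings -- an illustrative formula."""
    embed = vocab * hidden            # token embedding table
    attn = 4 * hidden * hidden        # q, k, v, o projections
    mlp = 3 * hidden * intermediate   # gate, up, down projections
    norms = 2 * hidden                # two RMSNorms per layer
    per_layer = attn + mlp + norms
    # layers + final norm + lm_head
    return embed + layers * per_layer + hidden + vocab * hidden

# Llama-2-7B config: hidden=4096, 32 layers, intermediate=11008, vocab=32000
print(count_params(4096, 32, 11008, 32000))  # 6738415616, i.e. ~6.74B
```

Doubling `hidden` to 8192 doubles the width of every projection, which is why the attention term grows 4x and the embedding term 2x.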
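The magic-byte check is a few lines: GGUF files begin with the ASCII bytes `GGUF`, modern `torch.save` output is a zip archive (so it starts with `PK`), and legacy `torch.save` output is a raw pickle stream (first byte `\x80`, the pickle protocol opcode). The function name here is ours; the signatures themselves are real.

```python
def sniff_format(path):
    """Identify a model file by its magic bytes, ignoring the extension."""
    with open(path, "rb") as f:
        head = f.read(4)
    if head == b"GGUF":
        return "gguf"
    if head.startswith(b"PK"):      # zip container: modern torch.save
        return "pytorch (zip)"
    if head.startswith(b"\x80"):    # pickle PROTO opcode: legacy torch.save
        return "pytorch (legacy pickle)"
    return "unknown"
```

Because it reads the file's first bytes rather than trusting the name, a GGUF file renamed to `model.pt` is still identified correctly.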
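The runtime computation behind rope_theta can be sketched as follows: each dimension pair i rotates at inverse frequency theta^(-2i/d), and the wavelength of a pair is how many positions it takes to complete one full rotation. Raising theta stretches the slowest-rotating pairs over far more positions, which is what lets distant tokens keep distinguishable phase angles. (`head_dim=128` is an assumed typical value, and the helper name is ours.)

```python
import math

def rope_wavelengths(theta, head_dim=128):
    """Per-pair RoPE wavelengths, computed at runtime from rope_theta --
    never stored as weights. head_dim=128 is an assumed typical value."""
    inv_freq = [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]
    # wavelength = positions needed for one full rotation of that pair
    return [2 * math.pi / f for f in inv_freq]

short = rope_wavelengths(10_000)
long_ = rope_wavelengths(500_000)
# The fastest pair (index 0) is identical in both: theta^0 = 1.
# The slowest pair stretches ~50x further at theta=500,000, so positions
# tens of thousands of tokens apart still map to distinct angles.
print(round(short[-1]), round(long_[-1]))
```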