Anatomy of an LLM File — Deep Dive
Open the hood on model weight files — tensor structures, embedding matrices, attention weights, tokenizer internals, and how every byte maps to the transformer architecture.
Co-Created by Kiran Shirol and Claude
Core Topics
Tensors & Weights
File Formats
Attention & FFN
Tokenizers
Config & Metadata
Memory & Runtime
8 chapters · 3 sections
Section 1
The Container — File Formats and Metadata
What’s inside an LLM file, three container formats, and what the first bytes tell you.
1. What’s Actually in an LLM File?
The big picture: metadata, tokenizer, and weight tensors — why 99.98% of the file is weights, and what knowing this helps you do.
2. File Formats — Safetensors, GGUF & PyTorch
Three container formats and their internal layouts: zero-copy mmap, self-contained KV metadata, and legacy pickle. Sharding and magic bytes.
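The safetensors layout this chapter covers can be inspected with a few lines of stdlib Python. A minimal sketch, assuming only the documented on-disk format (8-byte little-endian header length, then a JSON header); the function name is ours, and real tooling lives in the `safetensors` library:

```python
import json
import struct

def read_safetensors_header(path):
    """Parse the JSON header of a .safetensors file without loading weights.

    Layout: the first 8 bytes are a little-endian u64 giving the header
    length, followed by that many bytes of JSON describing each tensor's
    dtype, shape, and byte offsets into the remainder of the file.
    """
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]
        return json.loads(f.read(header_len))
```

Because the weights are never touched, this works instantly even on a multi-gigabyte shard, which is exactly what makes zero-copy mmap loading possible.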
Section 2
Inside the Weights — What Each Tensor Does
Embeddings, attention projections, feed-forward networks, and the special tensors that hold it all together.
3. The Embedding Layer — Words to Vectors
The embed_tokens.weight tensor: shape [vocab_size, hidden_size], token ID to row lookup, vocabulary impact on file size, and weight tying.
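The "token ID to row lookup" in this chapter is plain array indexing. A sketch with toy sizes (the real values in the comment are illustrative, not from the course):

```python
import numpy as np

# Toy sizes for illustration; real models are far larger,
# e.g. vocab_size=128256, hidden_size=4096 for a Llama-3-8B-class model.
vocab_size, hidden_size = 8, 4
embed_tokens = np.random.rand(vocab_size, hidden_size).astype(np.float32)

# An "embedding lookup" just selects one row per token ID.
token_ids = np.array([3, 0, 5])
hidden_states = embed_tokens[token_ids]  # shape (3, hidden_size)
```

This is also why vocabulary size drives file size directly: every extra vocabulary entry adds one more row of `hidden_size` floats to this tensor.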
4. Attention Weights — The Focus Mechanism
Q, K, V, O projection matrices, their tensor names and shapes, multi-head packing, and how GQA shrinks K/V tensors by 75%.
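The 75% figure for GQA follows from simple shape arithmetic. A sketch using assumed Llama-3-8B-like dimensions (32 query heads sharing 8 key/value heads):

```python
# GQA parameter arithmetic; the dimensions are illustrative assumptions.
hidden_size = 4096
num_heads = 32        # query heads
num_kv_heads = 8      # shared key/value heads under GQA
head_dim = hidden_size // num_heads  # 128

# Full multi-head K/V projection vs. the grouped-query version.
kv_params_mha = hidden_size * (num_heads * head_dim)     # 4096 x 4096
kv_params_gqa = hidden_size * (num_kv_heads * head_dim)  # 4096 x 1024

shrink = 1 - kv_params_gqa / kv_params_mha
print(f"K/V tensors shrink by {shrink:.0%}")  # prints "K/V tensors shrink by 75%"
```

With 8 KV heads instead of 32, each K and V projection keeps only a quarter of its rows, hence the 75% reduction visible in the tensor shapes on disk.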
5. The Feed-Forward Network — Where Knowledge Lives
Three SwiGLU matrices (gate, up, down), the 3.5× expansion ratio, and why FFN accounts for ~65% of total parameters.
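The three SwiGLU matrices combine in one line of math. A minimal sketch with toy dimensions that keep the 3.5× expansion ratio (real sizes, e.g. 4096 to 14336, are assumed examples, not from the course):

```python
import numpy as np

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

# Toy dimensions preserving the 3.5x ratio; e.g. 4096 -> 14336 in practice.
hidden, intermediate = 4, 14
gate_proj = np.random.randn(intermediate, hidden).astype(np.float32)
up_proj   = np.random.randn(intermediate, hidden).astype(np.float32)
down_proj = np.random.randn(hidden, intermediate).astype(np.float32)

def ffn(x):
    # SwiGLU: down( silu(gate @ x) * (up @ x) )
    return down_proj @ (silu(gate_proj @ x) * (up_proj @ x))

y = ffn(np.ones(hidden, dtype=np.float32))  # shape (hidden,)
```

Each layer carries three such matrices of roughly `hidden × intermediate` parameters, which is how the FFN comes to dominate the parameter count.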
6. Special Tensors — Norm, RoPE & Output Head
RMSNorm weights, Rotary Position Embeddings, the lm_head projection, and Mixture-of-Experts tensors.
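The RMSNorm weights named above are applied as a simple learned elementwise scale after normalization. A sketch for a single hidden vector (real implementations normalize over the last axis of a batched tensor):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Divide by the root-mean-square of x, then scale by the learned weight.
    return x / np.sqrt(np.mean(x ** 2) + eps) * weight
```

Unlike LayerNorm, there is no bias and no mean subtraction, so each norm tensor in the file is just one vector of `hidden_size` scales.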
Section 3
The Supporting Cast — Tokenizer, Config & Runtime
Tokenizer files, configuration blueprints, and the runtime structures that don’t live in the file.
7. The Tokenizer Files — Text to Token IDs
Inside tokenizer.json: vocabulary, BPE merge rules, special tokens, and the Jinja2 chat template — wrong template means gibberish.
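The BPE merge rules stored in tokenizer.json drive a greedy merge loop. A simplified sketch of the idea; `merges` is assumed to be an ordered list of pairs, earlier entries merging first, and production tokenizers add byte-level pre-tokenization on top:

```python
def bpe_merge(word, merges):
    """Greedy BPE: repeatedly merge the highest-priority adjacent pair."""
    rank = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)
    while len(tokens) > 1:
        pairs = [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]
        best = min(pairs, key=lambda p: rank.get(p, float("inf")))
        if best not in rank:
            break  # no learned merge applies; stop
        i = pairs.index(best)
        tokens[i:i + 2] = [best[0] + best[1]]
    return tokens

# With merges [("l","o"), ("lo","w")], "low" collapses to a single token.
print(bpe_merge("low", [("l", "o"), ("lo", "w")]))  # prints ['low']
```

Swapping the merge list for one from a different model changes the output token IDs entirely, which is the same failure mode as using the wrong chat template.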
8. config.json, generation_config & Runtime
The architectural blueprint where every field maps to a tensor shape, plus generation defaults and the KV cache, which can consume more memory than the model itself.
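The KV-cache claim is easy to check with back-of-envelope arithmetic. The numbers below are illustrative assumptions (a GQA model with 8 KV heads, fp16 cache), not figures from the course:

```python
# KV cache size = 2 (K and V) x layers x kv_heads x head_dim
#                 x context length x batch x bytes per element.
num_layers = 32
num_kv_heads = 8
head_dim = 128
context_len = 8192
batch = 1
bytes_per_elem = 2  # fp16

kv_bytes = (2 * num_layers * num_kv_heads * head_dim
            * context_len * batch * bytes_per_elem)
print(f"KV cache: {kv_bytes / 2**30:.2f} GiB")  # prints "KV cache: 1.00 GiB"
```

Scale the context length or batch size up and the cache grows linearly, which is how it can overtake the weights themselves in long-context or high-batch serving.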
Related Courses
How LLMs Work · Transformer architecture conceptually
Small Models & Local AI · Quantization & edge deployment
Reading Model Cards · Files tab & model documentation
AI Infrastructure · GPUs, serving & hardware
Fine-Tuning · How training modifies these weights