Anatomy of an LLM File — Deep Dive
Open the hood on model weight files — tensor structures, embedding matrices, attention weights, tokenizer internals, and how every byte maps to the transformer architecture.
Co-Created by Kiran Shirol and Claude
Core Topics
Tensors & Weights
File Formats
Attention & FFN
Tokenizers
Config & Metadata
Memory & Runtime
8 chapters · 3 sections
Section 1
The Container — File Formats and Metadata
What’s inside an LLM file, three container formats, and what the first bytes tell you.
1. What’s Actually in an LLM File?
The big picture: metadata, tokenizer, and weight tensors — why 99.98% of the file is weights, and what knowing this helps you do.
2. File Formats — Safetensors, GGUF & PyTorch
Three container formats and their internal layouts: zero-copy mmap, self-contained KV metadata, and legacy pickle. Sharding and magic bytes.
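The safetensors layout this chapter covers can be inspected with a few lines of stdlib Python. A minimal sketch, assuming only the documented on-disk format (8-byte little-endian header length, then a JSON header); the function name is ours, and real tooling lives in the `safetensors` library:

```python
import json
import struct

def read_safetensors_header(path):
    """Parse the JSON header of a .safetensors file without loading weights.

    Layout: the first 8 bytes are a little-endian u64 giving the header
    length, followed by that many bytes of JSON describing each tensor's
    dtype, shape, and byte offsets into the remainder of the file.
    """
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]
        return json.loads(f.read(header_len))
```

Because the weights are never touched, this works instantly even on a multi-gigabyte shard, which is exactly what makes zero-copy mmap loading possible.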
Section 2
Inside the Weights — What Each Tensor Does
Embeddings, attention projections, feed-forward networks, and the special tensors that hold it all together.
3. The Embedding Layer — Words to Vectors
The embed_tokens.weight tensor: shape [vocab_size, hidden_size], token ID to row lookup, vocabulary impact on file size, and weight tying.
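The "token ID to row lookup" in this chapter is plain array indexing. A sketch with toy sizes (the real values in the comment are illustrative, not from the course):

```python
import numpy as np

# Toy sizes for illustration; real models are far larger,
# e.g. vocab_size=128256, hidden_size=4096 for a Llama-3-8B-class model.
vocab_size, hidden_size = 8, 4
embed_tokens = np.random.rand(vocab_size, hidden_size).astype(np.float32)

# An "embedding lookup" just selects one row per token ID.
token_ids = np.array([3, 0, 5])
hidden_states = embed_tokens[token_ids]  # shape (3, hidden_size)
```

This is also why vocabulary size drives file size directly: every extra vocabulary entry adds one more row of `hidden_size` floats to this tensor.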
4. Attention Weights — The Focus Mechanism
Q, K, V, O projection matrices, their tensor names and shapes, multi-head packing, and how GQA shrinks K/V tensors by 75%.
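The 75% figure for GQA follows from simple shape arithmetic. A sketch using assumed Llama-3-8B-like dimensions (32 query heads sharing 8 key/value heads):

```python
# GQA parameter arithmetic; the dimensions are illustrative assumptions.
hidden_size = 4096
num_heads = 32        # query heads
num_kv_heads = 8      # shared key/value heads under GQA
head_dim = hidden_size // num_heads  # 128

# Full multi-head K/V projection vs. the grouped-query version.
kv_params_mha = hidden_size * (num_heads * head_dim)     # 4096 x 4096
kv_params_gqa = hidden_size * (num_kv_heads * head_dim)  # 4096 x 1024

shrink = 1 - kv_params_gqa / kv_params_mha
print(f"K/V tensors shrink by {shrink:.0%}")  # prints "K/V tensors shrink by 75%"
```

With 8 KV heads instead of 32, each K and V projection keeps only a quarter of its rows, hence the 75% reduction visible in the tensor shapes on disk.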
5. The Feed-Forward Network — Where Knowledge Lives
Three SwiGLU matrices (gate, up, down), the 3.5× expansion ratio, and why FFN accounts for ~65% of total parameters.
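The three SwiGLU matrices combine in one line of math. A minimal sketch with toy dimensions that keep the 3.5× expansion ratio (real sizes, e.g. 4096 to 14336, are assumed examples, not from the course):

```python
import numpy as np

def silu(x):
    # SiLU activation: x * sigmoid(x)
    return x / (1.0 + np.exp(-x))

# Toy dimensions preserving the 3.5x ratio; e.g. 4096 -> 14336 in practice.
hidden, intermediate = 4, 14
gate_proj = np.random.randn(intermediate, hidden).astype(np.float32)
up_proj   = np.random.randn(intermediate, hidden).astype(np.float32)
down_proj = np.random.randn(hidden, intermediate).astype(np.float32)

def ffn(x):
    # SwiGLU: down( silu(gate @ x) * (up @ x) )
    return down_proj @ (silu(gate_proj @ x) * (up_proj @ x))

y = ffn(np.ones(hidden, dtype=np.float32))  # shape (hidden,)
```

Each layer carries three such matrices of roughly `hidden × intermediate` parameters, which is how the FFN comes to dominate the parameter count.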
6. Special Tensors — Norm, RoPE & Output Head
RMSNorm weights, Rotary Position Embeddings, the lm_head projection, and Mixture-of-Experts tensors.
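The RMSNorm weights named above are applied as a simple learned elementwise scale after normalization. A sketch for a single hidden vector (real implementations normalize over the last axis of a batched tensor):

```python
import numpy as np

def rms_norm(x, weight, eps=1e-6):
    # Divide by the root-mean-square of x, then scale by the learned weight.
    return x / np.sqrt(np.mean(x ** 2) + eps) * weight
```

Unlike LayerNorm, there is no bias and no mean subtraction, so each norm tensor in the file is just one vector of `hidden_size` scales.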
Section 3
The Supporting Cast — Tokenizer, Config & Runtime
Tokenizer files, configuration blueprints, and the runtime structures that don’t live in the file.
7. The Tokenizer Files — Text to Token IDs
Inside tokenizer.json: vocabulary, BPE merge rules, special tokens, and the Jinja2 chat template — wrong template means gibberish.
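The BPE merge rules stored in tokenizer.json drive a greedy merge loop. A simplified sketch of the idea; `merges` is assumed to be an ordered list of pairs, earlier entries merging first, and production tokenizers add byte-level pre-tokenization on top:

```python
def bpe_merge(word, merges):
    """Greedy BPE: repeatedly merge the highest-priority adjacent pair."""
    rank = {pair: i for i, pair in enumerate(merges)}
    tokens = list(word)
    while len(tokens) > 1:
        pairs = [(tokens[i], tokens[i + 1]) for i in range(len(tokens) - 1)]
        best = min(pairs, key=lambda p: rank.get(p, float("inf")))
        if best not in rank:
            break  # no learned merge applies; stop
        i = pairs.index(best)
        tokens[i:i + 2] = [best[0] + best[1]]
    return tokens

# With merges [("l","o"), ("lo","w")], "low" collapses to a single token.
print(bpe_merge("low", [("l", "o"), ("lo", "w")]))  # prints ['low']
```

Swapping the merge list for one from a different model changes the output token IDs entirely, which is the same failure mode as using the wrong chat template.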
8. config.json, generation_config & Runtime
The architectural blueprint where every field maps to a tensor shape, plus generation defaults and the KV cache, which can consume more memory than the model itself.
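The KV-cache claim is easy to check with back-of-envelope arithmetic. The numbers below are illustrative assumptions (a GQA model with 8 KV heads, fp16 cache), not figures from the course:

```python
# KV cache size = 2 (K and V) x layers x kv_heads x head_dim
#                 x context length x batch x bytes per element.
num_layers = 32
num_kv_heads = 8
head_dim = 128
context_len = 8192
batch = 1
bytes_per_elem = 2  # fp16

kv_bytes = (2 * num_layers * num_kv_heads * head_dim
            * context_len * batch * bytes_per_elem)
print(f"KV cache: {kv_bytes / 2**30:.2f} GiB")  # prints "KV cache: 1.00 GiB"
```

Scale the context length or batch size up and the cache grows linearly, which is how it can overtake the weights themselves in long-context or high-batch serving.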
Related Courses
How LLMs Work · Transformer architecture conceptually
Small Models & Local AI · Quantization & edge deployment
Reading Model Cards · Files tab & model documentation
AI Infrastructure · GPUs, serving & hardware
Fine-Tuning · How training modifies these weights