How LLMs Work
From text to tokens to transformers — the complete story of large language models
Co-Created by Kiran Shirol and Claude
Topics
Tokenization
Transformers
Training
Alignment
Inference
14 chapters · 5 parts
Part 1: From Text to Numbers
Tokenization, embeddings, and the attention mechanism.

1. Text to Tokens
BPE, WordPiece, SentencePiece — how raw text becomes numbers.

2. Embeddings: Meaning as Math
Word2Vec, contextual embeddings, and why king − man + woman = queen.

3. Attention: The Core Innovation
Q/K/V, multi-head attention, masks — how every token decides who to listen to.
Part 2: The Transformer Architecture
Blocks, scaling, and the training recipe for frontier models.

4. The Transformer Block
LayerNorm, FFN, residual connections — the repeating unit of every LLM.

5. Scaling Up: From Transformer to LLM
Parameter counts, MoE, context windows, and scaling laws.

6. The Training Recipe
Next-token prediction, data mixtures, infrastructure, and training costs.
Part 3: Making LLMs Useful
Fine-tuning, alignment, and text generation mechanics.

7. Fine-Tuning & Instruction Following
SFT, instruction tuning, LoRA, QLoRA — from text predictor to assistant.

8. RLHF & Alignment
Reward models, PPO, DPO — the step that turns a predictor into ChatGPT.

9. How LLMs Generate Text
Temperature, top-k, top-p, beam search, and one-token-at-a-time decoding.
Part 4: Under the Hood in Practice
Context windows, inference optimization, and multimodal capabilities.

10. Context Windows & Memory
RoPE, ALiBi, KV cache, RAG as external memory, and infinite context.

11. Making LLMs Fast
Quantization, speculative decoding, Flash Attention, and continuous batching.

12. Multimodal LLMs
Vision encoders, CLIP, image tokens, audio, video, and tool use.
Part 5: The Bigger Picture
Emergent abilities, limitations, and the LLM landscape.

13. Emergent Abilities & Limitations
In-context learning, CoT reasoning, hallucinations, and what LLMs can’t do.

14. The LLM Landscape (Capstone)
GPT, Claude, Llama, Gemini, Mistral — open vs closed and where it’s heading.