Ch 4 — Embeddings — Under the Hood
Transformer internals, pooling, contrastive loss, normalization, and batching
A. Tokenization & Input Formatting — how text enters the model

Raw text: the chunk string arriving from the splitter.
  ↓ tokenize
Token IDs: BPE subword IDs, framed as [CLS] + tokens + [SEP].
  ↓ encode
Token embeddings: a lookup table maps each ID to one vector per token.
  ↓ into the Transformer encoder layers
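The framing and lookup steps above can be sketched with toy data (the vocabulary, IDs, and 4-dim table here are made up for illustration; real BPE vocabularies hold tens of thousands of entries, and real embedding tables are learned weights with hundreds of dimensions):

```python
CLS, SEP = 101, 102          # special-token IDs, BERT-style

vocab = {"refund": 1, "##s": 2, "policy": 3}   # tiny stand-in for a BPE vocab

def encode(tokens):
    """Map subword tokens to IDs and frame them as [CLS] + tokens + [SEP]."""
    return [CLS] + [vocab[t] for t in tokens] + [SEP]

# Embedding table: one vector per token ID (fixed here; learned in a real model).
embed_table = {
    CLS: [0.1, 0.0, 0.0, 0.0],
    SEP: [0.0, 0.1, 0.0, 0.0],
    1:   [0.5, 0.2, 0.1, 0.0],
    2:   [0.0, 0.3, 0.4, 0.1],
    3:   [0.2, 0.2, 0.2, 0.2],
}

ids = encode(["refund", "##s", "policy"])
token_embeds = [embed_table[i] for i in ids]

print(ids)                    # [101, 1, 2, 3, 102]
print(len(token_embeds))      # 5 vectors, one per token
```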
B. Transformer Encoder — self-attention across all tokens

Self-attention: each token attends to every other token (repeated across ~12 layers).
  ↓
Hidden states: contextualized token representations.
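The attention step can be sketched as single-head scaled dot-product attention. This is a minimal illustration: Q, K, and V are all the raw input here, whereas a real layer applies learned projection matrices first and stacks many such layers:

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Each token's query is scored against every token's key; the output
    is a weighted sum of value vectors -> a contextualized representation."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, X))
                    for j in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 tokens, 2 dims each
H = self_attention(X)
print(len(H), len(H[0]))                    # same shape in as out: 3 x 2
```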
  ↓ pool
Pooling: collapse the token vectors into a single vector, via the CLS token or mean pooling.
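The two pooling strategies can be sketched as follows; the mask in mean pooling matters so that padding tokens do not dilute the average:

```python
def cls_pool(hidden_states):
    """CLS pooling: take the first token's vector as the sentence embedding."""
    return hidden_states[0]

def mean_pool(hidden_states, attention_mask):
    """Mean pooling: average only the real (non-padding) token vectors."""
    kept = [h for h, m in zip(hidden_states, attention_mask) if m == 1]
    d = len(hidden_states[0])
    return [sum(v[j] for v in kept) / len(kept) for j in range(d)]

H = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]   # last row is padding
print(cls_pool(H))                          # [1.0, 2.0]
print(mean_pool(H, [1, 1, 0]))              # [2.0, 3.0]
```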
C. Contrastive Learning — how the model learns to map meaning to geometry

Positive pairs: similar texts are pushed closer together.
InfoNCE: a contrastive loss computed against negatives.
Hard negatives: texts that look similar but are wrong, pushed further apart.
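The InfoNCE objective above can be sketched as a softmax cross-entropy over similarities, where the positive competes against the negatives (the temperature of 0.05 is a common choice, not a universal constant):

```python
import math

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def info_nce(query, positive, negatives, temperature=0.05):
    """-log softmax of the positive: pulls the positive's similarity up
    and pushes every negative's similarity down."""
    sims = [cos(query, positive)] + [cos(query, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)                              # numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

q    = [1.0, 0.0]
pos  = [0.9, 0.1]     # paraphrase: pushed closer
hard = [0.7, 0.7]     # similar but wrong: pushed further apart
rand = [0.0, 1.0]     # easy (random) negative

loss_easy = info_nce(q, pos, [rand])
loss_hard = info_nce(q, pos, [hard])
```

A hard negative sits closer to the query than a random one, so it yields a larger loss and a stronger training signal — which is why mining them is worth the effort.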
D. Normalization & Matryoshka — post-processing the raw vector

L2 normalize: scale the vector to unit length, ||v|| = 1.
Matryoshka truncation: optionally keep only the first N dims, then re-normalize.
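Both post-processing steps are a few lines each. Note that truncation only preserves quality if the model was trained with a Matryoshka-style objective; truncating an ordinary embedding just discards information:

```python
import math

def l2_normalize(v):
    """Scale v to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def matryoshka(v, n):
    """Keep the first n dims, then re-normalize back to unit length."""
    return l2_normalize(v[:n])

v = l2_normalize([3.0, 4.0, 0.0, 0.0])
print(v)                      # [0.6, 0.8, 0.0, 0.0]
short = matryoshka(v, 2)      # 2 dims, still unit length
```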
  ↓ output
Final vector: ready for the vector store.
E. Batching & Production Usage — efficient embedding at scale

Batch input: send multiple texts in one API call.
GPU batching: pad sequences to the batch's max length and process them in parallel.
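The padding step can be sketched as follows: sequences become one rectangular batch, and an attention mask records which positions are real tokens (the pad ID of 0 is a common convention, but it is model-specific):

```python
PAD = 0   # pad-token ID; check your tokenizer's actual value

def pad_batch(sequences):
    """Pad every ID sequence to the batch max length so the GPU can treat
    the batch as one rectangular tensor; masks mark the real tokens."""
    max_len = max(len(ids) for ids in sequences)
    padded = [ids + [PAD] * (max_len - len(ids)) for ids in sequences]
    masks  = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in sequences]
    return padded, masks

padded, masks = pad_batch([[101, 7, 102], [101, 7, 8, 9, 102]])
print(padded)   # [[101, 7, 102, 0, 0], [101, 7, 8, 9, 102]]
print(masks)    # [[1, 1, 1, 0, 0],     [1, 1, 1, 1, 1]]
```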
Caching: hash text → vector, to avoid re-embedding text you have already seen.
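A minimal cache sketch, keyed by a hash of the exact text (the `fake_embed` function is a stand-in for a real model or API call; a production cache would also live in a persistent store rather than a dict):

```python
import hashlib

_cache = {}

def embed_cached(text, embed_fn):
    """Return the cached vector for this exact text, embedding it only once."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = embed_fn(text)
    return _cache[key]

calls = 0
def fake_embed(text):          # hypothetical stand-in for the embedding call
    global calls
    calls += 1
    return [float(len(text))]

embed_cached("refund policy", fake_embed)
embed_cached("refund policy", fake_embed)   # second call served from cache
print(calls)                                # 1
```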
F. Query vs Document Asymmetry — why some models need prefixes

Query:    "query: What is the refund policy?"
Document: "passage: Customers may request a refund..."

Retrieval is asymmetric search: a short query is matched against long documents, and the prefixes tell the model which side of that match it is embedding.
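The prefixing convention can be sketched as two formatting helpers. The `query: ` / `passage: ` strings shown here follow the E5 family's convention; the exact prefixes are model-specific, so always check the model card before adopting them:

```python
def format_query(text):
    """Prefix a short search query before embedding it."""
    return f"query: {text}"

def format_passage(text):
    """Prefix a stored document/chunk before embedding it."""
    return f"passage: {text}"

q = format_query("What is the refund policy?")
d = format_passage("Customers may request a refund...")
print(q)   # query: What is the refund policy?
print(d)   # passage: Customers may request a refund...
```

Skipping the prefixes with a model trained on them typically degrades retrieval quality, because query and passage vectors no longer land where the model expects them.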