Ch 4 — Embeddings — Under the Hood

Transformer internals, pooling, contrastive loss, normalization, and batching
A. Tokenization & Input Formatting: how text enters the model

Step 1. Raw Text → Token IDs: the chunk string arriving from the splitter is encoded into BPE subword IDs and framed as [CLS] + tokens + [SEP].
Step 2. Token Embeddings: each token ID indexes a lookup table, yielding one vector per token.
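A minimal sketch of steps 1 and 2 in Python. The vocabulary, the whitespace tokenizer, and the random 4-dimensional table are toy stand-ins: a real model uses a learned BPE vocabulary and an embedding table with hundreds or thousands of dimensions.

```python
import random

# Toy vocabulary; real models have tens of thousands of BPE subword entries.
VOCAB = {"[CLS]": 0, "[SEP]": 1, "[UNK]": 2, "refund": 3, "policy": 4, "the": 5}

def tokenize(text: str) -> list[int]:
    """Whitespace split standing in for BPE; frame with [CLS] ... [SEP]."""
    ids = [VOCAB.get(w, VOCAB["[UNK]"]) for w in text.lower().split()]
    return [VOCAB["[CLS]"]] + ids + [VOCAB["[SEP]"]]

random.seed(0)
EMBED_DIM = 4  # toy size; production models use 384 to 4096 dims
# Lookup table: one row (vector) per vocabulary entry.
table = [[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in VOCAB]

ids = tokenize("the refund policy")
token_embeds = [table[i] for i in ids]  # one vector per token
print(ids)  # [0, 5, 3, 4, 1]
```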
B. Transformer Encoder: self-attention across all tokens

Step 3. Self-Attention: each token attends to every other token. Stacking this across 12 encoder layers produces the hidden states: contextualized representations, one per token.
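To make "each token attends to every other token" concrete, here is a single-head self-attention sketch in plain Python. It omits the learned W_q, W_k, W_v projections and multi-head structure of a real encoder layer; tokens attend with their raw vectors.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def self_attention(x):
    """x: list of token vectors. Each output is an attention-weighted
    mix of all token vectors, i.e. a contextualized representation."""
    d = len(x[0])
    out = []
    for q in x:  # every token attends to every token (including itself)
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        out.append([sum(wi * ki[j] for wi, ki in zip(w, x)) for j in range(d)])
    return out

h = self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
print(len(h), len(h[0]))  # same shape as the input: 3 tokens, 2 dims each
```

The output has the same shape as the input, which is what lets encoders stack this operation 12 times.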
Step 4. Pooling: the per-token hidden states are collapsed into a single vector, either by taking the [CLS] token's state or by mean pooling across tokens.
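Both pooling strategies are a few lines each. A sketch, using a padding mask so that pad positions do not dilute the mean (the 2-dimensional hidden states are toy values):

```python
def cls_pool(hidden):
    """[CLS] pooling: the [CLS] token is always the first position."""
    return hidden[0]

def mean_pool(hidden, mask):
    """Mean pooling over real (non-pad) tokens only."""
    dim = len(hidden[0])
    n = sum(mask)
    return [sum(h[d] for h, m in zip(hidden, mask) if m) / n for d in range(dim)]

hidden = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]]  # last row is padding
mask = [1, 1, 1, 0]
print(cls_pool(hidden))         # [1.0, 2.0]
print(mean_pool(hidden, mask))  # [3.0, 4.0]
```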
C. Contrastive Learning: how the model learns to map meaning to geometry

Step 5. Positive Pairs: embeddings of similar texts are pushed closer together under the InfoNCE contrastive loss, which simultaneously pushes them away from negatives.
Step 6. Hard Negatives: texts that are similar on the surface but wrong as answers are pushed further apart, sharpening the learned geometry.
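A sketch of InfoNCE for a single query, given its cosine similarity to the positive and to the negatives. The temperature value 0.05 is an illustrative choice, not a universal constant:

```python
import math

def info_nce(pos_sim: float, neg_sims: list[float], tau: float = 0.05) -> float:
    """-log softmax probability of the positive among all candidates."""
    logits = [pos_sim / tau] + [s / tau for s in neg_sims]
    m = max(logits)  # subtract max for numerical stability
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[0] - log_denom)

easy = info_nce(0.9, [0.1, 0.0, -0.2])  # negatives far away: low loss
hard = info_nce(0.9, [0.85, 0.8, 0.7])  # hard negatives nearby: higher loss
print(easy < hard)  # True
```

This is why hard negatives matter: random negatives quickly contribute almost no gradient, while near-miss negatives keep the loss, and the learning signal, alive.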
D. Normalization & Matryoshka: post-processing the raw vector

Step 7. L2 Normalize: the pooled vector is scaled to unit length, ||v|| = 1, so dot product and cosine similarity coincide.
Step 8. Matryoshka (optional): keep only the first N dimensions and re-normalize. The result is the final vector, ready for the vector store.
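Steps 7 and 8 can be sketched directly; the 4-dimensional vector is a toy example, and Matryoshka truncation only works well when the model was trained with a Matryoshka objective:

```python
import math

def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def matryoshka(v, n):
    """Keep the first n dims, then re-normalize so ||v|| = 1 again."""
    return l2_normalize(v[:n])

v = l2_normalize([3.0, 4.0, 12.0, 0.0])  # full vector (norm was 13)
short = matryoshka(v, 2)                 # first 2 dims, re-normalized
print(short)  # approximately [0.6, 0.8]
```

Note the re-normalization: after truncation the prefix no longer has unit length, so it must be rescaled before cosine or dot-product search.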
E. Batching & Production Usage: efficient embedding at scale

Step 9. Batching: send multiple texts in one API call; on the GPU, inputs are padded to the batch's maximum length and processed in parallel.
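The padding step looks like this in outline. The pad ID of 0 and the example token IDs are assumptions; each model defines its own pad token:

```python
PAD_ID = 0  # assumed pad token ID; model-specific in practice

def pad_batch(batch):
    """Pad token-ID sequences to the batch max length; the mask marks
    real positions (1) vs padding (0) so pooling can ignore the pads."""
    max_len = max(len(seq) for seq in batch)
    padded = [seq + [PAD_ID] * (max_len - len(seq)) for seq in batch]
    masks = [[1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch]
    return padded, masks

padded, masks = pad_batch([[101, 7, 8, 102], [101, 9, 102]])
print(padded)  # [[101, 7, 8, 102], [101, 9, 102, 0]]
print(masks)   # [[1, 1, 1, 1], [1, 1, 1, 0]]
```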
Caching: hash each text to its vector so unchanged text is never re-embedded.
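A sketch of hash-keyed caching around a batched embed call. `fake_embed` is a placeholder for the real model or API call, and the in-memory dict stands in for whatever store (Redis, SQLite, disk) production would use:

```python
import hashlib

def fake_embed(texts):
    """Placeholder for the real embedding model/API call."""
    return [[float(len(t)), 1.0] for t in texts]

cache: dict[str, list[float]] = {}
calls = 0  # counts batched embed calls, to show the cache working

def embed_cached(texts):
    global calls
    keys = [hashlib.sha256(t.encode()).hexdigest() for t in texts]
    misses = [t for t, k in zip(texts, keys) if k not in cache]
    if misses:
        calls += 1  # one batched call covers all misses
        for t, v in zip(misses, fake_embed(misses)):
            cache[hashlib.sha256(t.encode()).hexdigest()] = v
    return [cache[k] for k in keys]

embed_cached(["alpha", "beta"])
embed_cached(["beta", "gamma"])  # "beta" is served from the cache
print(calls)  # 2 batched calls; "beta" was embedded only once
```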
F. Query vs Document Asymmetry: why some models need prefixes

Step 10. Some models are trained with role prefixes: a query is embedded as "query: What is the refund policy?" while a document is embedded as "passage: Customers may request a refund...". The prefixes tell the model which side of the asymmetric retrieval task (short query → long document) it is encoding.
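In code, this is just consistent string formatting applied before embedding. The "query: " / "passage: " convention shown here matches the example above, but the exact prefixes (and whether any are needed at all) are model-specific, so always check the model's documentation:

```python
# Assumed prefixes; consult your embedding model's docs for the exact strings.
def format_query(text: str) -> str:
    return f"query: {text}"

def format_passage(text: str) -> str:
    return f"passage: {text}"

q = format_query("What is the refund policy?")
d = format_passage("Customers may request a refund...")
print(q)  # query: What is the refund policy?
print(d)  # passage: Customers may request a refund...
```

Forgetting the prefix at query time, after indexing documents with it, silently degrades retrieval quality, so the formatting belongs in one shared helper used by both the indexing and the query path.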