Ch 4 — Embeddings — Under the Hood
Transformer internals, pooling, contrastive loss, normalization, and batching
A. Tokenization & Input Formatting — how text enters the model

Raw text: the chunk string arriving from the splitter.
  ↓ tokenize
Token IDs: BPE subword IDs, framed as [CLS] + tokens + [SEP].
  ↓ encode
Token embeddings: a lookup table maps each ID to one vector per token.
  ↓ into the Transformer encoder layers
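The framing and lookup steps above can be sketched with toy data (the vocabulary, IDs, and 4-dim table here are made up for illustration; real BPE vocabularies hold tens of thousands of entries, and real embedding tables are learned weights with hundreds of dimensions):

```python
CLS, SEP = 101, 102          # special-token IDs, BERT-style

vocab = {"refund": 1, "##s": 2, "policy": 3}   # tiny stand-in for a BPE vocab

def encode(tokens):
    """Map subword tokens to IDs and frame them as [CLS] + tokens + [SEP]."""
    return [CLS] + [vocab[t] for t in tokens] + [SEP]

# Embedding table: one vector per token ID (fixed here; learned in a real model).
embed_table = {
    CLS: [0.1, 0.0, 0.0, 0.0],
    SEP: [0.0, 0.1, 0.0, 0.0],
    1:   [0.5, 0.2, 0.1, 0.0],
    2:   [0.0, 0.3, 0.4, 0.1],
    3:   [0.2, 0.2, 0.2, 0.2],
}

ids = encode(["refund", "##s", "policy"])
token_embeds = [embed_table[i] for i in ids]

print(ids)                    # [101, 1, 2, 3, 102]
print(len(token_embeds))      # 5 vectors, one per token
```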
B. Transformer Encoder — self-attention across all tokens

Self-attention: each token attends to every other token (repeated across ~12 layers).
  ↓
Hidden states: contextualized token representations.
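The attention step can be sketched as single-head scaled dot-product attention. This is a minimal illustration: Q, K, and V are all the raw input here, whereas a real layer applies learned projection matrices first and stacks many such layers:

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def self_attention(X):
    """Each token's query is scored against every token's key; the output
    is a weighted sum of value vectors -> a contextualized representation."""
    d = len(X[0])
    out = []
    for q in X:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in X]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, X))
                    for j in range(d)])
    return out

X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 tokens, 2 dims each
H = self_attention(X)
print(len(H), len(H[0]))                    # same shape in as out: 3 x 2
```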
  ↓ pool
Pooling: collapse the token vectors into a single vector, via the CLS token or mean pooling.
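The two pooling strategies can be sketched as follows; the mask in mean pooling matters so that padding tokens do not dilute the average:

```python
def cls_pool(hidden_states):
    """CLS pooling: take the first token's vector as the sentence embedding."""
    return hidden_states[0]

def mean_pool(hidden_states, attention_mask):
    """Mean pooling: average only the real (non-padding) token vectors."""
    kept = [h for h, m in zip(hidden_states, attention_mask) if m == 1]
    d = len(hidden_states[0])
    return [sum(v[j] for v in kept) / len(kept) for j in range(d)]

H = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]   # last row is padding
print(cls_pool(H))                          # [1.0, 2.0]
print(mean_pool(H, [1, 1, 0]))              # [2.0, 3.0]
```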
C. Contrastive Learning — how the model learns to map meaning to geometry

Positive pairs: similar texts are pushed closer together.
InfoNCE: a contrastive loss computed against negatives.
Hard negatives: texts that look similar but are wrong, pushed further apart.
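The InfoNCE objective above can be sketched as a softmax cross-entropy over similarities, where the positive competes against the negatives (the temperature of 0.05 is a common choice, not a universal constant):

```python
import math

def cos(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def info_nce(query, positive, negatives, temperature=0.05):
    """-log softmax of the positive: pulls the positive's similarity up
    and pushes every negative's similarity down."""
    sims = [cos(query, positive)] + [cos(query, n) for n in negatives]
    logits = [s / temperature for s in sims]
    m = max(logits)                              # numerical stability
    denom = sum(math.exp(l - m) for l in logits)
    return -(logits[0] - m - math.log(denom))

q    = [1.0, 0.0]
pos  = [0.9, 0.1]     # paraphrase: pushed closer
hard = [0.7, 0.7]     # similar but wrong: pushed further apart
rand = [0.0, 1.0]     # easy (random) negative

loss_easy = info_nce(q, pos, [rand])
loss_hard = info_nce(q, pos, [hard])
```

A hard negative sits closer to the query than a random one, so it yields a larger loss and a stronger training signal — which is why mining them is worth the effort.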
D. Normalization & Matryoshka — post-processing the raw vector

L2 normalize: scale the vector to unit length, ||v|| = 1.
Matryoshka truncation: optionally keep only the first N dims, then re-normalize.
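Both post-processing steps are a few lines each. Note that truncation only preserves quality if the model was trained with a Matryoshka-style objective; truncating an ordinary embedding just discards information:

```python
import math

def l2_normalize(v):
    """Scale v to unit length so that dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def matryoshka(v, n):
    """Keep the first n dims, then re-normalize back to unit length."""
    return l2_normalize(v[:n])

v = l2_normalize([3.0, 4.0, 0.0, 0.0])
print(v)                      # [0.6, 0.8, 0.0, 0.0]
short = matryoshka(v, 2)      # 2 dims, still unit length
```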
  ↓ output
Final vector: ready for the vector store.
E. Batching & Production Usage — efficient embedding at scale

Batch input: send multiple texts in one API call.
GPU batching: pad sequences to the batch's max length and process them in parallel.
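The padding step can be sketched as follows: sequences become one rectangular batch, and an attention mask records which positions are real tokens (the pad ID of 0 is a common convention, but it is model-specific):

```python
PAD = 0   # pad-token ID; check your tokenizer's actual value

def pad_batch(sequences):
    """Pad every ID sequence to the batch max length so the GPU can treat
    the batch as one rectangular tensor; masks mark the real tokens."""
    max_len = max(len(ids) for ids in sequences)
    padded = [ids + [PAD] * (max_len - len(ids)) for ids in sequences]
    masks  = [[1] * len(ids) + [0] * (max_len - len(ids)) for ids in sequences]
    return padded, masks

padded, masks = pad_batch([[101, 7, 102], [101, 7, 8, 9, 102]])
print(padded)   # [[101, 7, 102, 0, 0], [101, 7, 8, 9, 102]]
print(masks)    # [[1, 1, 1, 0, 0],     [1, 1, 1, 1, 1]]
```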
Caching: hash text → vector, to avoid re-embedding text you have already seen.
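A minimal cache sketch, keyed by a hash of the exact text (the `fake_embed` function is a stand-in for a real model or API call; a production cache would also live in a persistent store rather than a dict):

```python
import hashlib

_cache = {}

def embed_cached(text, embed_fn):
    """Return the cached vector for this exact text, embedding it only once."""
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = embed_fn(text)
    return _cache[key]

calls = 0
def fake_embed(text):          # hypothetical stand-in for the embedding call
    global calls
    calls += 1
    return [float(len(text))]

embed_cached("refund policy", fake_embed)
embed_cached("refund policy", fake_embed)   # second call served from cache
print(calls)                                # 1
```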
F. Query vs Document Asymmetry — why some models need prefixes

Query:    "query: What is the refund policy?"
Document: "passage: Customers may request a refund..."

Retrieval is asymmetric search: a short query is matched against long documents, and the prefixes tell the model which side of that match it is embedding.
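The prefixing convention can be sketched as two formatting helpers. The `query: ` / `passage: ` strings shown here follow the E5 family's convention; the exact prefixes are model-specific, so always check the model card before adopting them:

```python
def format_query(text):
    """Prefix a short search query before embedding it."""
    return f"query: {text}"

def format_passage(text):
    """Prefix a stored document/chunk before embedding it."""
    return f"passage: {text}"

q = format_query("What is the refund policy?")
d = format_passage("Customers may request a refund...")
print(q)   # query: What is the refund policy?
print(d)   # passage: Customers may request a refund...
```

Skipping the prefixes with a model trained on them typically degrades retrieval quality, because query and passage vectors no longer land where the model expects them.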