Ch 1 — Vectors & Spaces

GPS coordinates for data — how AI measures similarity and meaning
Linear Algebra
GPS Analogy → Operations → Distance → Similarity → Spaces → Embeddings → Spotify
A Vector Is a GPS Coordinate
Your first intuition — numbers that describe a position
The Analogy
Imagine you’re standing in a city. Your GPS coordinate is two numbers: latitude and longitude. Those two numbers tell you exactly where you are. A vector is the same idea — it’s just a list of numbers that describes a position in some space. In 2D, it’s a point on a map. In 3D, it’s a point in a room. In AI, it might be 768 numbers describing the “meaning” of a word.
Key insight: Every piece of data AI works with — words, images, songs — gets converted into a vector (a list of numbers) before the AI can “understand” it. No vector, no AI.
The Math
A vector v in n-dimensional space is an ordered list of n real numbers:
# A 2D vector (like GPS)
v = [40.7128, -74.0060]   # New York City

# A 3D vector (like a point in a room)
v = [2.0, 3.5, 1.0]       # x, y, z

# A 768D vector (like a word embedding)
v = [0.12, -0.34, 0.56, ... , 0.78]   # 768 numbers
Real World
GPS: 2 numbers locate you on Earth
In AI
Embedding: 768 numbers locate a word in “meaning space”
Vector Addition & Scaling
Combining directions and stretching them
The Analogy
Imagine walking 3 blocks east then 4 blocks north. Your total displacement is a new vector: [3, 4]. That’s vector addition — combining two movements into one. Now imagine you walk twice as far in the same direction: that’s scalar multiplication. You scale the vector by 2, getting [6, 8].
Key insight: When AI “averages” word embeddings to understand a sentence, it’s literally adding vectors and scaling by 1/n. The math of walking directions is the same math behind sentence understanding.
Worked Example
import numpy as np

# Vector addition: walk east + walk north
a = np.array([3, 0])   # 3 blocks east
b = np.array([0, 4])   # 4 blocks north
c = a + b              # [3, 4] total displacement

# Scalar multiplication: walk twice as far
d = 2 * c              # [6, 8]

# Average of word embeddings for a sentence
# (embed() is a placeholder for an embedding lookup)
words = [embed("the"), embed("cat"), embed("sat")]
sentence_vec = sum(words) / len(words)
Real World
Walk 3 east + 4 north = [3, 4] total
In AI
Average word vectors = sentence meaning
The Dot Product — Measuring Alignment
How much do two vectors “agree”?
The Analogy
Imagine two people pointing flashlights. If they point in the same direction, the dot product is large and positive. If they point in opposite directions, it’s large and negative. If they point at right angles, it’s zero — they have nothing in common. The dot product measures how much two vectors agree.
Key insight: When a search engine ranks results, it computes the dot product between your query vector and every document vector. Higher dot product = more relevant result. Google does this billions of times per day.
Worked Example
import numpy as np

# Dot product: multiply matching elements, then sum
a = [1, 2, 3]
b = [4, 5, 6]
# a · b = 1×4 + 2×5 + 3×6 = 4 + 10 + 18 = 32
dot = np.dot(a, b)         # 32

# Same direction → large positive
np.dot([1, 0], [1, 0])     # 1 (aligned)
# Opposite → negative
np.dot([1, 0], [-1, 0])    # -1 (opposed)
# Perpendicular → zero
np.dot([1, 0], [0, 1])     # 0 (unrelated)
Formula: a · b = a₁b₁ + a₂b₂ + ... + aₙbₙ = Σ aᵢbᵢ
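The search-ranking idea described above can be sketched in a few lines. This is a toy example: the query and document vectors are made up for illustration, not taken from any real search engine.

```python
import numpy as np

# Toy example: one query vector, three document vectors
query = np.array([1.0, 2.0, 0.0])
docs = np.array([
    [1.0, 2.0, 0.0],    # doc 0: matches the query closely
    [0.0, 1.0, 0.0],    # doc 1: partial overlap
    [-1.0, -2.0, 0.0],  # doc 2: opposite direction
])

# One dot product per document: a matrix-vector multiply
scores = docs @ query                # [5., 2., -5.]

# Rank documents by score, highest first
ranking = np.argsort(scores)[::-1]   # [0, 1, 2]
```

The same pattern scales up: with a million documents, `docs` is just a bigger matrix, and one matrix-vector product scores them all.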
Norms — Measuring Length
How “big” is a vector?
The Analogy
The norm of a vector is its length — like measuring the distance from your house to the office on a map. The L2 norm (Euclidean) is the straight-line “as the crow flies” distance. The L1 norm (Manhattan) is the “walking along city blocks” distance — you can only go horizontal or vertical.
Key insight: When AI regularizes a model (L1 or L2 regularization), it’s literally penalizing the “length” of the weight vector. L1 pushes weights to exactly zero (feature selection). L2 keeps weights small but non-zero. Same norms, different effects.
Worked Example
import numpy as np

v = np.array([3, 4])

# L2 norm (Euclidean): √(3² + 4²) = √25 = 5
l2 = np.linalg.norm(v)          # 5.0

# L1 norm (Manhattan): |3| + |4| = 7
l1 = np.linalg.norm(v, ord=1)   # 7.0

# Unit vector (normalize to length 1)
unit = v / np.linalg.norm(v)    # [0.6, 0.8]
Real World
L2 = crow flies (5 km). L1 = city blocks (7 km).
In AI
L2 regularization shrinks weights. L1 regularization zeros them out.
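The regularization insight can be made concrete. A minimal sketch, with an invented weight vector and a pretend data-fit loss, showing how the two norms enter a training objective:

```python
import numpy as np

w = np.array([3.0, -4.0, 0.5])   # hypothetical weight vector
data_loss = 1.0                  # pretend data-fit loss
lam = 0.1                        # regularization strength

# L2 regularization penalizes the squared Euclidean length
l2_penalty = lam * np.sum(w ** 2)      # 0.1 × (9 + 16 + 0.25) = 2.525

# L1 regularization penalizes the Manhattan length
l1_penalty = lam * np.sum(np.abs(w))   # 0.1 × (3 + 4 + 0.5) = 0.75

total_l2 = data_loss + l2_penalty   # 3.525
total_l1 = data_loss + l1_penalty   # 1.75
```

During training the optimizer minimizes the total: the L1 term's constant per-weight pull drives small weights to exactly zero, while the L2 term's pull shrinks toward zero in proportion to the weight, so weights get small but rarely hit zero.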
Cosine Similarity — Direction Over Distance
Are two vectors pointing the same way?
The Analogy
Two people can be different heights but have the same taste in music. Cosine similarity ignores “how big” the vectors are and only cares about direction. It’s like comparing the angle between two arrows: angle 0° = identical direction (similarity = 1), angle 90° = unrelated (similarity = 0), angle 180° = opposite (similarity = −1).
Key insight: This is exactly how Spotify finds songs similar to your taste. Your listening history is a vector, each song is a vector, and Spotify computes cosine similarity to rank which songs “point in the same direction” as your preferences.
Worked Example
import numpy as np

# Cosine similarity = dot(a, b) / (‖a‖ × ‖b‖)
a = np.array([1, 2, 3])
b = np.array([2, 4, 6])   # same direction, 2× longer
cos_sim = np.dot(a, b) / (
    np.linalg.norm(a) * np.linalg.norm(b)
)   # 1.0 — perfectly aligned!

# Perpendicular vectors
c = np.array([1, 0])
d = np.array([0, 1])
# cos_sim = 0 / (1 × 1) = 0.0 — unrelated
Formula: cos(θ) = (a · b) / (‖a‖ × ‖b‖), range [−1, 1]
Vector Spaces, Basis & Span
The coordinate system behind everything
The Analogy
A vector space is like a blank canvas with a coordinate grid. The basis vectors are the rulers — they define the grid lines. In 2D, the standard basis is “one step east” [1,0] and “one step north” [0,1]. The span is everywhere you can reach by combining those rulers. If you have 2 independent rulers in 2D, you can reach any point on the canvas.
Key insight: When PCA finds “principal components,” it’s literally choosing new basis vectors — new rulers that align with the directions where your data varies the most. Same space, better coordinate system.
Worked Example
import numpy as np

# Standard basis in 2D
e1 = np.array([1, 0])   # east
e2 = np.array([0, 1])   # north

# Any 2D point = combination of basis vectors
point = 3 * e1 + 4 * e2   # [3, 4]

# Linear independence: can't make e2 from e1
# Dependent:   [2, 4] = 2 × [1, 2]   (same line)
# Independent: [1, 0] and [0, 1]     (different directions)

# Dimension = number of basis vectors needed
# 2D space → 2 basis vectors
# 768D embedding → 768 basis vectors
Real World
Map grid: east/north rulers let you reach any point
In AI
PCA picks new rulers aligned with data’s natural directions
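The "PCA picks new rulers" idea can be sketched with plain NumPy. The data here is synthetic, generated just for illustration: points stretched along the y = x diagonal, so the first principal component should recover that diagonal as the new first ruler.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2D data stretched along the y = x direction
base = rng.normal(size=(200, 1))
data = np.hstack([base, base]) + 0.1 * rng.normal(size=(200, 2))
data -= data.mean(axis=0)   # center the data

# Principal components = eigenvectors of the covariance matrix
cov = np.cov(data.T)
eigvals, eigvecs = np.linalg.eigh(cov)

# The eigenvector with the largest eigenvalue is the direction
# of greatest variance — the first principal component
pc1 = eigvecs[:, np.argmax(eigvals)]
# pc1 lies roughly along [0.707, 0.707] (up to sign):
# the y = x diagonal, the data's natural first ruler
```

Projecting the data onto `pc1` (and the remaining eigenvectors) rewrites each point in the new coordinate system — same space, same points, better-aligned rulers.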
Word Embeddings — Meaning as Vectors
King − Man + Woman = Queen
The Analogy
Imagine a city where every word has an address. Words with similar meanings live in the same neighborhood: “happy,” “joyful,” and “glad” are on the same street. “Sad” is across town. Word embeddings (Word2Vec, GloVe) assign each word a vector so that the distance between vectors reflects the distance between meanings.
Key insight: The famous equation king − man + woman ≈ queen works because the vector from “man” to “woman” captures a gender direction. Adding that direction to “king” lands you near “queen.” Relationships become vector arithmetic!
Worked Example
from gensim.models import KeyedVectors

# Load pre-trained Word2Vec (Google News)
wv = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin',
    binary=True  # the .bin file is in binary format
)

# king - man + woman ≈ queen
result = wv.most_similar(
    positive=['king', 'woman'], negative=['man']
)
# → [('queen', 0.7118), ...]

# Cosine similarity between words
wv.similarity('happy', 'joyful')   # ~0.73
wv.similarity('happy', 'sad')      # ~0.41
Source: Mikolov et al. (2013) “Efficient Estimation of Word Representations in Vector Space” introduced Word2Vec and demonstrated vector arithmetic on word embeddings.
Spotify & Real-World Vector Search
How vectors power recommendations at scale
The Analogy
Spotify represents every song and every listener as a high-dimensional vector. Your listening history becomes a vector that captures your taste. Each song has a vector combining audio features (tempo, energy, danceability) and collaborative signals (what similar listeners enjoy). Finding your next favorite song = finding the nearest vector to yours.
Why it matters for AI: Spotify’s Voyager library (2023) performs approximate nearest-neighbor search across millions of song vectors in milliseconds. The same technique powers Google search, Netflix recommendations, and ChatGPT’s retrieval-augmented generation (RAG). Vectors are the universal language of AI similarity.
In Practice
import torch

# Simulated: user taste vector & song vectors
user = torch.randn(128)           # 128-dim taste
songs = torch.randn(10000, 128)   # 10k songs

# Cosine similarity with all songs at once
sims = torch.nn.functional.cosine_similarity(
    user.unsqueeze(0), songs
)

# Top 5 recommendations: indices of the 5 most similar songs
top5 = sims.topk(5).indices
Real World
Find the 5 closest coffee shops to your GPS location
In AI
Find the 5 closest song vectors to your taste vector