Ch 1 — Vectors & Spaces

GPS coordinates for data — how AI measures similarity and meaning
Linear Algebra
GPS Analogy → Operations → Distance → Similarity → Spaces → Embeddings → Spotify
A Vector Is a GPS Coordinate
Your first intuition — numbers that describe a position
The Analogy
Imagine you’re standing in a city. Your GPS coordinate is two numbers: latitude and longitude. Those two numbers tell you exactly where you are. A vector is the same idea — it’s just a list of numbers that describes a position in some space. In 2D, it’s a point on a map. In 3D, it’s a point in a room. In AI, it might be 768 numbers describing the “meaning” of a word.
Key insight: Every piece of data AI works with — words, images, songs — gets converted into a vector (a list of numbers) before the AI can “understand” it. No vector, no AI.
The Math
A vector v in n-dimensional space is an ordered list of n real numbers:
# A 2D vector (like GPS)
v = [40.7128, -74.0060]   # New York City

# A 3D vector (like a point in a room)
v = [2.0, 3.5, 1.0]       # x, y, z

# A 768D vector (like a word embedding)
v = [0.12, -0.34, 0.56, ... , 0.78]   # 768 numbers
Real World
GPS: 2 numbers locate you on Earth
In AI
Embedding: 768 numbers locate a word in “meaning space”
Vector Addition & Scaling
Combining directions and stretching them
The Analogy
Imagine walking 3 blocks east then 4 blocks north. Your total displacement is a new vector: [3, 4]. That’s vector addition — combining two movements into one. Now imagine you walk twice as far in the same direction: that’s scalar multiplication. You scale the vector by 2, getting [6, 8].
Key insight: When AI “averages” word embeddings to understand a sentence, it’s literally adding vectors and scaling by 1/n. The math of walking directions is the same math behind sentence understanding.
Worked Example
import numpy as np

# Vector addition: walk east + walk north
a = np.array([3, 0])   # 3 blocks east
b = np.array([0, 4])   # 4 blocks north
c = a + b              # [3, 4] total displacement

# Scalar multiplication: walk twice as far
d = 2 * c              # [6, 8]

# Average of word embeddings for a sentence
# (embed() is a placeholder for an embedding lookup)
words = [embed("the"), embed("cat"), embed("sat")]
sentence_vec = sum(words) / len(words)
Real World
Walk 3 east + 4 north = [3, 4] total
In AI
Average word vectors = sentence meaning
The Dot Product — Measuring Alignment
How much do two vectors “agree”?
The Analogy
Imagine two people pointing flashlights. If they point in the same direction, the dot product is large and positive. If they point in opposite directions, it’s large and negative. If they point at right angles, it’s zero — they have nothing in common. The dot product measures how much two vectors agree.
Key insight: When a search engine ranks results, it computes the dot product between your query vector and every document vector. Higher dot product = more relevant result. Google does this billions of times per day.
Worked Example
import numpy as np

# Dot product: multiply matching elements, then sum
a = [1, 2, 3]
b = [4, 5, 6]
# a · b = 1×4 + 2×5 + 3×6 = 4 + 10 + 18 = 32
dot = np.dot(a, b)         # 32

# Same direction → large positive
np.dot([1, 0], [1, 0])     # 1 (aligned)
# Opposite → negative
np.dot([1, 0], [-1, 0])    # -1 (opposed)
# Perpendicular → zero
np.dot([1, 0], [0, 1])     # 0 (unrelated)
Formula: a · b = a₁b₁ + a₂b₂ + ... + aₙbₙ = Σ aᵢbᵢ
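The search-ranking idea described above can be sketched in a few lines. This is a toy example: the query and document vectors are made up for illustration, not taken from any real search engine.

```python
import numpy as np

# Toy example: one query vector, three document vectors
query = np.array([1.0, 2.0, 0.0])
docs = np.array([
    [1.0, 2.0, 0.0],    # doc 0: matches the query closely
    [0.0, 1.0, 0.0],    # doc 1: partial overlap
    [-1.0, -2.0, 0.0],  # doc 2: opposite direction
])

# One dot product per document: a matrix-vector multiply
scores = docs @ query                # [5., 2., -5.]

# Rank documents by score, highest first
ranking = np.argsort(scores)[::-1]   # [0, 1, 2]
```

The same pattern scales up: with a million documents, `docs` is just a bigger matrix, and one matrix-vector product scores them all.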
Norms — Measuring Length
How “big” is a vector?
The Analogy
The norm of a vector is its length — like measuring the distance from your house to the office on a map. The L2 norm (Euclidean) is the straight-line “as the crow flies” distance. The L1 norm (Manhattan) is the “walking along city blocks” distance — you can only go horizontal or vertical.
Key insight: When AI regularizes a model (L1 or L2 regularization), it’s literally penalizing the “length” of the weight vector. L1 pushes weights to exactly zero (feature selection). L2 keeps weights small but non-zero. Same norms, different effects.
Worked Example
import numpy as np

v = np.array([3, 4])

# L2 norm (Euclidean): √(3² + 4²) = √25 = 5
l2 = np.linalg.norm(v)          # 5.0

# L1 norm (Manhattan): |3| + |4| = 7
l1 = np.linalg.norm(v, ord=1)   # 7.0

# Unit vector (normalize to length 1)
unit = v / np.linalg.norm(v)    # [0.6, 0.8]
Real World
L2 = crow flies (5 km). L1 = city blocks (7 km).
In AI
L2 regularization shrinks weights. L1 regularization zeros them out.
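The regularization insight can be made concrete. A minimal sketch, with an invented weight vector and a pretend data-fit loss, showing how the two norms enter a training objective:

```python
import numpy as np

w = np.array([3.0, -4.0, 0.5])   # hypothetical weight vector
data_loss = 1.0                  # pretend data-fit loss
lam = 0.1                        # regularization strength

# L2 regularization penalizes the squared Euclidean length
l2_penalty = lam * np.sum(w ** 2)      # 0.1 × (9 + 16 + 0.25) = 2.525

# L1 regularization penalizes the Manhattan length
l1_penalty = lam * np.sum(np.abs(w))   # 0.1 × (3 + 4 + 0.5) = 0.75

total_l2 = data_loss + l2_penalty   # 3.525
total_l1 = data_loss + l1_penalty   # 1.75
```

During training the optimizer minimizes the total: the L1 term's constant per-weight pull drives small weights to exactly zero, while the L2 term's pull shrinks toward zero in proportion to the weight, so weights get small but rarely hit zero.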
Cosine Similarity — Direction Over Distance
Are two vectors pointing the same way?
The Analogy
Two people can be different heights but have the same taste in music. Cosine similarity ignores “how big” the vectors are and only cares about direction. It’s like comparing the angle between two arrows: angle 0° = identical direction (similarity = 1), angle 90° = unrelated (similarity = 0), angle 180° = opposite (similarity = −1).
Key insight: This is exactly how Spotify finds songs similar to your taste. Your listening history is a vector, each song is a vector, and Spotify computes cosine similarity to rank which songs “point in the same direction” as your preferences.
Worked Example
import numpy as np

# Cosine similarity = dot(a, b) / (‖a‖ × ‖b‖)
a = np.array([1, 2, 3])
b = np.array([2, 4, 6])   # same direction, 2× longer
cos_sim = np.dot(a, b) / (
    np.linalg.norm(a) * np.linalg.norm(b)
)   # 1.0 — perfectly aligned!

# Perpendicular vectors
c = np.array([1, 0])
d = np.array([0, 1])
# cos_sim = 0 / (1 × 1) = 0.0 — unrelated
Formula: cos(θ) = (a · b) / (‖a‖ × ‖b‖), range [−1, 1]
Vector Spaces, Basis & Span
The coordinate system behind everything
The Analogy
A vector space is like a blank canvas with a coordinate grid. The basis vectors are the rulers — they define the grid lines. In 2D, the standard basis is “one step east” [1,0] and “one step north” [0,1]. The span is everywhere you can reach by combining those rulers. If you have 2 independent rulers in 2D, you can reach any point on the canvas.
Key insight: When PCA finds “principal components,” it’s literally choosing new basis vectors — new rulers that align with the directions where your data varies the most. Same space, better coordinate system.
Worked Example
import numpy as np

# Standard basis in 2D
e1 = np.array([1, 0])   # east
e2 = np.array([0, 1])   # north

# Any 2D point = combination of basis vectors
point = 3 * e1 + 4 * e2   # [3, 4]

# Linear independence: can't make e2 from e1
# Dependent:   [2, 4] = 2 × [1, 2]   (same line)
# Independent: [1, 0] and [0, 1]     (different directions)

# Dimension = number of basis vectors needed
# 2D space → 2 basis vectors
# 768D embedding → 768 basis vectors
Real World
Map grid: east/north rulers let you reach any point
In AI
PCA picks new rulers aligned with data’s natural directions
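The "PCA picks new rulers" idea can be sketched with plain NumPy. The data here is synthetic, generated just for illustration: points stretched along the y = x diagonal, so the first principal component should recover that diagonal as the new first ruler.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 2D data stretched along the y = x direction
base = rng.normal(size=(200, 1))
data = np.hstack([base, base]) + 0.1 * rng.normal(size=(200, 2))
data -= data.mean(axis=0)   # center the data

# Principal components = eigenvectors of the covariance matrix
cov = np.cov(data.T)
eigvals, eigvecs = np.linalg.eigh(cov)

# The eigenvector with the largest eigenvalue is the direction
# of greatest variance — the first principal component
pc1 = eigvecs[:, np.argmax(eigvals)]
# pc1 lies roughly along [0.707, 0.707] (up to sign):
# the y = x diagonal, the data's natural first ruler
```

Projecting the data onto `pc1` (and the remaining eigenvectors) rewrites each point in the new coordinate system — same space, same points, better-aligned rulers.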
Word Embeddings — Meaning as Vectors
King − Man + Woman = Queen
The Analogy
Imagine a city where every word has an address. Words with similar meanings live in the same neighborhood: “happy,” “joyful,” and “glad” are on the same street. “Sad” is across town. Word embeddings (Word2Vec, GloVe) assign each word a vector so that the distance between vectors reflects the distance between meanings.
Key insight: The famous equation king − man + woman ≈ queen works because the vector from “man” to “woman” captures a gender direction. Adding that direction to “king” lands you near “queen.” Relationships become vector arithmetic!
Worked Example
from gensim.models import KeyedVectors

# Load pre-trained Word2Vec (Google News)
wv = KeyedVectors.load_word2vec_format(
    'GoogleNews-vectors-negative300.bin',
    binary=True  # the .bin file is in binary format
)

# king - man + woman ≈ queen
result = wv.most_similar(
    positive=['king', 'woman'], negative=['man']
)
# → [('queen', 0.7118), ...]

# Cosine similarity between words
wv.similarity('happy', 'joyful')   # ~0.73
wv.similarity('happy', 'sad')      # ~0.41
Source: Mikolov et al. (2013) “Efficient Estimation of Word Representations in Vector Space” introduced Word2Vec and demonstrated vector arithmetic on word embeddings.
Spotify & Real-World Vector Search
How vectors power recommendations at scale
The Analogy
Spotify represents every song and every listener as a high-dimensional vector. Your listening history becomes a vector that captures your taste. Each song has a vector combining audio features (tempo, energy, danceability) and collaborative signals (what similar listeners enjoy). Finding your next favorite song = finding the nearest vector to yours.
Why it matters for AI: Spotify’s Voyager library (2023) performs approximate nearest-neighbor search across millions of song vectors in milliseconds. The same technique powers Google search, Netflix recommendations, and ChatGPT’s retrieval-augmented generation (RAG). Vectors are the universal language of AI similarity.
In Practice
import torch

# Simulated: user taste vector & song vectors
user = torch.randn(128)           # 128-dim taste
songs = torch.randn(10000, 128)   # 10k songs

# Cosine similarity with all songs at once
sims = torch.nn.functional.cosine_similarity(
    user.unsqueeze(0), songs
)

# Top 5 recommendations: indices of the 5 most similar songs
top5 = sims.topk(5).indices
Real World
Find the 5 closest coffee shops to your GPS location
In AI
Find the 5 closest song vectors to your taste vector