The Generative Twist
A standard autoencoder maps inputs to specific points in latent space. But what if you want to generate new data? You’d need to sample from the latent space, but that space has no structure — there are gaps and irregular regions that decode to nothing meaningful. The Variational Autoencoder (VAE) (Kingma & Welling, 2014) solves this by making the encoder output the parameters of a probability distribution (mean μ and variance σ²) instead of a single point. The latent code z is then sampled from this distribution. A KL divergence loss regularizes each such distribution toward a standard normal N(0, 1), which keeps the latent space smooth and continuous.
VAE Loss
// VAE loss = reconstruction + regularization
L = E[||x - x̂||²]           // reconstruction
    + KL(q(z|x) || p(z))    // regularization
// Encoder outputs μ and log(σ²)
// Reparameterization trick:
z = μ + σ · ε, ε ~ N(0, 1)
// Allows backprop through sampling
// KL divergence (closed form for Gaussians):
KL = -0.5 · Σ(1 + log(σ²) - μ² - σ²)
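The loss above can be sketched in a few lines of NumPy. This is a minimal illustration, not a full training loop: the function names `vae_loss` and `reparameterize` are made up for this example, and the reconstruction term uses the summed squared error shown above.

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_loss(x, x_hat, mu, log_var):
    # Reconstruction term: squared error summed over data dimensions
    recon = np.sum((x - x_hat) ** 2)
    # Closed-form KL(N(mu, sigma^2) || N(0, 1)), summed over latent dimensions
    kl = -0.5 * np.sum(1 + log_var - mu ** 2 - np.exp(log_var))
    return recon + kl

def reparameterize(mu, log_var):
    # z = mu + sigma * eps, with eps ~ N(0, 1)
    # sigma = exp(0.5 * log_var) since the encoder outputs log(sigma^2)
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps
```

Note that when μ = 0 and log(σ²) = 0, the KL term vanishes: the encoder's distribution already matches the N(0, 1) prior, so only the reconstruction error contributes.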
Key insight: The reparameterization trick moves the randomness outside the network — ε is sampled, not z — so z is a deterministic, differentiable function of μ and σ, and gradients can flow through them back into the encoder. Without this trick, you can’t backpropagate through the sampling operation itself.