Small Models & Local AI

Quantization, distillation, Ollama, edge deployment — run AI on your own hardware
Co-Created by Kiran Shirol and Claude
Topics: Quantization · Distillation · Ollama · Edge Deploy · Local Apps
10 chapters · 5 sections
Section 1

Foundation — Why Small?

The cost, latency, and privacy case for running models locally.
Section 2

Core Techniques — Making Models Smaller

Quantization and distillation: techniques that shrink models without breaking them.
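
As a taste of the first of these techniques, here is a minimal sketch of symmetric int8 quantization; the helper names are illustrative, not from any particular library.

import numpy as np

def quantize_int8(weights: np.ndarray):
    # Symmetric scheme: map floats in [-max|w|, +max|w|] onto int8 [-127, 127].
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    # Recover approximate floats; the rounding error is the quantization loss.
    return q.astype(np.float32) * scale

w = np.random.randn(4).astype(np.float32)
q, scale = quantize_int8(w)
print(w)
print(dequantize(q, scale))  # matches w to within one rounding step

Each int8 weight takes a quarter of the memory of a float32 one, which is the whole point: the model gets roughly 4x smaller at the cost of a small, bounded rounding error per weight.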
Section 3

Hands-On — Running Models Locally

Ollama, llama.cpp, and GGUF in practice.
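
As a preview of what "in practice" means here, a minimal sketch of calling a locally running Ollama server over its HTTP API. It assumes the Ollama daemon is listening on its default port (11434) and that a model such as llama3 has been pulled beforehand (ollama pull llama3).

import json
import urllib.request

# Build a non-streaming generation request against the local Ollama API.
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps({
        "model": "llama3",
        "prompt": "In one sentence, why run models locally?",
        "stream": False,
    }).encode(),
    headers={"Content-Type": "application/json"},
)

# With "stream": False, the server returns a single JSON object whose
# "response" field holds the full completion.
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])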
Section 4

Real-World Applications

Building local apps and deploying to phones, browsers, and IoT devices.
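
A back-of-the-envelope rule this section leans on: a model's weight-only footprint is roughly parameter count times bits per weight, divided by 8. A minimal sketch of the arithmetic:

def weight_footprint_gb(params_billions: float, bits_per_weight: int) -> float:
    # Weights only; the KV cache and activations add more at runtime.
    return params_billions * 1e9 * bits_per_weight / 8 / 1e9

# A 7B-parameter model: 14 GB at fp16, 7 GB at int8, 3.5 GB at 4-bit.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_footprint_gb(7, bits):.1f} GB")

That last number is why 4-bit quantization matters for edge deployment: 3.5 GB fits in a phone's memory budget where 14 GB never could.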
Section 5

Strategy — Choosing Wisely

Local vs cloud decisions and the future of small models.
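
One way to frame the local-vs-cloud decision is as a break-even calculation. The sketch below uses hypothetical prices, purely for illustration; plug in your own numbers.

# Hypothetical figures for illustration only, not real quotes.
cloud_cost_per_1m_tokens = 0.50  # USD per million tokens via a hosted API
local_hardware_cost = 600.00     # USD, one-time hardware outlay

# Tokens to process before owned hardware beats renting the API,
# ignoring electricity, and ignoring the privacy and latency benefits
# that often decide the question on their own.
break_even_tokens = local_hardware_cost / cloud_cost_per_1m_tokens * 1_000_000
print(f"Break-even at roughly {break_even_tokens / 1e9:.1f} billion tokens")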