How It Works
Bag of Words represents a document as a vector of word counts. Each dimension corresponds to a word in the vocabulary, and the value is how many times that word appears. "The cat sat on the mat" becomes a count vector where "the" = 2, "cat" = 1, "sat" = 1, "on" = 1, "mat" = 1. The name "bag" reflects that word order is completely discarded — "dog bites man" and "man bites dog" have identical BoW representations despite opposite meanings. Despite this limitation, BoW is a strong baseline for document-level tasks like topic classification and spam detection, where the presence of certain words matters more than their arrangement. BoW vectors are sparse (mostly zeros) and high-dimensional, but they're fast to compute and easy to interpret.
Bag of Words Example
Document: "the cat sat on the mat"
Vocabulary: [cat, mat, on, sat, the]
BoW vector: [1, 1, 1, 1, 2]
Order lost (vocabulary: [bites, dog, man]):
"dog bites man" → [1, 1, 1]
"man bites dog" → [1, 1, 1]
// Identical vectors, opposite meanings!
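The counting above can be sketched in a few lines of plain Python. This is a minimal illustration (the `bow_vector` helper is hypothetical, not a standard API); real pipelines typically use a library vectorizer instead.

```python
from collections import Counter

def bow_vector(doc, vocab):
    """Count how many times each vocabulary word appears in the document."""
    counts = Counter(doc.lower().split())
    return [counts[word] for word in vocab]

# The example document from above
vocab = ["cat", "mat", "on", "sat", "the"]
print(bow_vector("the cat sat on the mat", vocab))  # [1, 1, 1, 1, 2]

# Word order is discarded: both sentences map to the same vector
vocab2 = ["bites", "dog", "man"]
print(bow_vector("dog bites man", vocab2))  # [1, 1, 1]
print(bow_vector("man bites dog", vocab2))  # [1, 1, 1]
```

Note that the vector depends entirely on the chosen vocabulary: words outside it are silently dropped, which is why real systems fix the vocabulary on the training corpus first.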
Strengths:
Simple, fast, interpretable
Good baseline for classification
Works well with Naive Bayes, SVM
Weaknesses:
No word order
No semantic similarity
High dimensionality, sparse
Key insight: BoW works because for many tasks, what words appear matters more than how they're arranged. A movie review containing "terrible", "boring", "waste" is negative regardless of word order. But BoW fails when order carries meaning.
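The movie-review point can be made concrete with a toy sketch. The word list and scoring function here are hypothetical, purely to show that an order-free count gives the same answer however the review is arranged:

```python
# Hypothetical negative-word list for illustration only
NEGATIVE_WORDS = {"terrible", "boring", "waste"}

def negative_score(doc):
    """Count negative-word occurrences; word order plays no role."""
    return sum(1 for word in doc.lower().split() if word in NEGATIVE_WORDS)

# Same words, different order -> same score under BoW
print(negative_score("terrible plot boring pacing waste of time"))  # 3
print(negative_score("waste of time terrible pacing boring plot"))  # 3
```

Any reordering of the review yields the same score, which is exactly why BoW suffices for tasks like sentiment or topic classification, and exactly why it fails when order carries the meaning.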