NB’s Advantages
1. Very small training sets: With only 50–100 labeled examples, NB often outperforms logistic regression, SVMs, and trees. Its simple model has few parameters to estimate, so it is far less prone to overfitting.
2. Very high-dimensional data: Text data routinely has 50,000+ word features. NB handles this naturally because it estimates each feature's distribution independently. No matrix inversions, no curse of dimensionality.
3. Real-time prediction: NB prediction is a simple sum of log-probabilities — O(K·d) per sample for K classes and d features. No matrix multiplications, no tree traversals. It can classify millions of documents per second.
4. Incremental learning: NB can update its model with new data without retraining from scratch (partial_fit() in scikit-learn). Useful for streaming data.
5. Multi-class is natural: NB handles K classes with no extra machinery — just compute P(y=k | x) for each class. No one-vs-rest needed.
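A minimal sketch tying the streaming and multi-class points together, using scikit-learn's real `MultinomialNB.partial_fit` API. The three-class spam/ham/promo setup and the tiny text batches are invented for illustration; `HashingVectorizer` is used because it is stateless, which suits streams where the vocabulary is not known up front.

```python
import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.naive_bayes import MultinomialNB

# Stateless vectorizer: no vocabulary to fit before the stream starts.
# alternate_sign=False keeps feature values non-negative for MultinomialNB.
vectorizer = HashingVectorizer(n_features=2**18, alternate_sign=False)
clf = MultinomialNB()

# All classes must be declared on the first partial_fit call.
classes = np.array(["spam", "ham", "promo"])

# Invented mini-batches standing in for a data stream.
batches = [
    (["free money now", "win a prize today"], ["spam", "spam"]),
    (["lunch at noon?", "meeting notes attached"], ["ham", "ham"]),
    (["20% off this weekend", "claim your discount"], ["promo", "promo"]),
]

# Incremental learning: each batch updates the per-class counts in place.
# No retraining from scratch, no need to keep earlier batches in memory.
for texts, labels in batches:
    X = vectorizer.transform(texts)
    clf.partial_fit(X, labels, classes=classes)

# Multi-class prediction with no one-vs-rest wrapper: NB just takes the
# argmax over the per-class log-posteriors.
X_new = vectorizer.transform(["meeting about the discount prize"])
print(clf.predict(X_new)[0], clf.predict_proba(X_new).round(3))
```

The same loop keeps working as new batches arrive, which is the practical payoff of points 4 and 5 above.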
NB’s Limitations
1. Probability calibration: NB probabilities are often poorly calibrated — it tends to push probabilities toward 0 and 1. The rankings are usually correct, but the actual probability values are unreliable. Use CalibratedClassifierCV if you need calibrated probabilities.
2. Feature correlations: Highly correlated features get double-counted. If “free” and “money” always appear together, NB treats them as independent evidence, overestimating their combined effect.
3. Continuous features: GaussianNB assumes normal distributions. If features are multimodal or heavily skewed, the Gaussian assumption fails. Consider discretizing features or using kernel density estimation.
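A sketch of the two suggested workarounds on one synthetic dataset (the log-normal features and the labeling rule are invented for illustration): `CalibratedClassifierCV` tempers NB's extreme probabilities, and `KBinsDiscretizer` replaces the Gaussian assumption with quantile bins fed to `MultinomialNB`.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import KBinsDiscretizer

# Heavily skewed (log-normal) features: a poor fit for GaussianNB's
# normality assumption. Labels depend on the product of two features.
rng = np.random.default_rng(0)
X = rng.lognormal(mean=0.0, sigma=1.0, size=(2000, 5))
y = (X[:, 0] * X[:, 1] > np.median(X[:, 0] * X[:, 1])).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Fix 1: wrap NB in CalibratedClassifierCV so the returned probabilities
# are usable, not just the rankings ("sigmoid" is safer on smaller data).
calibrated = CalibratedClassifierCV(GaussianNB(), method="sigmoid", cv=5)
calibrated.fit(X_tr, y_tr)

# Fix 2: discretize each skewed feature into quantile bins, then run
# MultinomialNB on the one-hot bin indicators instead of raw values.
binned_nb = make_pipeline(
    KBinsDiscretizer(n_bins=10, encode="onehot", strategy="quantile"),
    MultinomialNB(),
)
binned_nb.fit(X_tr, y_tr)

print(calibrated.score(X_te, y_te), binned_nb.score(X_te, y_te))
```

Discretization sidesteps the distributional assumption entirely, at the cost of a binning hyperparameter; kernel density estimation is the smoother alternative the text mentions.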
Key insight: Naive Bayes is the “good enough, fast enough” classifier. It won’t win a Kaggle competition, but it’ll give you a working baseline in minutes, handle millions of documents without breaking a sweat, and often be within 1–3% of more complex models. In production, “fast and 95% accurate” often beats “slow and 97% accurate.”