Why Merge Models?
The idea: You fine-tuned Model A for coding and Model B for medical Q&A. Merging combines their strengths into a single model without retraining. No GPU needed — merging is a CPU operation on the weight tensors.
Real-world impact: Merged models regularly top the Open LLM Leaderboard. The community has turned merging into a “sport” — sharing recipes and discovering that certain merge combinations outperform the individual models.
When to merge: You have multiple fine-tunes of the same base model, each specialized for a different task, and you want a single generalist model that handles all tasks.
Merging Methods
SLERP (Spherical Linear Interpolation): Smoothly interpolates between two models along an arc in high-dimensional weight space rather than a straight line, preserving the magnitude of the weight vectors. The most popular method, but limited to 2 models at a time. One parameter: t (0.0–1.0, the blend ratio).
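The interpolation can be sketched in a few lines of NumPy. This is an illustrative implementation of the SLERP formula applied to flattened weight tensors, not mergekit's actual code; it falls back to plain linear interpolation when the two vectors are nearly parallel.

```python
import numpy as np

def slerp(t, a, b, eps=1e-8):
    """Spherical linear interpolation between two weight tensors.

    Interpolates along the arc between the vectors a and b; when they
    are nearly colinear the arc degenerates, so we fall back to LERP.
    Illustrative sketch only, not mergekit's implementation.
    """
    a_flat, b_flat = a.ravel(), b.ravel()
    norm_a, norm_b = np.linalg.norm(a_flat), np.linalg.norm(b_flat)
    # Angle between the two weight vectors.
    cos_theta = np.clip(a_flat @ b_flat / (norm_a * norm_b + eps), -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if abs(np.sin(theta)) < eps:  # nearly parallel: LERP is fine
        return (1 - t) * a + t * b
    # Standard SLERP formula: weights follow the great-circle arc.
    s = np.sin(theta)
    return (np.sin((1 - t) * theta) / s) * a + (np.sin(t * theta) / s) * b
```

At t=0 this returns model A's weights exactly, at t=1 model B's, and in between it traces the arc between them instead of cutting through the interior of the sphere as a weighted average would.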
TIES-Merging (Yadav et al. 2023): Handles 2+ models. Trims each model's low-magnitude delta parameters, resolves sign conflicts between models by electing a dominant sign per parameter, then averages only the deltas that agree with that sign. Key parameter: density (0.2–1.0, the fraction of parameters to keep after trimming).
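The trim / elect-sign / disjoint-merge steps can be sketched as follows. This is a simplified NumPy rendering of the algorithm from the paper, operating on a single tensor, and is not mergekit's implementation.

```python
import numpy as np

def ties_merge(base, finetuned, density=0.5):
    """TIES-style merge of several fine-tunes of one base model.

    1. Trim: keep only the top `density` fraction of each task vector
       (fine-tune minus base) by magnitude, zeroing the rest.
    2. Elect sign: per parameter, pick the sign with more total mass.
    3. Disjoint mean: average only deltas agreeing with the elected sign.
    Illustrative sketch of the algorithm, not mergekit's code.
    """
    deltas = [ft - base for ft in finetuned]
    trimmed = []
    for d in deltas:
        k = max(1, int(density * d.size))
        thresh = np.sort(np.abs(d).ravel())[-k]  # magnitude cutoff
        trimmed.append(np.where(np.abs(d) >= thresh, d, 0.0))
    stacked = np.stack(trimmed)
    # Elect the dominant sign per parameter.
    sign = np.sign(stacked.sum(axis=0))
    # Average only the surviving deltas that agree with that sign.
    agree = (np.sign(stacked) == sign) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    merged_delta = np.where(agree, stacked, 0.0).sum(axis=0) / counts
    return base + merged_delta
```

The sign-election step is what distinguishes TIES from plain averaging: when two fine-tunes push a parameter in opposite directions, averaging would cancel them out, while TIES keeps the direction with the larger total magnitude.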
DARE (Yu et al. 2024): Randomly drops a large fraction of delta parameters and rescales the survivors before merging, acting as a form of regularization. The resulting sparsity reduces interference, so more models can be blended. Works well combined with TIES.
Linear: Simple weighted average. merged = w1*A + w2*B. Fast but less sophisticated. Good baseline.
mergekit (Arcee AI) is the standard tool. Define a YAML config specifying the models, merge method, and parameters, then run mergekit-yaml config.yaml ./output. It supports SLERP, TIES, DARE, linear, passthrough, and per-layer recipes, and runs entirely on CPU.
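A minimal SLERP config might look like the following sketch. The model names and layer range are placeholders; check mergekit's documentation for the exact schema accepted by your version.

```yaml
# Hypothetical SLERP merge of two 32-layer fine-tunes of the same base.
slices:
  - sources:
      - model: org/model-a      # placeholder model name
        layer_range: [0, 32]
      - model: org/model-b      # placeholder model name
        layer_range: [0, 32]
merge_method: slerp
base_model: org/model-a
parameters:
  t: 0.5                        # blend ratio: 0.0 = all A, 1.0 = all B
dtype: bfloat16
```

Saving this as config.yaml and running mergekit-yaml config.yaml ./output writes the merged weights to ./output.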