SafeTensors (.safetensors)
Developed by Hugging Face. Safe by design: the format stores only raw tensor data plus a JSON header, so loading a file cannot execute arbitrary code (unlike pickle-based .pt files). Supports zero-copy memory mapping for fast loading. The standard for storing and sharing full-precision model weights, used for training, cloud serving, and fine-tuning.
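To make the layout concrete, here is a simplified, stdlib-only sketch of the on-disk format: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype, shape, and byte offsets, then the raw tensor bytes. The real safetensors library adds validation and the zero-copy mmap loading path; the function names here are illustrative, not the library's API.

```python
import json
import struct

def write_safetensors(path, tensors):
    """Write a minimal .safetensors file from {name: (dtype, shape, raw_bytes)}."""
    header = {}
    blob = b""
    for name, (dtype, shape, data) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [len(blob), len(blob) + len(data)]}
        blob += data
    hdr = json.dumps(header).encode("utf-8")
    with open(path, "wb") as f:
        f.write(struct.pack("<Q", len(hdr)))  # u64 little-endian header size
        f.write(hdr)                          # JSON header: metadata only, no code
        f.write(blob)                         # raw tensor data

def read_safetensors(path):
    """Read tensor metadata and raw bytes back from a .safetensors file."""
    with open(path, "rb") as f:
        (hdr_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(hdr_len))
        data = f.read()
    return {name: (meta["dtype"], meta["shape"],
                   data[meta["data_offsets"][0]:meta["data_offsets"][1]])
            for name, meta in header.items()}
```

Because the header is plain JSON and the payload is plain bytes, deserialization never runs user-supplied code, which is the core of the safety claim.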
PyTorch .bin (legacy)
The original PyTorch checkpoint format, serialized with Python's pickle module. Security risk: unpickling a malicious .bin file can execute arbitrary code. It is being phased out in favor of SafeTensors, and most models on Hugging Face now offer SafeTensors versions.
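The pickle risk is easy to demonstrate with the standard library alone. An object's __reduce__ method tells pickle how to rebuild it, and it may return any callable plus arguments; that callable runs the moment the bytes are loaded. The harmless payload below stands in for what an attacker would replace with, say, os.system.

```python
import pickle

executed = []

def payload(msg):
    # Benign stand-in for an attacker's code; in a real attack this
    # could be os.system, a reverse shell, file deletion, etc.
    executed.append(msg)

class Malicious:
    def __reduce__(self):
        # Pickle will call payload("...") when deserializing this object.
        return (payload, ("code ran during unpickling",))

blob = pickle.dumps(Malicious())
pickle.loads(blob)  # merely loading the bytes executes payload()
```

This is why torch.load on an untrusted .bin file is dangerous even if you never use the resulting object, and why SafeTensors avoids pickle entirely.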
GGUF
GPT-Generated Unified Format, created by llama.cpp's Georgi Gerganov (ggerganov). A single file bundles quantized weights, model metadata, and the tokenizer. Designed for efficient local inference, primarily on CPU, with optional GPU offload in llama.cpp. Supports many quantization levels (e.g. Q4, Q5, Q8 and the K-quant variants). The standard for local AI with Ollama and llama.cpp.
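The "single file with metadata" claim can be seen in the file's fixed header. Per my reading of the GGUF v3 spec, a file opens with the 4-byte magic "GGUF", a uint32 version, a uint64 tensor count, and a uint64 metadata key/value count, all little-endian; the metadata KVs and tensor descriptors follow, which this sketch skips. The function name is illustrative.

```python
import struct

def read_gguf_header(blob):
    """Parse the fixed-size GGUF header from the start of a file's bytes:
    4-byte magic b'GGUF', uint32 version, uint64 tensor count,
    uint64 metadata key/value count (all little-endian)."""
    magic, version, n_tensors, n_kv = struct.unpack_from("<4sIQQ", blob, 0)
    if magic != b"GGUF":
        raise ValueError("not a GGUF file")
    return {"version": version, "tensor_count": n_tensors, "kv_count": n_kv}
```

Everything a runtime needs (architecture, hyperparameters, tokenizer vocabulary, quantization type) lives in those metadata KVs, which is what lets Ollama and llama.cpp load a model from one file with no sidecar configs.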
Rule of thumb: use SafeTensors for training, cloud GPUs, and fine-tuning; use GGUF for local inference with Ollama or llama.cpp on consumer hardware. Conversion tools exist in both directions (llama.cpp ships a converter from Hugging Face checkpoints to GGUF; converting a quantized GGUF back requires dequantizing and does not recover the original precision).