Packaging Options
A “model” in production is more than just weights: it also includes preprocessing logic, tokenizers, configuration, and dependencies. Common packaging formats:
- MLflow Model: framework-agnostic; an MLmodel descriptor plus conda.yaml
- ONNX (Open Neural Network Exchange): converts PyTorch/TensorFlow models to a portable format for optimized inference
- TorchScript: serialized PyTorch models that can run without a Python interpreter
- SavedModel: TensorFlow’s native format
- Docker containers: package everything, including the runtime
For LLMs, models are typically served via specialized inference servers (vLLM, TGI) rather than generic packaging.
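To make the “more than just weights” point concrete, here is a minimal, framework-free sketch of bundling weights, a preprocessor, and config metadata into one directory. All names here (save_bundle, load_bundle, the metadata keys) are hypothetical illustrations, not part of any real packaging format.

```python
# Hypothetical sketch: a "model" bundle = weights + preprocessing + metadata.
import json
import pickle
from pathlib import Path

def save_bundle(path, weights, preprocessor, metadata):
    """Write weights, preprocessing state, and config side by side."""
    p = Path(path)
    p.mkdir(parents=True, exist_ok=True)
    (p / "weights.pkl").write_bytes(pickle.dumps(weights))
    (p / "preprocessor.pkl").write_bytes(pickle.dumps(preprocessor))
    (p / "metadata.json").write_text(json.dumps(metadata))

def load_bundle(path):
    """Reload everything needed to serve, not just the weights."""
    p = Path(path)
    return (
        pickle.loads((p / "weights.pkl").read_bytes()),
        pickle.loads((p / "preprocessor.pkl").read_bytes()),
        json.loads((p / "metadata.json").read_text()),
    )

# Usage: package a toy "model" and reload it elsewhere.
save_bundle("fraud-detector-toy",
            weights={"w": [0.1, 0.2]},
            preprocessor={"scale": 10.0},
            metadata={"framework": "none", "version": "0.1"})
weights, prep, meta = load_bundle("fraud-detector-toy")
```

Real formats like MLflow do essentially this, plus a standard descriptor so tooling can load the bundle without knowing who wrote it.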
MLflow Model Format
# MLflow model directory structure
fraud-detector/
├── MLmodel               # metadata + flavors
├── conda.yaml            # environment
├── requirements.txt      # pip deps
├── python_model.pkl      # or model.pt
└── artifacts/
    └── preprocessor.pkl

# MLmodel file:
artifact_path: model
flavors:
  python_function:
    loader_module: mlflow.pytorch
    python_version: 3.11.0
  pytorch:
    model_data: model.pt
    pytorch_version: 2.3.0

# Load anywhere:
# mlflow.pyfunc.load_model("path/to/model")
Key insight: MLflow’s “flavors” system lets you save a model in its native format (PyTorch, sklearn, etc.) while also providing a generic pyfunc interface. This means any MLflow model can be loaded and served the same way, regardless of framework.
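The flavor pattern can be sketched without MLflow at all: two “native” models with different APIs served through one generic wrapper. GenericModel and both native classes below are illustrative names for the pattern, not MLflow’s actual classes.

```python
# Sketch of the "flavors" idea: any native model, one predict() interface.
class SklearnStyleModel:
    """Native flavor exposing sklearn-style predict()."""
    def predict(self, rows):
        return [sum(r) for r in rows]

class TorchStyleModel:
    """Native flavor exposing a torch-style forward()."""
    def forward(self, rows):
        return [max(r) for r in rows]

class GenericModel:
    """Uniform pyfunc-like interface regardless of the native API."""
    def __init__(self, native):
        self._native = native

    def predict(self, rows):
        # Dispatch to whatever entry point the native flavor exposes.
        if hasattr(self._native, "predict"):
            return self._native.predict(rows)
        return self._native.forward(rows)

# Both models are served identically through the generic interface.
for native in (SklearnStyleModel(), TorchStyleModel()):
    print(GenericModel(native).predict([[1, 2], [3, 4]]))
# → [3, 7] then [2, 4]
```

This is why downstream serving code only needs to know the generic interface: the framework-specific loading logic lives behind the wrapper.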