Ch 2 — The YAML Header

Metadata that machines read — the 10 fields that define a model’s identity
High Level: Structure → Identity → Task → Lineage → Eval → Discovery
The YAML Block
Those three dashes at the top of README.md are the most important lines in the file
What It Looks Like
Every model card on Hugging Face is a README.md file. At the very top, between two lines of ---, sits a block of YAML (officially "YAML Ain't Markup Language"). This is the structured metadata that Hugging Face parses automatically; everything below the second --- is free-form Markdown for humans. The YAML block is what powers search, filtering, the inference widget, and the sidebar badges you see on every model page.
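The split between YAML header and Markdown body can be sketched in a few lines of Python. This is a naive illustration using the standard library, not Hugging Face's actual parser (which uses a real YAML library); it only isolates the block between the two --- lines:

```python
import re

def split_model_card(readme_text):
    """Split a README.md into its YAML front matter and Markdown body.

    Naive sketch: only isolates the text between the two '---' lines;
    real YAML parsing requires a YAML library.
    """
    match = re.match(r"^---\n(.*?)\n---\n?(.*)$", readme_text, re.DOTALL)
    if not match:
        return None, readme_text  # no YAML header at all
    return match.group(1), match.group(2)

card = """---
license: apache-2.0
pipeline_tag: text-generation
---
# My Model

Free-form Markdown for humans.
"""

yaml_block, body = split_model_card(card)
# yaml_block holds the machine-readable metadata; body is prose for humans.
```

A card with no front matter at all returns `None` for the YAML block, which is itself a useful signal: that model will be invisible to every Hub filter.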
A Real Example
---
license: apache-2.0
language:
  - en
pipeline_tag: text-generation
library_name: transformers
base_model: meta-llama/Llama-3.1-8B
datasets:
  - HuggingFaceTB/cosmopedia
tags:
  - text-generation
  - llm
---
Key insight: If the YAML is the model’s passport, each field is a visa stamp. Without it, the model exists on the Hub but is essentially invisible to anyone searching or filtering.
license and language
The two fields that filter out 90% of models immediately
license
This field determines whether you can legally use the model. Common values: apache-2.0 (fully permissive, commercial use OK), mit (permissive), llama3.1 (Meta's community license, which requires a separate agreement from Meta for products exceeding 700 million monthly active users), gemma (Google's terms with prohibited use cases), cc-by-nc-4.0 (non-commercial only). If you're building a product, this is the first field to check. A model with cc-by-nc-4.0 cannot be used commercially, period.
language
Uses ISO 639-1 codes: en for English, zh for Chinese, fr for French. A model listing [en, de, fr] was trained on those languages. A model listing only [en] may produce garbage output in other languages, even if it technically generates text. Multilingual models typically list many codes or use multilingual as a tag.
Key insight: These two fields answer the two most fundamental questions: “Am I allowed to use this?” (license) and “Does it speak my language?” (language). If either answer is no, stop reading and move on.
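The two go/no-go questions can be written as a single gate function. The license set and field names below mirror the examples in this section; treat them as an illustrative sketch, not an exhaustive license policy:

```python
# Non-commercial license identifiers as used on the Hub (illustrative subset).
NON_COMMERCIAL = {"cc-by-nc-4.0", "cc-by-nc-sa-4.0", "cc-by-nc-nd-4.0"}

def passes_first_gate(metadata, need_commercial, my_language):
    """Answer the two fundamental questions: may I use it, and does it speak my language?"""
    license_id = metadata.get("license", "")
    languages = metadata.get("language", [])
    if need_commercial and license_id in NON_COMMERCIAL:
        return False  # non-commercial only: stop reading
    if my_language not in languages:
        return False  # model was not trained on your language
    return True

meta = {"license": "cc-by-nc-4.0", "language": ["en"]}
# An English cc-by-nc-4.0 model fails the gate for any commercial product.
```

A real check would also handle custom licenses (llama3.1, gemma), which are neither fully permissive nor flatly non-commercial and require reading the actual terms.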
pipeline_tag and library_name
What task does this model do, and how do I load it?
pipeline_tag
This tells you the model’s primary task. Common values for LLMs: text-generation (autoregressive generation), text2text-generation (encoder-decoder like T5), fill-mask (BERT-style masked language modeling). For other modalities: text-to-image (Stable Diffusion), automatic-speech-recognition (Whisper), image-classification, feature-extraction (embedding models). The pipeline tag also powers the interactive widget on the model page — it tells HF what kind of input box to show.
library_name
How to load the model in code. transformers (Hugging Face’s main library), diffusers (for diffusion models), sentence-transformers (for embeddings), peft (for adapters/LoRA), gguf (for llama.cpp format). This tells you which import to use: a transformers model loads with AutoModelForCausalLM.from_pretrained(), while a gguf model loads with llama.cpp or Ollama.
Key insight: If you see pipeline_tag: text-generation and library_name: transformers, you know immediately: “This is a standard LLM I can load with HF Transformers.” If you see library_name: gguf, you know: “This is for local inference with llama.cpp.”
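That mental dispatch can be written down as a lookup table. The mapping of library_name values to entry points follows the examples in this section; it is a sketch, not a complete registry of Hub libraries:

```python
# library_name -> typical way to load the model (illustrative, not exhaustive).
LOADERS = {
    "transformers": "AutoModelForCausalLM.from_pretrained(...)",
    "diffusers": "DiffusionPipeline.from_pretrained(...)",
    "sentence-transformers": "SentenceTransformer(...)",
    "peft": "PeftModel.from_pretrained(base_model, ...)",
    "gguf": "llama.cpp / Ollama (not loaded via a Python import)",
}

def loading_hint(metadata):
    """Translate a card's library_name field into a loading entry point."""
    library = metadata.get("library_name")
    return LOADERS.get(library, "unknown library: read the card's usage section")
```

For example, `loading_hint({"library_name": "gguf"})` immediately tells you this is a local-inference artifact, not something to pass to Transformers.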
base_model — Tracing the Family Tree
Where this model came from and why lineage matters
What It Tells You
The base_model field links to the parent model this one was derived from. A fine-tuned model will point to its foundation model; a quantized variant will point to the full-precision original. Example: base_model: meta-llama/Llama-3.1-8B tells you this is a derivative of Meta’s Llama 3.1 8B. You can click through to the base model to see its original card, benchmarks, and training data.
Following the Chain
Models often form a chain: Base → Fine-tune → Quantized. For example: meta-llama/Llama-3.1-8B (base) → NousResearch/Hermes-3-Llama-3.1-8B (fine-tune) → bartowski/Hermes-3-Llama-3.1-8B-GGUF (quantized). Each link in the chain inherits the upstream model’s strengths, weaknesses, and license terms. A fine-tune of a Llama model still carries the Llama Community License, regardless of what license the fine-tuner claims.
Key insight: Always follow the base_model chain to the root. The original model’s license and training data disclosures apply to every downstream derivative. A model can’t be “MIT licensed” if its base model has a more restrictive license.
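Following the chain to the root is mechanical once you have each card's metadata. In this sketch the Hub is mocked as a plain dict; a real version would fetch each card's metadata over the network (e.g. with huggingface_hub), which this example deliberately avoids:

```python
def root_base_model(model_id, cards, max_depth=10):
    """Walk base_model links until reaching a card with no parent.

    `cards` maps model IDs to their YAML metadata dicts (a mock of the Hub).
    """
    seen = set()
    while model_id in cards and "base_model" in cards[model_id]:
        if model_id in seen or len(seen) >= max_depth:
            break  # guard against cycles or absurdly long chains
        seen.add(model_id)
        model_id = cards[model_id]["base_model"]
    return model_id

# The Base -> Fine-tune -> Quantized chain from the example above, mocked:
hub = {
    "bartowski/Hermes-3-Llama-3.1-8B-GGUF": {"base_model": "NousResearch/Hermes-3-Llama-3.1-8B"},
    "NousResearch/Hermes-3-Llama-3.1-8B": {"base_model": "meta-llama/Llama-3.1-8B"},
    "meta-llama/Llama-3.1-8B": {"license": "llama3.1"},
}
```

Whatever license the root card declares (here llama3.1) is the one that constrains every model downstream of it.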
datasets and tags
What was it trained on, and how can you find it?
datasets
Lists the Hugging Face dataset IDs used for training. Example: datasets: [HuggingFaceTB/cosmopedia, allenai/dolma]. This lets you click through to the actual training data and inspect it. For fine-tuned models, this usually lists the fine-tuning dataset, not the base model’s pre-training data. Watch for models that don’t list any datasets — either the data is proprietary or the model creator didn’t document it.
tags
Free-form labels that help with discovery. Common useful tags: chat (instruction-tuned for conversation), code (trained on code), math (math-focused), gguf, 4bit, lora. Tags are not validated — anyone can add any tag. They’re useful for broad filtering but should not be trusted as ground truth. Cross-check tags against the actual card content.
Key insight: The datasets field is where transparency lives. A model that lists its training data lets you assess data quality, check for contamination (did they train on the benchmark test set?), and understand domain coverage. Opaque training data is a risk factor.
model-index — Automated Evaluation Results
Benchmark scores embedded directly in the YAML
How It Works
The model-index field embeds benchmark results directly in the YAML header: each entry names the model, a task, a dataset, and one or more metric values. Hugging Face parses these entries and displays them automatically on the model page, with badges showing their provenance — "verified" (run on HF infrastructure), "community" (submitted via PR), or "leaderboard" (from the Open LLM Leaderboard).
What to Look For
Check the badge type: verified results are more trustworthy than self-reported ones. Check the benchmark names: do they cover the tasks you care about? And check whether the results are suspiciously high — if a 7B model claims 95% on MMLU, be skeptical. The evaluation results also link to the benchmark dataset’s leaderboard, letting you compare this model against others on the same benchmark.
Key insight: Verified evaluation results (with the “verified” badge) are worth more than self-reported numbers. They were run on Hugging Face’s infrastructure with reproducible configurations. Self-reported results may have been tested under favorable conditions.
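Once the YAML is parsed, pulling scores out of a model-index is a matter of walking its nested structure. The sample below approximates the Hub's model-index schema as described in this section; field names and values are illustrative:

```python
def extract_scores(model_index):
    """Flatten a parsed model-index structure into (dataset, metric, value) rows."""
    rows = []
    for entry in model_index:
        for result in entry.get("results", []):
            dataset = result.get("dataset", {}).get("name", "?")
            for metric in result.get("metrics", []):
                rows.append((dataset, metric.get("type"), metric.get("value")))
    return rows

# Hypothetical parsed model-index entry (structure approximated for illustration).
sample = [{
    "name": "my-model",
    "results": [{
        "task": {"type": "text-generation"},
        "dataset": {"name": "MMLU", "type": "cais/mmlu"},
        "metrics": [{"type": "accuracy", "value": 68.2}],
    }],
}]
```

A flattened table like this is exactly what you sanity-check: are the benchmarks ones you care about, and are any values implausibly high for the model's size?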
How Metadata Powers Discovery
Why good YAML makes models findable
The Filtering System
When you go to huggingface.co/models and use the filters on the left sidebar, every filter maps directly to a YAML field. Filter by “Text Generation”? That’s pipeline_tag. Filter by “English”? That’s language. Filter by “Apache 2.0”? That’s license. Filter by “transformers”? That’s library_name. A model with no YAML metadata is a model that doesn’t appear in any filtered search.
The Widget Connection
The interactive widget on the model page (the text box where you can type a prompt and see output) is powered by pipeline_tag. If the tag says text-generation, you get a text input. If it says text-to-image, you get an image generation interface. If it says automatic-speech-recognition, you get a file upload for audio. No pipeline_tag = no widget = no way to test the model in-browser before downloading.
Key insight: Well-filled YAML metadata is a quality signal in itself. A model with complete metadata (license, language, pipeline_tag, base_model, datasets) was created by someone who understands the ecosystem and cares about discoverability. Sparse metadata often correlates with sparse documentation everywhere else.
Your YAML Reading Checklist
The 30-second scan that tells you whether to keep reading
The Quick Scan
When you land on a model page, read the YAML in this order:

1. license — Can I use this? (If non-commercial and you need commercial, stop.)
2. pipeline_tag — Is this the right task type?
3. language — Does it support my language?
4. base_model — What family does it belong to?
5. library_name — Can I load it with my stack?
6. datasets — What was it trained on?
7. tags — Any useful context?
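The seven checks above can be bundled into one triage function. The field names match the YAML fields discussed in this chapter; the pass/fail rules are deliberately simplified for illustration (e.g. checks 4, 6, and 7 flag missing documentation rather than legal or technical blockers):

```python
def triage(metadata, *, need_commercial, task, language, libraries):
    """Run the 7-step scan; return the list of checks that fail or are undocumented."""
    non_commercial = {"cc-by-nc-4.0", "cc-by-nc-sa-4.0", "cc-by-nc-nd-4.0"}
    problems = []
    if need_commercial and metadata.get("license") in non_commercial:
        problems.append("license")        # 1. not usable commercially
    if metadata.get("pipeline_tag") != task:
        problems.append("pipeline_tag")   # 2. wrong task type
    if language not in metadata.get("language", []):
        problems.append("language")       # 3. language not supported
    if "base_model" not in metadata:
        problems.append("base_model")     # 4. lineage undocumented
    if metadata.get("library_name") not in libraries:
        problems.append("library_name")   # 5. not loadable with your stack
    if not metadata.get("datasets"):
        problems.append("datasets")       # 6. training data undocumented
    if not metadata.get("tags"):
        problems.append("tags")           # 7. no context tags
    return problems

# The example card from the start of the chapter passes all seven checks:
card = {
    "license": "apache-2.0",
    "language": ["en"],
    "pipeline_tag": "text-generation",
    "library_name": "transformers",
    "base_model": "meta-llama/Llama-3.1-8B",
    "datasets": ["HuggingFaceTB/cosmopedia"],
    "tags": ["text-generation", "llm"],
}
```

An empty result means "candidate worth investigating"; any non-empty result is your cue to move on to the next model.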
When to Go Deeper
If all 7 checks pass, you have a candidate worth investigating. Now you move beyond the YAML: read the prose description, check the benchmark tables (Chapter 4), look at the Files tab (Chapter 6), and scan the Community discussions. If any of the 7 checks fail, move on to the next model — there are over 2 million models on the Hub. Your time is better spent finding a model that fits than trying to make a misfit work.
Key insight: The YAML header is the model’s passport — 10 fields that determine whether it even shows up in search, what widget it gets, and whether you’re legally allowed to use it. Master these fields and you can triage models in 30 seconds.