The model card is the repo's README.md file. At the very top, between two lines of `---`, sits a block of YAML ("YAML Ain't Markup Language"). This is the structured metadata that Hugging Face parses automatically. Everything below the second `---` is free-form Markdown for humans. The YAML block is what powers search, filtering, the inference widget, and the sidebar badges you see on every model page.

## `license` and `language`

Common `license` values: `apache-2.0` (fully permissive, commercial OK), `mit` (permissive), `llama3.1` (Meta's community license, with usage restrictions above 700M monthly users), `gemma` (Google's terms, with prohibited use cases), `cc-by-nc-4.0` (non-commercial only). If you're building a product, this is the first field to check. A model with `cc-by-nc-4.0` cannot be used commercially, period.

The `language` field lists ISO 639-1 codes: `en` for English, `zh` for Chinese, `fr` for French. A model listing `[en, de, fr]` was trained on those languages. A model listing only `[en]` may produce garbage output in other languages, even if it technically generates text. Multilingual models typically list many codes or use `multilingual` as a tag.

## `pipeline_tag` and `library_name`

The `pipeline_tag` field declares the model's task: `text-generation` (autoregressive generation), `text2text-generation` (encoder-decoder models like T5), `fill-mask` (BERT-style masked language modeling). For other modalities: `text-to-image` (Stable Diffusion), `automatic-speech-recognition` (Whisper), `image-classification`, `feature-extraction` (embedding models). The pipeline tag also powers the interactive widget on the model page: it tells HF what kind of input box to show.

The `library_name` field declares which framework loads the weights: `transformers` (Hugging Face's main library), `diffusers` (for diffusion models), `sentence-transformers` (for embeddings), `peft` (for adapters/LoRA), `gguf` (for llama.cpp format).
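Put together, the fields above form a front-matter block like the following. This is an illustrative sketch, not copied from a real repo; field values are placeholders:

```yaml
---
license: apache-2.0
language:
  - en
  - de
pipeline_tag: text-generation
library_name: transformers
tags:
  - chat
datasets:
  - HuggingFaceTB/cosmopedia
---
```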
This tells you which import to use: a `transformers` model loads with `AutoModelForCausalLM.from_pretrained()`, while a `gguf` model loads with llama.cpp or Ollama.

Together, these two fields tell you how to run the model. If you see `pipeline_tag: text-generation` and `library_name: transformers`, you know immediately: "This is a standard LLM I can load with HF Transformers." If you see `library_name: gguf`, you know: "This is for local inference with llama.cpp."

## `base_model` — Tracing the Family Tree

The `base_model` field links to the parent model this one was derived from. A fine-tuned model will point to its foundation model; a quantized variant will point to the full-precision original. Example: `base_model: meta-llama/Llama-3.1-8B` tells you this is a derivative of Meta's Llama 3.1 8B. You can click through to the base model to see its original card, benchmarks, and training data.

A full lineage might read: `meta-llama/Llama-3.1-8B` (base) → `NousResearch/Hermes-3-Llama-3.1-8B` (fine-tune) → `bartowski/Hermes-3-Llama-3.1-8B-GGUF` (quantized). Each link in the chain inherits the upstream model's strengths, weaknesses, and license terms. A fine-tune of a Llama model still carries the Llama Community License, regardless of what license the fine-tuner claims.

Always follow the `base_model` chain to the root. The original model's license and training data disclosures apply to every downstream derivative. A model can't be "MIT licensed" if its base model has a more restrictive license.

## `datasets` and `tags`

The `datasets` field lists training data, e.g. `datasets: [HuggingFaceTB/cosmopedia, allenai/dolma]`. This lets you click through to the actual training data and inspect it. For fine-tuned models, this usually lists the fine-tuning dataset, not the base model's pre-training data. Watch for models that don't list any datasets: either the data is proprietary or the model creator didn't document it.

Common tags: `chat` (instruction-tuned for conversation), `code` (trained on code), `math` (math-focused), `gguf`, `4bit`, `lora`. Tags are not validated: anyone can add any tag. They're useful for broad filtering but should not be trusted as ground truth.
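The "follow the chain to the root" rule for `base_model` can be sketched as a small helper. This is a toy illustration over a hard-coded map of card metadata, not a real Hub API call; with real repos you would fetch each card (e.g. via the `huggingface_hub` library) instead:

```python
# Toy sketch: resolve the root of a base_model chain and report its
# license. Card data is hard-coded here; a real tool would fetch it.
CARDS = {
    "bartowski/Hermes-3-Llama-3.1-8B-GGUF": {
        "base_model": "NousResearch/Hermes-3-Llama-3.1-8B"
    },
    "NousResearch/Hermes-3-Llama-3.1-8B": {
        "base_model": "meta-llama/Llama-3.1-8B"
    },
    "meta-llama/Llama-3.1-8B": {"license": "llama3.1"},
}

def effective_license(model_id: str) -> str:
    """Walk base_model links to the root and return the root's license."""
    seen = set()
    while model_id not in seen:  # guard against cycles
        seen.add(model_id)
        card = CARDS.get(model_id, {})
        parent = card.get("base_model")
        if parent is None:
            return card.get("license", "unknown")
        model_id = parent
    return "unknown"

print(effective_license("bartowski/Hermes-3-Llama-3.1-8B-GGUF"))
# → llama3.1  (the quantized GGUF still carries the root license)
```

Whatever the quantizer's repo claims, the license that governs use is the one resolved at the root of the chain.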
Cross-check tags against the actual card content.

The `datasets` field is where transparency lives. A model that lists its training data lets you assess data quality, check for contamination (did they train on the benchmark test set?), and understand domain coverage. Opaque training data is a risk factor.

## `model-index` — Automated Evaluation Results

The `model-index` field embeds benchmark results directly in the metadata. Hugging Face uses a decentralized evaluation system: results are stored as YAML files in an `.eval_results/` folder in the model repo. These results appear automatically on the model page with badges showing their provenance: "verified" (ran on HF infrastructure), "community" (submitted via PR), or "leaderboard" (from the Open LLM Leaderboard).

Every search filter on the Hub maps to a metadata field. Filter by task? That's `pipeline_tag`. Filter by "English"? That's `language`. Filter by "Apache 2.0"? That's `license`. Filter by "transformers"? That's `library_name`. A model with no YAML metadata is a model that doesn't appear in any filtered search.

The inference widget is driven by `pipeline_tag`. If the tag says `text-generation`, you get a text input. If it says `text-to-image`, you get an image generation interface. If it says `automatic-speech-recognition`, you get a file upload for audio. No `pipeline_tag` = no widget = no way to test the model in-browser before downloading.

When reading a card, check the fields in this order:

- `license` — Can I use this? (If non-commercial and you need commercial, stop.)
- `pipeline_tag` — Is this the right task type?
- `language` — Does it support my language?
- `base_model` — What family does it belong to?
- `library_name` — Can I load it with my stack?
- `datasets` — What was it trained on?
- `tags` — Any useful context?
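The checklist above can be turned into a quick triage function over a parsed metadata dict. The field names match the real card metadata keys; the non-commercial allowlist below is a simplified assumption for illustration, not an official Hugging Face list:

```python
# Simplified assumption: a couple of known non-commercial licenses.
# A real check would consult the full license text, not just the ID.
NON_COMMERCIAL = {"cc-by-nc-4.0", "cc-by-nc-sa-4.0"}

def triage(meta: dict, need_language: str = "en",
           need_commercial: bool = True) -> list:
    """Return a list of red flags for a model's parsed YAML metadata."""
    flags = []
    if need_commercial and meta.get("license") in NON_COMMERCIAL:
        flags.append("non-commercial license")
    if need_language not in meta.get("language", []):
        flags.append("no declared support for '%s'" % need_language)
    if not meta.get("datasets"):
        flags.append("no training data disclosed")
    if not meta.get("pipeline_tag"):
        flags.append("no pipeline_tag (no widget, invisible to filters)")
    return flags

meta = {"license": "cc-by-nc-4.0", "language": ["en"],
        "pipeline_tag": "text-generation"}
print(triage(meta))
# → ['non-commercial license', 'no training data disclosed']
```

An empty result is not an endorsement; it only means the metadata clears the first-pass checks, so the next step is reading the card's prose.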