The vocabulary is stored in tokenizer.json as a JSON object under the model.vocab key. The vocab_size (128,256) directly determines the embedding matrix's first dimension.

The merges list in tokenizer.json defines the BPE merges in order of frequency. When tokenizing new text, the algorithm applies these merges greedily: first split the text into bytes, then apply merge 1 wherever the pair exists, then merge 2, and so on. This is why any text can be tokenized: in the worst case, it falls back to individual bytes.

Special tokens include <|begin_of_text|> (start of conversation), <|eot_id|> (end of turn), and <|start_header_id|> / <|end_header_id|> (role markers). These tokens occupy IDs at the end of the vocabulary (128000+), reserved specifically for control purposes. <|eot_id|> means "stop generating." If your inference code doesn't include it as a stop token, the model will keep generating past its turn boundary.

The chat_template in tokenizer_config.json is a Jinja2 template that defines how multi-turn conversations are formatted before being fed to the model. It wraps each message with the correct special tokens and role markers. This template is the contract between the user and the model: using the wrong template produces gibberish even with perfect weights, because the model was trained on one specific conversation format.

The tokenizer.json file contains six key sections: normalizer (text cleanup before tokenizing), pre_tokenizer (how to split text into pre-token chunks, e.g., by whitespace), model (the BPE vocabulary and merge rules), post_processor (adds special tokens like BOS/EOS), decoder (converts token IDs back to text), and added_tokens (special tokens injected into the vocabulary).

tokenizer_config.json contains operational settings: which special tokens to use for BOS/EOS/PAD, whether to add the BOS token automatically, the maximum sequence length, and, crucially, the chat_template string (the Jinja2 template). It also specifies the tokenizer_class (e.g., "PreTrainedTokenizerFast"), which tells the framework which tokenizer implementation to use.
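The greedy merge procedure described above can be sketched in a few lines of Python. This is a simplified illustration rather than the ranked, optimized implementation real tokenizers use, and single characters stand in for raw bytes; the merge table is a hypothetical example, not Llama 3's real one:

```python
def bpe_tokenize(text, merges):
    """Apply learned BPE merges to text, in the order they were learned."""
    tokens = list(text)  # worst case: every symbol stays an individual "byte"
    for left, right in merges:
        out, i = [], 0
        while i < len(tokens):
            # Greedily merge the pair wherever it occurs, left to right.
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (left, right):
                out.append(left + right)
                i += 2
            else:
                out.append(tokens[i])
                i += 1
        tokens = out
    return tokens

# Hypothetical merge table learned from a corpus where "hello" is common.
merges = [("h", "e"), ("l", "l"), ("he", "ll"), ("hell", "o")]
print(bpe_tokenize("hello", merges))  # ['hello'] -- one token
print(bpe_tokenize("help!", merges))  # ['he', 'l', 'p', '!'] -- partial merges, byte fallback
```

Note how "help!" still tokenizes even though it never appears in the merge table: the pieces the merges don't cover simply remain as single symbols.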
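The stop-token point is worth making concrete. A minimal generation loop, with a stubbed model standing in for a real forward pass (next_token_fn, the script of token IDs, and the budget are all illustrative assumptions), shows what happens when <|eot_id|> is and isn't registered as a stop token:

```python
EOT_ID = 128009  # <|eot_id|> in the Llama 3 vocabulary

def generate(next_token_fn, stop_ids, max_new_tokens=64):
    """Sample tokens until a stop token or the budget is hit."""
    out = []
    for _ in range(max_new_tokens):
        tok = next_token_fn(out)
        if tok in stop_ids:  # without this check, generation runs past the turn
            break
        out.append(tok)
    return out

# Stub "model": emits three scripted tokens, then signals end of turn forever.
script = [10, 20, 30]
stub = lambda ctx: script[len(ctx)] if len(ctx) < 3 else EOT_ID

print(generate(stub, stop_ids={EOT_ID}))    # stops cleanly: [10, 20, 30]
print(len(generate(stub, stop_ids=set())))  # no stop token: runs the full budget, 64
```

With the stop set empty, the loop happily appends <|eot_id|> tokens until it exhausts max_new_tokens, which is exactly the "generating past its turn boundary" failure mode.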
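What the chat template produces can be approximated by hand. The function below mimics the general shape of Llama 3's conversation format; the exact whitespace is illustrative, and in practice you would call the framework's tokenizer.apply_chat_template, which renders the real Jinja2 chat_template, rather than formatting strings yourself:

```python
def format_chat(messages):
    """Rough sketch of a Llama-3-style chat format (illustrative, not exact)."""
    out = "<|begin_of_text|>"
    for msg in messages:
        out += (f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
                f"{msg['content']}<|eot_id|>")
    # Open the assistant header so the model generates the reply next.
    out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

prompt = format_chat([
    {"role": "system", "content": "You are terse."},
    {"role": "user", "content": "Hi!"},
])
print(prompt)
```

Every turn is wrapped in role markers and closed with <|eot_id|>, and the prompt ends with an open assistant header, which is how the model knows whose turn it is to speak.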
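The six sections of tokenizer.json can be inspected by loading the file as plain JSON. The miniature object below mirrors that layout; every value is a hypothetical stand-in (a real tokenizer.json is megabytes of vocab and merges):

```python
import json

# Hypothetical miniature of tokenizer.json's layout.
tokenizer_json = {
    "normalizer": None,                                # text cleanup before tokenizing
    "pre_tokenizer": {"type": "ByteLevel"},            # split text into pre-token chunks
    "model": {
        "type": "BPE",
        "vocab": {"h": 0, "e": 1, "l": 2, "he": 3},    # token string -> token ID
        "merges": ["h e"],                             # merge rules, in learned order
    },
    "post_processor": {"type": "TemplateProcessing"},  # adds special tokens like BOS/EOS
    "decoder": {"type": "ByteLevel"},                  # token IDs back to text
    "added_tokens": [
        {"id": 128009, "content": "<|eot_id|>", "special": True}
    ],
}

# Round-trip through JSON, as a framework does when loading from disk.
loaded = json.loads(json.dumps(tokenizer_json))
print(len(loaded["model"]["vocab"]))         # 4 -- toy base vocab size
print(loaded["added_tokens"][0]["content"])  # <|eot_id|>
```

For a real checkpoint, the same two lookups against the downloaded file report the base vocabulary size and the injected special tokens described above.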