What Is a Chat Template?
A formatting structure that defines how conversational messages (system, user, assistant) are tokenized and arranged as input to a language model.
Definition
A chat template is a predefined text format that structures multi-turn conversations into the token sequence a language model expects. Each model family has its own chat template that specifies how to delimit system instructions, user messages, and assistant responses using special tokens and formatting conventions. For example, Llama 2 and Mistral instruct models use tokens like [INST] and [/INST] to wrap user turns, while the ChatML format (used by Qwen and several other model families) uses <|im_start|> and <|im_end|> markers with role labels.
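To make the difference concrete, here is the same single-turn conversation rendered under both conventions. These are simplified illustrations: real templates also handle system prompts, BOS/EOS tokens, and multi-turn continuation, and the exact message content here is invented for the example.

```python
# One user message, rendered under two common chat-template conventions.
user_msg = "What is the capital of France?"

# Llama-2-style instruction wrapping (simplified: no system block)
llama2 = f"<s>[INST] {user_msg} [/INST]"

# ChatML-style role markers, ending with an open assistant turn
chatml = f"<|im_start|>user\n{user_msg}<|im_end|>\n<|im_start|>assistant\n"

print(llama2)
print(chatml)
```

The strings are structurally incompatible: a model trained on one format has never seen the other's delimiters, which is why using the wrong template degrades output quality.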
Chat templates are essential because language models are fundamentally next-token predictors — they have no inherent concept of "roles" or "turns." The chat template encodes this conversational structure into the flat token stream that the model actually processes. If the wrong template is used, the model cannot distinguish between user messages and assistant responses, leading to incoherent or role-confused outputs. This is one of the most common but easily avoidable mistakes in LLM deployment.
Modern frameworks like Hugging Face Transformers store chat templates as Jinja2 templates in the tokenizer configuration, enabling automatic formatting via the `apply_chat_template()` method. When preparing fine-tuning data for conversational models, the training pipeline must apply the correct chat template to each example so the model learns the expected format. Mismatched templates between training and inference are a frequent source of degraded performance in fine-tuned models.
Why It Matters
Using the correct chat template is a prerequisite for getting coherent outputs from any conversational model. It is also critical for fine-tuning: training data must be formatted with the same template the model was pre-trained with, or the model will learn a conflicting format that confuses it at inference time. For teams working with multiple model families, understanding chat templates prevents subtle bugs that manifest as models ignoring instructions, mixing up roles, or generating malformed responses.
How It Works
When a user sends a conversation to a model, the chat template processor takes the structured list of messages — each with a role (system, user, or assistant) and content — and renders them into a single string with the appropriate special tokens. For example, in ChatML format, a system message becomes: <|im_start|>system\nYou are a helpful assistant.<|im_end|>\n. The tokenizer then converts this formatted string into token IDs. During fine-tuning, the same template is applied to each training example's conversation, and loss is typically computed only on the assistant tokens (not the user or system tokens) so the model learns to generate responses rather than prompts.
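The loss-masking step described above can be sketched as follows. This is a simplified illustration using made-up token IDs: real pipelines operate on the tokenized template output, but the principle is the same — label positions for non-assistant tokens are set to -100, the index PyTorch's cross-entropy loss ignores by convention.

```python
# Sketch of fine-tuning loss masking: non-assistant tokens get the
# ignore index so the loss is computed only on assistant responses.
IGNORE_INDEX = -100  # PyTorch CrossEntropyLoss ignore_index default

def build_labels(segments):
    """segments: list of (role, token_ids) pairs for one rendered conversation."""
    input_ids, labels = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        if role == "assistant":
            labels.extend(ids)  # the model learns to generate these tokens
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # excluded from the loss
    return input_ids, labels

# Hypothetical token IDs for a system turn, a user turn, and an assistant reply
segments = [("system", [1, 2]), ("user", [3, 4, 5]), ("assistant", [6, 7])]
input_ids, labels = build_labels(segments)
print(labels)  # [-100, -100, -100, -100, -100, 6, 7]
```

The model still attends to the system and user tokens as context; masking only prevents it from being trained to reproduce them.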
Example Use Case
A team fine-tunes a Mistral 7B model on customer support conversations but accidentally formats their training data with ChatML instead of Mistral's [INST]-based template. The fine-tuned model produces confused outputs that mix user and assistant roles. After reformatting the training data with the correct [INST] template and retraining, the model properly distinguishes roles and produces coherent, well-structured responses. The fix costs them only a few hours of retraining but saves weeks of debugging.
Key Takeaways
- Chat templates encode conversational structure (roles, turns) into the flat token sequence models expect.
- Each model family has its own chat template — using the wrong one produces incoherent outputs.
- Training data must use the same chat template as the base model's pre-training format.
- Modern frameworks store templates as Jinja2 in the tokenizer config for automatic formatting.
- Loss masking during fine-tuning typically excludes non-assistant tokens to focus learning on response generation.
How Ertas Helps
Ertas Studio automatically applies the correct chat template for whichever base model the user selects, eliminating one of the most common fine-tuning pitfalls. When users upload JSONL training data with structured messages, Ertas formats each example according to the model's expected template before tokenization. This means users can work with clean, human-readable message arrays and let the platform handle the model-specific formatting details.
Related Resources
Base Model
JSONL
System Prompt
Tokenizer
Training Data
Getting Started with Ertas: Fine-Tune and Deploy Custom AI Models
Introducing Ertas Studio: A Visual Canvas for Fine-Tuning AI Models
Hugging Face
llama.cpp
Ollama
Ertas for SaaS Product Teams
Ertas for Customer Support