ChatML Format Guide

Chat Markup Language for structured LLM conversations


ChatML (Chat Markup Language) is a structured text format for representing multi-turn conversations with large language models. Introduced by OpenAI and since adopted by many open-source models, ChatML uses special tokens to mark message boundaries and role identifiers, giving an unambiguous way to encode a conversation as a single token sequence for training and inference. Because the delimiters are dedicated vocabulary tokens rather than ordinary text, the model can reliably distinguish system instructions, user inputs, and assistant responses, provided the tokenizer does not turn delimiter-like strings inside message content into those special tokens.

ChatML wraps each message in special token delimiters. Each message begins with a start token followed by the role identifier and a newline, then the message body, and ends with an end token. The standard roles are "system" (behavioral instructions), "user" (human input), and "assistant" (model response). Some implementations add roles such as "tool" for function-calling results or "ipython" for code-execution output. The special tokens (typically written <|im_start|> and <|im_end|>) are added to the tokenizer's vocabulary and learned during fine-tuning. They should not appear inside ordinary message text, but the model does generate <|im_end|> to close its own turn, which is why inference frameworks treat it as a stop token.
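The wrapping rule can be sketched in a few lines of Python. This is a minimal illustration, assuming plain string content and the common layout of role, newline, body, `<|im_end|>`, newline; `render_chatml` is a hypothetical helper name, not a library function, and real pipelines usually rely on the tokenizer's own chat template instead.

```python
from typing import Dict, List

IM_START = "<|im_start|>"
IM_END = "<|im_end|>"


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = False) -> str:
    """Render a list of {"role", "content"} dicts as a ChatML string (hypothetical helper)."""
    parts = []
    for message in messages:
        # Layout per message: <|im_start|>{role}\n{content}<|im_end|>\n
        parts.append(f"{IM_START}{message['role']}\n{message['content']}{IM_END}\n")
    if add_generation_prompt:
        # Leave an assistant turn open so the model completes it at inference time.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)
```

For example, `render_chatml([{"role": "user", "content": "Hi!"}], add_generation_prompt=True)` produces the user message followed by an open `<|im_start|>assistant` line for the model to continue.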

ChatML is the default chat template for many open-source models, including the Qwen and Yi chat models and numerous community fine-tunes. The Hugging Face Transformers library uses Jinja2-based chat templates to convert structured message lists into model-specific prompt text, and each model's tokenizer ships its own template; for ChatML models, that template produces the format described here. Standardizing on a template gives consistent behavior across the inference frameworks that read it, because the structured format removes ambiguity about where one message ends and the next begins.
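In Transformers, the rendering direction (message list to text) is handled by the tokenizer's `apply_chat_template` method. The reverse direction, from ChatML text back to a message list, is useful when importing existing datasets. A minimal sketch, assuming well-formed input in the layout above, role names without whitespace, and content that never contains the delimiter strings; `parse_chatml` is a hypothetical helper.

```python
import re
from typing import Dict, List

# One closed message: <|im_start|>{role}\n{content}<|im_end|>
# DOTALL lets content span multiple lines; the lazy .*? stops at the first <|im_end|>.
_MESSAGE_RE = re.compile(r"<\|im_start\|>(\S+)\n(.*?)<\|im_end\|>", re.DOTALL)


def parse_chatml(text: str) -> List[Dict[str, str]]:
    """Split ChatML text into {"role", "content"} dicts (hypothetical helper)."""
    return [
        {"role": role, "content": content}
        for role, content in _MESSAGE_RE.findall(text)
    ]
```

A trailing open `<|im_start|>assistant` line (a generation prompt) has no closing `<|im_end|>`, so this sketch ignores it rather than returning an empty assistant message.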

When to Use ChatML

Use ChatML when your target model uses the ChatML chat template — this includes Qwen-family models, Yi-family models, and many community fine-tunes based on the OpenAI conversation format. If your training framework (Axolotl, LLaMA-Factory, TRL) expects you to specify a chat template and your base model was pre-trained or instruction-tuned with ChatML, you should prepare your training data in ChatML format or a structured format that can be converted to ChatML.

Choose ChatML over raw text or ad-hoc prompt formats when you need unambiguous message boundaries. Dedicated delimiter tokens keep the model from confusing message content with structural markers, which helps security (injected text has a harder time impersonating a system or assistant turn, although the format does not prevent prompt injection on its own) and reliability (turn-taking in multi-turn conversations stays well defined). ChatML is also a natural fit when your deployment exposes an OpenAI-compatible API, since that API's role-based message list maps directly onto ChatML messages.
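The delimiters only protect turn boundaries if user-supplied text cannot produce them. One mitigation, sketched here assuming your pipeline builds a ChatML string before tokenizing it, is to reject content that contains the literal delimiter strings. `check_content` and `ChatMLInjectionError` are hypothetical names; some tokenizers also let you control whether special-token strings in input text are parsed as special tokens, so check your tokenizer's documentation as well.

```python
DELIMITERS = ("<|im_start|>", "<|im_end|>")


class ChatMLInjectionError(ValueError):
    """Raised when message content contains a ChatML delimiter string."""


def check_content(content: str) -> str:
    """Return content unchanged, or raise if it contains a ChatML delimiter."""
    for delimiter in DELIMITERS:
        if delimiter in content:
            raise ChatMLInjectionError(f"content contains reserved delimiter {delimiter!r}")
    return content
```

Rejecting is the simplest policy; escaping or stripping the strings are alternatives, but they silently change what the user wrote.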

ChatML is less relevant when your target model uses a different chat template: Llama models use their own formats ([INST] tags in Llama 2, header tokens in Llama 3), and Mistral uses its own [INST]-style tokens. In those cases, use the model's native template. ChatML is also unnecessary for single-turn tasks where a simpler format such as Alpaca suffices, or when your training framework applies the template automatically from structured conversation data (for example, ShareGPT format).
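For data in ShareGPT format, turning each record into a role/content message list is typically a small mapping step before any chat template is applied. This sketch assumes the common ShareGPT layout (a `conversations` list of `{"from", "value"}` entries with `human`, `gpt`, and optional `system` speakers); field and speaker names vary between datasets, so treat `ROLE_MAP` and the hypothetical `sharegpt_to_messages` helper as assumptions to check against your data.

```python
from typing import Dict, List

# Assumed ShareGPT speaker names mapped to ChatML roles; adjust for your dataset.
ROLE_MAP = {"system": "system", "human": "user", "gpt": "assistant"}


def sharegpt_to_messages(record: Dict) -> List[Dict[str, str]]:
    """Convert one ShareGPT-style record to ChatML role/content dicts."""
    messages = []
    for turn in record["conversations"]:
        speaker = turn["from"]
        if speaker not in ROLE_MAP:
            # Fail loudly instead of guessing a role for unknown speakers.
            raise ValueError(f"unknown ShareGPT speaker: {speaker!r}")
        messages.append({"role": ROLE_MAP[speaker], "content": turn["value"]})
    return messages
```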

Schema / Structure

```text
ChatML Token Structure:
<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant
{assistant_message}<|im_end|>

Special Tokens:
  <|im_start|>  - Message start delimiter (followed by the role and a newline)
  <|im_end|>    - Message end delimiter (followed by a newline before the next message)

Roles:
  system     - Behavioral instructions (optional, first message)
  user       - Human/user input
  assistant  - Model/AI response
  tool       - Tool/function call result (extended format)

Jinja2 Chat Template (Hugging Face, typical form):
{%- for message in messages %}
{{- '<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n' }}
{%- endfor %}
{%- if add_generation_prompt %}
{{- '<|im_start|>assistant\n' }}
{%- endif %}
```

ChatML format specification with special tokens, roles, and Jinja2 template
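Because an inference prompt ends with an open `<|im_start|>assistant` line, the model's completion runs until it emits `<|im_end|>`, which inference servers typically configure as a stop token. If you post-process raw completions yourself, a minimal sketch is to cut at the first end delimiter; `extract_reply` is a hypothetical helper.

```python
IM_END = "<|im_end|>"


def extract_reply(completion: str) -> str:
    """Return the assistant text before the first <|im_end|> (all of it if absent)."""
    # partition() splits at the first occurrence only; anything after is discarded.
    reply, _sep, _rest = completion.partition(IM_END)
    return reply
```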

Example Data

````text
<|im_start|>system
You are a senior software engineer who provides concise, practical code reviews.<|im_end|>
<|im_start|>user
Please review this Python function:

def get_user(id):
    conn = sqlite3.connect('app.db')
    cursor = conn.cursor()
    cursor.execute(f"SELECT * FROM users WHERE id = {id}")
    return cursor.fetchone()<|im_end|>
<|im_start|>assistant
This function has several critical issues:

1. **SQL injection vulnerability**: Interpolating values into SQL with an f-string is dangerous. Use a parameterized query instead:
```python
cursor.execute("SELECT * FROM users WHERE id = ?", (id,))
```

2. **Resource leak**: The database connection is never closed. Using a `sqlite3` connection as a context manager only commits or rolls back the transaction; it does not close the connection. Wrap it in `contextlib.closing` instead:
```python
from contextlib import closing

with closing(sqlite3.connect('app.db')) as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM users WHERE id = ?", (id,))
    return cursor.fetchone()
```

3. **Hardcoded database path**: Move 'app.db' to a configuration variable.

4. **No error handling**: Add try/except for database errors and handle the case where no user is found.<|im_end|>
<|im_start|>user
Good points. Can you show the corrected version?<|im_end|>
<|im_start|>assistant
```python
import logging
import sqlite3
from contextlib import closing
from typing import Optional, Tuple

def get_user(user_id: int, db_path: str = "app.db") -> Optional[Tuple]:
    """Fetch one user row by ID, or return None if it is missing or the query fails."""
    try:
        # closing() guarantees conn.close(); the connection's own context manager does not
        with closing(sqlite3.connect(db_path)) as conn:
            cursor = conn.cursor()
            cursor.execute("SELECT * FROM users WHERE id = ?", (user_id,))
            return cursor.fetchone()  # None when no row matches
    except sqlite3.Error as e:
        logging.error("Database error fetching user %s: %s", user_id, e)
        return None
```

Key improvements: parameterized query, guaranteed connection cleanup via `contextlib.closing`, type hints, configurable DB path, error handling, and a renamed parameter to avoid shadowing the built-in `id`.<|im_end|>
````

ChatML-formatted multi-turn code review conversation with system prompt

Ertas Support

Ertas Data Suite supports ChatML as both an import and export format for conversational training data. You can import ChatML-formatted text datasets, apply PII redaction while preserving the special token structure, and export processed conversations with intact message boundaries. The platform validates ChatML structure including proper token pairing, valid role identifiers, and correct message alternation patterns.
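Structural checks of this kind (token pairing, valid roles, alternation) can be approximated in a few lines. This is a generic sketch, not Ertas's implementation. It assumes the allowed roles are `system`, `user`, and `assistant` plus `tool`, that an optional system message comes first, that the conversation opens with a user turn, and that two user (or two assistant) messages never sit next to each other; tool messages are allowed anywhere after that. It also treats an unclosed final message as an error, which suits training data but not inference prompts. `validate_chatml` is a hypothetical name.

```python
import re
from typing import List

# Roles accepted by this sketch; extend if your data uses others.
ALLOWED_ROLES = {"system", "user", "assistant", "tool"}
DELIMITER_RE = re.compile(r"<\|im_start\|>|<\|im_end\|>")


def validate_chatml(text: str) -> List[str]:
    """Return human-readable problems found in a ChatML string (empty if none)."""
    problems: List[str] = []
    roles: List[str] = []
    open_at = None  # offset just past the currently unmatched <|im_start|>

    # 1. Token pairing: every <|im_start|> must be closed before the next one.
    for match in DELIMITER_RE.finditer(text):
        if match.group() == "<|im_start|>":
            if open_at is not None:
                problems.append(f"<|im_start|> at offset {match.start()} before previous message was closed")
            open_at = match.end()
        elif open_at is None:
            problems.append(f"<|im_end|> at offset {match.start()} has no matching <|im_start|>")
        else:
            # 2. Role check: the role is the text up to the first newline.
            role = text[open_at:match.start()].split("\n", 1)[0]
            if role not in ALLOWED_ROLES:
                problems.append(f"invalid role {role!r}")
            roles.append(role)
            open_at = None
    if open_at is not None:
        problems.append("final message is not closed with <|im_end|>")

    # 3. Ordering: system only first, conversation opens with user,
    #    and no two adjacent user (or assistant) messages.
    for i, role in enumerate(roles):
        if role == "system" and i != 0:
            problems.append(f"system message at position {i} is not first")
    non_system = [r for r in roles if r != "system"]
    if non_system and non_system[0] != "user":
        problems.append("first non-system message is not from the user")
    for prev, cur in zip(roles, roles[1:]):
        if prev == cur and cur in ("user", "assistant"):
            problems.append(f"two consecutive {cur!r} messages")
    return problems
```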

Ertas Studio handles ChatML template application automatically when fine-tuning models that use the ChatML chat template, converting structured conversation data to properly tokenized ChatML sequences during training data preparation.
