BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4426)

Fixes #4150.

Pre-PR, `_fix_chat_template` only patched templates where a trailing `{{ ... }}` expression followed the last `{% endfor %}`. ChatML templates (Hermes, Magnum, Phi-4, etc.) that end cleanly at `{% endfor %}` with no generation-prompt block were left unchanged, so the outer `fix_chat_template` raised:

```
RuntimeError: Unsloth: The tokenizer `...` does not have a
{% if add_generation_prompt %} for generation purposes.
```

This commonly shows up when a downstream tool (LlamaFactory, Axolotl) re-serializes the tokenizer during LoRA save and strips the generation-prompt block.
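For concreteness, a minimal ChatML-style template of the affected shape looks like this (illustrative only, not any specific model's exact template):

```python
# Minimal ChatML-style template of the affected shape: it ends cleanly at
# {% endfor %} and carries no {% if add_generation_prompt %} block.
# (Illustrative example, not any specific model's exact template.)
chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n'"
    " + message['content'] + '<|im_end|>' + '\\n' }}"
    "{% endfor %}"
)

# Exactly the shape the pre-PR code left untouched:
assert chat_template.endswith("{% endfor %}")
assert "add_generation_prompt" not in chat_template
```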

This PR adds a second branch to `_fix_chat_template` that fires when:

- the content after the last `{% endfor %}` is empty once Jinja `{# ... #}` comments and whitespace are stripped,
- the scrubbed template contains `<|im_start|>` and `<|im_end|>`,
- and the scrubbed template does not already mention `add_generation_prompt`.
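These three conditions can be sketched as a standalone predicate (a simplified re-implementation for illustration; the real diff reuses the `where`/`chosen_end` values computed earlier in `_fix_chat_template` and also handles whitespace-control variants such as `{%- endfor -%}`):

```python
import re

def chatml_branch_fires(chat_template: str) -> bool:
    """Sketch of the new guard. Simplified: the real code reuses `where` and
    `chosen_end` from earlier in `_fix_chat_template` and handles
    whitespace-control endfor variants."""
    endfor = "{% endfor %}"
    where = chat_template.rfind(endfor)
    if where == -1:
        return False
    after_endfor = chat_template[where + len(endfor):]
    # Condition 1: nothing after the last endfor except comments/whitespace.
    if re.sub(r"\{#.*?#\}", "", after_endfor, flags=re.DOTALL).strip() != "":
        return False
    # Scrub comments so tokens inside {# ... #} cannot fool conditions 2-3.
    scrubbed = re.sub(r"\{#.*?#\}", "", chat_template, flags=re.DOTALL)
    # Condition 2: ChatML tokens present; condition 3: not already fixed.
    return (
        "<|im_start|>" in scrubbed
        and "<|im_end|>" in scrubbed
        and "add_generation_prompt" not in scrubbed
    )
```

In particular, a trap template whose only `<|im_start|>` sits inside a `{# ... #}` comment is rejected by condition 2 after scrubbing.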

The assistant-turn separator is inferred from the template itself, so Phi-4-style templates are not silently corrupted with the wrong separator. The inference cascade is:

- an explicit `'<|im_start|>assistant<sep>'` literal, if present;
- else the unique `<sep>` appearing in `message['role'] + '<sep>'` role concatenations;
- else `<|im_sep|>` when the template contains it (Phi-4-mini mixes separators: `\n` for system, `<|im_sep|>` for user/assistant);
- else `\n`.
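Applying the two regexes from the diff to sample snippets shows how the cascade resolves (the inference logic extracted into a standalone function for illustration):

```python
import re

def infer_separator(scrubbed: str) -> str:
    """The PR's separator-inference cascade, extracted for illustration."""
    # 1) Explicit '<|im_start|>assistant<sep>' literal.
    assistant_match = re.search(
        r"""(['"])<\|im_start\|>assistant([^'"]*)\1""", scrubbed
    )
    # 2) Separators concatenated onto message['role'] / message.role.
    role_seps = [
        m.group(2)
        for m in re.finditer(
            r"""message(?:\[['"]role['"]\]|\.role)\s*\+\s*(['"])([^'"]*)\1""",
            scrubbed,
        )
    ]
    unique_role_seps = list(dict.fromkeys(role_seps))
    if assistant_match is not None and assistant_match.group(2):
        return assistant_match.group(2)
    if len(unique_role_seps) == 1:
        return unique_role_seps[0]
    # 3) Phi-4-mini-style mixed separators.
    if "<|im_sep|>" in scrubbed:
        return "<|im_sep|>"
    # 4) Conservative default: the Jinja escape for a newline.
    return "\\n"
```

A Hermes-style `'<|im_start|>assistant\n'` literal wins at step 1; a template whose only separator hint is `message['role'] + '\n'` resolves at step 2; a template with mixed role separators but a `<|im_sep|>` token falls through to step 3.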

Verified against the existing chat-template corpus:

- Hermes-3, Magnum-v2, Phi-4-mini, Phi-4 multi-sep, ChatML with trailing whitespace, ChatML with trailing Jinja comment, dot-access `message.role`, split-literal `'<|im_start|>assistant'`: all repaired with the correct assistant prefix.
- Already-fixed ChatML templates: idempotent no-ops.
- Trap templates with `<|im_start|>` only inside a Jinja comment: correctly not rewritten.
- Llama-3, Gemma-3, Qwen2.5 (non-ChatML): byte-identical.
- Mistral family (5 models including Mistral-Nemo, Mistral-Small-24B, Mixtral): byte-identical, protected both by the structural guard (no ChatML tokens) and the existing name-based exemption in `load_correct_tokenizer`.
- Qwen family (14 models including Qwen2.5, Qwen3, Qwen3-Coder, QwQ, VL, Math, Qwen3-Guard): byte-identical.
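The idempotence claim follows from the guard itself: once the generation block is appended, both the after-`endfor` emptiness check and the `add_generation_prompt` check reject a second rewrite. A simplified standalone sketch demonstrates this (it hardcodes the common `\n` separator; the real code infers it as described above):

```python
import re

def patch_chatml_template(chat_template: str) -> str:
    """Simplified re-implementation of the new branch, used only to
    demonstrate idempotence. Hardcodes the '\\n' separator; the real
    code infers it from the template."""
    endfor = "{% endfor %}"
    where = chat_template.rfind(endfor)
    if where == -1:
        return chat_template
    after_endfor = chat_template[where + len(endfor):]
    if re.sub(r"\{#.*?#\}", "", after_endfor, flags=re.DOTALL).strip() != "":
        return chat_template  # trailing content: not the shape we patch
    scrubbed = re.sub(r"\{#.*?#\}", "", chat_template, flags=re.DOTALL)
    if not ("<|im_start|>" in scrubbed and "<|im_end|>" in scrubbed
            and "add_generation_prompt" not in scrubbed):
        return chat_template  # non-ChatML, or already fixed
    block = ('{% if add_generation_prompt %}'
             '{{ "<|im_start|>assistant\\n" }}'
             '{% endif %}')
    return chat_template[: where + len(endfor)] + block
```

A second call returns its input unchanged, and non-ChatML templates (Llama-3, Gemma-3, Mistral, Qwen) pass through byte-identical because the `<|im_start|>`/`<|im_end|>` structural guard never fires.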

End-to-end reproduction: Hermes-3 LoRA SFT, save with a stripped chat_template, reload. The pre-PR code path raises the RuntimeError above. Post-PR, the reload succeeds: the template is patched at load time, and `apply_chat_template(add_generation_prompt=True)` produces the correct `<|im_start|>assistant\n` prefix.
Imgyu Kim 2026-04-16 16:21:29 +09:00 committed by GitHub
parent a4d4dfe4ac
commit 14ab6fbfae


@@ -677,6 +677,54 @@ def _fix_chat_template(chat_template):
)
chat_template = chat_template[: where + len(chosen_end)] + after_endfor
elif re.sub(r"\{#.*?#\}", "", after_endfor, flags = re.DOTALL).strip() == "":
# GH#4150: ChatML templates ending at {% endfor %} without an
# add_generation_prompt block. Scrub Jinja `{# ... #}` comments so
# tokens inside comments cannot fool the guard below.
scrubbed = re.sub(r"\{#.*?#\}", "", chat_template, flags = re.DOTALL)
if (
"<|im_start|>" in scrubbed
and "<|im_end|>" in scrubbed
and "add_generation_prompt" not in scrubbed
):
# Infer the assistant-turn separator. Prefer an explicit
# '<|im_start|>assistant<sep>' literal; else the unique
# `message['role'] + '<sep>'` from role concatenations; else
# '<|im_sep|>' if present (Phi-4-mini uses '\n' for system and
# '<|im_sep|>' for user/assistant); else '\n'.
assistant_match = re.search(
r"""(['"])<\|im_start\|>assistant([^'"]*)\1""",
scrubbed,
)
role_seps = [
m.group(2)
for m in re.finditer(
r"""message(?:\[['"]role['"]\]|\.role)\s*\+\s*(['"])([^'"]*)\1""",
scrubbed,
)
]
unique_role_seps = list(dict.fromkeys(role_seps))
if assistant_match is not None and assistant_match.group(2):
separator = assistant_match.group(2)
elif len(unique_role_seps) == 1:
separator = unique_role_seps[0]
elif "<|im_sep|>" in scrubbed:
separator = "<|im_sep|>"
else:
separator = "\\n"
# Emit a double-quoted Jinja literal so a single quote in the
# separator cannot break the block. Drop trailing whitespace/
# comments after endfor: they would render as stray output
# after the generation prefix.
assistant_prefix = "<|im_start|>assistant" + separator
generation_block = (
"{%" + dash + " if add_generation_prompt %}"
'{{ "' + assistant_prefix.replace('"', '\\"') + '" }}'
"{%" + dash + " endif %}"
)
chat_template = chat_template[: where + len(chosen_end)] + generation_block
return chat_template