BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4426)

Fixes #4150.

Pre-PR, `_fix_chat_template` only patched templates where a trailing `{{ ... }}` expression followed the last `{% endfor %}`. ChatML templates (Hermes, Magnum, Phi-4, etc.) that end cleanly at `{% endfor %}` with no generation-prompt block were left unchanged, so the outer `fix_chat_template` raised:

```
RuntimeError: Unsloth: The tokenizer `...` does not have a
{% if add_generation_prompt %} for generation purposes.
```

This commonly shows up when a downstream tool (LlamaFactory, Axolotl) re-serializes the tokenizer during LoRA save and strips the generation-prompt block.
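For concreteness, a minimal ChatML-style template of the affected shape looks like this (illustrative only, not any specific model's exact template):

```python
# Minimal ChatML-style template of the affected shape: it ends cleanly at
# {% endfor %} and carries no {% if add_generation_prompt %} block.
# (Illustrative example, not any specific model's exact template.)
chat_template = (
    "{% for message in messages %}"
    "{{ '<|im_start|>' + message['role'] + '\\n'"
    " + message['content'] + '<|im_end|>' + '\\n' }}"
    "{% endfor %}"
)

# Exactly the shape the pre-PR code left untouched:
assert chat_template.endswith("{% endfor %}")
assert "add_generation_prompt" not in chat_template
```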

This PR adds a second branch to `_fix_chat_template` that fires when:

- the content after the last `{% endfor %}` is empty once Jinja `{# ... #}` comments and whitespace are stripped,
- the scrubbed template contains `<|im_start|>` and `<|im_end|>`,
- and the scrubbed template does not already mention `add_generation_prompt`.
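These three conditions can be sketched as a standalone predicate (a simplified re-implementation for illustration; the real diff reuses the `where`/`chosen_end` values computed earlier in `_fix_chat_template` and also handles whitespace-control variants such as `{%- endfor -%}`):

```python
import re

def chatml_branch_fires(chat_template: str) -> bool:
    """Sketch of the new guard. Simplified: the real code reuses `where` and
    `chosen_end` from earlier in `_fix_chat_template` and handles
    whitespace-control endfor variants."""
    endfor = "{% endfor %}"
    where = chat_template.rfind(endfor)
    if where == -1:
        return False
    after_endfor = chat_template[where + len(endfor):]
    # Condition 1: nothing after the last endfor except comments/whitespace.
    if re.sub(r"\{#.*?#\}", "", after_endfor, flags=re.DOTALL).strip() != "":
        return False
    # Scrub comments so tokens inside {# ... #} cannot fool conditions 2-3.
    scrubbed = re.sub(r"\{#.*?#\}", "", chat_template, flags=re.DOTALL)
    # Condition 2: ChatML tokens present; condition 3: not already fixed.
    return (
        "<|im_start|>" in scrubbed
        and "<|im_end|>" in scrubbed
        and "add_generation_prompt" not in scrubbed
    )
```

In particular, a trap template whose only `<|im_start|>` sits inside a `{# ... #}` comment is rejected by condition 2 after scrubbing.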

The assistant-turn separator is inferred from the template itself, so Phi-4-style templates are not silently corrupted with the wrong separator. The inference cascade is:

- an explicit `'<|im_start|>assistant<sep>'` literal, if present;
- else the unique `<sep>` appearing in `message['role'] + '<sep>'` role concatenations;
- else `<|im_sep|>` when the template contains it (Phi-4-mini mixes separators: `\n` for system, `<|im_sep|>` for user/assistant);
- else `\n`.
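Applying the two regexes from the diff to sample snippets shows how the cascade resolves (the inference logic extracted into a standalone function for illustration):

```python
import re

def infer_separator(scrubbed: str) -> str:
    """The PR's separator-inference cascade, extracted for illustration."""
    # 1) Explicit '<|im_start|>assistant<sep>' literal.
    assistant_match = re.search(
        r"""(['"])<\|im_start\|>assistant([^'"]*)\1""", scrubbed
    )
    # 2) Separators concatenated onto message['role'] / message.role.
    role_seps = [
        m.group(2)
        for m in re.finditer(
            r"""message(?:\[['"]role['"]\]|\.role)\s*\+\s*(['"])([^'"]*)\1""",
            scrubbed,
        )
    ]
    unique_role_seps = list(dict.fromkeys(role_seps))
    if assistant_match is not None and assistant_match.group(2):
        return assistant_match.group(2)
    if len(unique_role_seps) == 1:
        return unique_role_seps[0]
    # 3) Phi-4-mini-style mixed separators.
    if "<|im_sep|>" in scrubbed:
        return "<|im_sep|>"
    # 4) Conservative default: the Jinja escape for a newline.
    return "\\n"
```

A Hermes-style `'<|im_start|>assistant\n'` literal wins at step 1; a template whose only separator hint is `message['role'] + '\n'` resolves at step 2; a template with mixed role separators but a `<|im_sep|>` token falls through to step 3.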

Verified against the existing chat-template corpus:

- Hermes-3, Magnum-v2, Phi-4-mini, Phi-4 multi-sep, ChatML with trailing whitespace, ChatML with trailing Jinja comment, dot-access `message.role`, split-literal `'<|im_start|>assistant'`: all repaired with the correct assistant prefix.
- Already-fixed ChatML templates: idempotent no-ops.
- Trap templates with `<|im_start|>` only inside a Jinja comment: correctly not rewritten.
- Llama-3, Gemma-3, Qwen2.5 (non-ChatML): byte-identical.
- Mistral family (5 models including Mistral-Nemo, Mistral-Small-24B, Mixtral): byte-identical, protected both by the structural guard (no ChatML tokens) and the existing name-based exemption in `load_correct_tokenizer`.
- Qwen family (14 models including Qwen2.5, Qwen3, Qwen3-Coder, QwQ, VL, Math, Qwen3-Guard): byte-identical.
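The idempotence claim follows from the guard itself: once the generation block is appended, both the after-`endfor` emptiness check and the `add_generation_prompt` check reject a second rewrite. A simplified standalone sketch demonstrates this (it hardcodes the common `\n` separator; the real code infers it as described above):

```python
import re

def patch_chatml_template(chat_template: str) -> str:
    """Simplified re-implementation of the new branch, used only to
    demonstrate idempotence. Hardcodes the '\\n' separator; the real
    code infers it from the template."""
    endfor = "{% endfor %}"
    where = chat_template.rfind(endfor)
    if where == -1:
        return chat_template
    after_endfor = chat_template[where + len(endfor):]
    if re.sub(r"\{#.*?#\}", "", after_endfor, flags=re.DOTALL).strip() != "":
        return chat_template  # trailing content: not the shape we patch
    scrubbed = re.sub(r"\{#.*?#\}", "", chat_template, flags=re.DOTALL)
    if not ("<|im_start|>" in scrubbed and "<|im_end|>" in scrubbed
            and "add_generation_prompt" not in scrubbed):
        return chat_template  # non-ChatML, or already fixed
    block = ('{% if add_generation_prompt %}'
             '{{ "<|im_start|>assistant\\n" }}'
             '{% endif %}')
    return chat_template[: where + len(endfor)] + block
```

A second call returns its input unchanged, and non-ChatML templates (Llama-3, Gemma-3, Mistral, Qwen) pass through byte-identical because the `<|im_start|>`/`<|im_end|>` structural guard never fires.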

End-to-end reproduction: Hermes-3 LoRA SFT, save with a stripped chat_template, reload. The pre-PR code path raises the RuntimeError above. Post-PR, the reload succeeds: the template is patched at load time, and `apply_chat_template(add_generation_prompt=True)` produces the correct `<|im_start|>assistant\n` prefix.
Imgyu Kim 2026-04-16 16:21:29 +09:00 committed by GitHub
parent a4d4dfe4ac
commit 14ab6fbfae


@@ -677,6 +677,54 @@ def _fix_chat_template(chat_template):
)
chat_template = chat_template[: where + len(chosen_end)] + after_endfor
elif re.sub(r"\{#.*?#\}", "", after_endfor, flags = re.DOTALL).strip() == "":
# GH#4150: ChatML templates ending at {% endfor %} without an
# add_generation_prompt block. Scrub Jinja `{# ... #}` comments so
# tokens inside comments cannot fool the guard below.
scrubbed = re.sub(r"\{#.*?#\}", "", chat_template, flags = re.DOTALL)
if (
"<|im_start|>" in scrubbed
and "<|im_end|>" in scrubbed
and "add_generation_prompt" not in scrubbed
):
# Infer the assistant-turn separator. Prefer an explicit
# '<|im_start|>assistant<sep>' literal; else the unique
# `message['role'] + '<sep>'` from role concatenations; else
# '<|im_sep|>' if present (Phi-4-mini uses '\n' for system and
# '<|im_sep|>' for user/assistant); else '\n'.
assistant_match = re.search(
r"""(['"])<\|im_start\|>assistant([^'"]*)\1""",
scrubbed,
)
role_seps = [
m.group(2)
for m in re.finditer(
r"""message(?:\[['"]role['"]\]|\.role)\s*\+\s*(['"])([^'"]*)\1""",
scrubbed,
)
]
unique_role_seps = list(dict.fromkeys(role_seps))
if assistant_match is not None and assistant_match.group(2):
separator = assistant_match.group(2)
elif len(unique_role_seps) == 1:
separator = unique_role_seps[0]
elif "<|im_sep|>" in scrubbed:
separator = "<|im_sep|>"
else:
separator = "\\n"
# Emit a double-quoted Jinja literal so a single quote in the
# separator cannot break the block. Drop trailing whitespace/
# comments after endfor: they would render as stray output
# after the generation prefix.
assistant_prefix = "<|im_start|>assistant" + separator
generation_block = (
"{%" + dash + " if add_generation_prompt %}"
'{{ "' + assistant_prefix.replace('"', '\\"') + '" }}'
"{%" + dash + " endif %}"
)
chat_template = chat_template[: where + len(chosen_end)] + generation_block
return chat_template