unsloth/tests/test_cli_export_unpacking.py

studio: stream export worker output into the export dialog (#4897)

* studio: stream export worker output into the export dialog

The Export Model dialog only showed a spinner on the "Exporting..." button while the worker subprocess was doing the actual heavy lifting. For Merged to 16bit and GGUF / Llama.cpp exports this meant several minutes (or more, for large models) of opaque silence, with no way to tell whether save_pretrained_merged, convert_hf_to_gguf.py, or llama-quantize was making progress.

This adds a live terminal-style output panel inside the export dialog, rendered just above the Cancel / Start Export buttons and scrollable with auto-follow-tail. It shows stdout and stderr from both the worker process itself and any child process it spawns (GGUF converter, llama-quantize), coloured by stream.

Backend

- core/export/worker.py: new _setup_log_capture(resp_queue) installed before LogConfig.setup_logging. It saves the original stdout/stderr fds, creates pipes, os.dup2's the write ends onto fds 1 and 2 (so every child process inherits the redirected fds), and spins up two daemon reader threads. Each thread reads bytes from a pipe, echoes them back to the original fd (so the server console keeps working), splits on \n and \r, and forwards each line to the resp queue as {"type":"log","stream":"stdout|stderr","line":...,"ts":...}. PYTHONUNBUFFERED=1 is set so nested Python converters flush immediately.
- core/export/orchestrator.py:
  - Thread-safe ring buffer (collections.deque, maxlen 4000) with a monotonically increasing seq counter. clear_logs(), get_logs_since(cursor), get_current_log_seq(), is_export_active().
  - _wait_response handles rtype == "log" by appending to the buffer and continuing the wait loop. Status messages are also surfaced as a "status" stream so users see high-level progress alongside raw subprocess output.
  - load_checkpoint, _run_export, and cleanup_memory now wrap their bodies with the existing self._lock (previously unused), clear the log buffer at the start of each op, and flip _export_active in a try/finally so the SSE endpoint can detect idle.
- routes/export.py:
  - Wrapped every sync orchestrator call (load_checkpoint, cleanup_memory, export_merged_model, export_base_model, export_gguf, export_lora_adapter) in asyncio.to_thread so the FastAPI event loop stays free during long exports. Without this the new SSE endpoint could not be served concurrently with the blocking export POST.
  - New GET /api/export/logs/stream SSE endpoint. Honors Last-Event-ID and a since query param for reconnect, emits log / heartbeat / complete / error events, and uses the id field to carry the log seq so clients can resume cleanly. On first connect without an explicit cursor it starts from the current seq so old lines from a previous run are not replayed.

Frontend

- features/export/api/export-api.ts: streamExportLogs() helper that authFetches the SSE endpoint and parses id / event / data fields manually (same pattern as streamTrainingProgress in train-api.ts).
- features/export/components/export-dialog.tsx:
  - Local useExportLogs(exporting) hook that opens the SSE stream when exporting transitions to true, accumulates up to 4000 lines in component state, and aborts on cleanup.
  - New scrollable output panel rendered above DialogFooter, only shown for Merged to 16bit and GGUF / Llama.cpp (LoRA adapter is a fast disk write with nothing to show). Dark terminal styling (bg-black/85, emerald text, rose for stderr, sky for status), max-height 14rem; auto-scrolls to the bottom on new output but stops following if the user scrolls up. A small streaming / idle indicator is shown next to the panel title.
  - DialogContent widens from sm:max-w-lg to sm:max-w-2xl when the output panel is visible so the logs have room to breathe.
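The fd-level capture described for worker.py can be sketched roughly as below. This is a hypothetical simplification, not the real _setup_log_capture: names are illustrative, it only splits on \n (the real one also handles \r), and it returns the saved fds so a caller can undo the redirection.

```python
import os
import queue
import threading
import time

def setup_log_capture(resp_queue: "queue.Queue") -> dict[int, int]:
    """Redirect fds 1 and 2 through pipes so child processes inherit them.

    Sketch of the approach described above; returns the saved original
    fds so a caller (or test) can restore them with os.dup2.
    """
    saved_fds: dict[int, int] = {}
    for fd, stream in ((1, "stdout"), (2, "stderr")):
        saved = os.dup(fd)           # keep a handle on the real console
        saved_fds[fd] = saved
        read_end, write_end = os.pipe()
        os.dup2(write_end, fd)       # fd 1/2 now point at the pipe; children inherit this
        os.close(write_end)

        def pump(read_end=read_end, saved=saved, stream=stream):
            buf = b""
            while True:
                chunk = os.read(read_end, 4096)
                if not chunk:        # last write end closed -> EOF
                    break
                os.write(saved, chunk)   # echo so the server console keeps working
                buf += chunk
                while b"\n" in buf:
                    line, buf = buf.split(b"\n", 1)
                    resp_queue.put({"type": "log", "stream": stream,
                                    "line": line.decode(errors="replace"),
                                    "ts": time.time()})

        threading.Thread(target=pump, daemon=True).start()
    return saved_fds
```

Because the redirection happens at the fd level rather than by swapping sys.stdout, anything a spawned child writes to its inherited fd 1/2 flows through the same pipes, which is what lets convert_hf_to_gguf.py and llama-quantize output reach the queue.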
Verified

- Python smoke test (tests/smoke_export_log_capture.py): spawns a real mp.get_context("spawn") process, installs _setup_log_capture, confirms that parent stdout prints, parent stderr prints, AND a child subprocess invoked via subprocess.run (both its stdout and stderr) are all captured in the resp queue. Passes.
- Orchestrator log helpers tested in isolation: _append_log, get_logs_since (with and without a cursor), clear_logs not resetting seq so reconnecting clients still progress. Passes.
- routes.export imports cleanly in the studio venv and /logs/stream shows up in router.routes.
- bun run build: tsc -b plus vite build, no TypeScript errors.

No existing export behavior is changed. If the subprocess, the SSE endpoint, or the frontend hook fails, the export itself still runs to completion the same way it did before, with or without logs visible.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* export dialog: trim bootstrap noise, scope logs per screen, show realpath

Several follow-ups to the live export log work:

1. Worker bootstrap noise (transformers venv activation, Unsloth banner, "Top GGUF/hub models" lists, vision detection, 2k-step weight load bar) is dropped from the export-dialog stream. A threading.Event gate in worker.py defaults closed and only opens once _handle_export actually starts; until then the reader thread still echoes lines to the saved console fd for debugging but does not push them onto the resp_queue. The orchestrator already spawns a fresh subprocess for every checkpoint load, so the gate is naturally reset between runs.
2. tqdm in non-tty mode defaults to a 10s mininterval, which makes multi-step bars look frozen in the panel. Set TQDM_MININTERVAL=0.5 in the worker env so any tqdm-driven progress emits more often.
3. The dialog's useExportLogs hook now also clears its line buffer when exportMethod or open changes, so re-opening the dialog into a different action's screen no longer shows the previous action's saved output. A useElapsedSeconds tick + "Working Xs" badge in the log header gives users a visible sign that long single-step phases (cache copies, GGUF conversion) are still running when no new lines are arriving.
4. ExportBackend.export_{merged,base,gguf,lora} now return (success, message, output_path); the worker forwards output_path on each export_*_done response, the orchestrator's _run_export passes it to routes/export.py, which surfaces it via ExportOperationResponse.details.output_path. The dialog's Export Complete screen renders the resolved on-disk realpath under "Saved to" so users can find their exported model directly.

* fix(cli): unpack 3-tuple return from export backend

ExportOrchestrator.export_{merged,base,gguf,lora} now return (success, message, output_path) so the studio dialog can show the on-disk realpath. The CLI still unpacked 2 values, so every `unsloth export --format ...` crashed with ValueError before reporting completion. Update the four call sites and surface output_path via a "Saved to:" echo.

* fix(studio): anchor export log SSE cursor at run start

The export dialog SSE defaulted its cursor to get_current_log_seq() at connect time, so any line emitted between the POST that kicks off the export and the client opening the stream was buffered with seqs 1..k and then skipped (seq <= cursor). Long-running exports looked silent during their first seconds.

Snapshot _log_seq into _run_start_seq inside clear_logs() and expose it via get_run_start_seq(). The SSE default cursor now uses that snapshot, so every line emitted since the current run began is reachable regardless of when the client connects. Old runs still can't leak in because their seqs are <= the snapshot.
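The buffer-and-cursor scheme described above (a seq counter that never resets, plus a run-start snapshot taken when the buffer is cleared) can be sketched as follows. Class and method names here are illustrative, not the real orchestrator API:

```python
import threading
from collections import deque

class LogRingBuffer:
    """Minimal sketch: ring buffer with a monotonic seq and run-start snapshot."""

    def __init__(self, maxlen: int = 4000) -> None:
        self._lock = threading.Lock()
        self._buf: deque = deque(maxlen=maxlen)   # (seq, line) pairs
        self._seq = 0               # never resets, so SSE ids stay monotonic
        self._run_start_seq = 0     # default cursor for late-connecting clients

    def clear_logs(self) -> None:
        # Called at the start of each export op: drop old lines but keep
        # seq growing, and remember where this run began.
        with self._lock:
            self._buf.clear()
            self._run_start_seq = self._seq

    def append(self, line: str) -> int:
        with self._lock:
            self._seq += 1
            self._buf.append((self._seq, line))
            return self._seq

    def get_logs_since(self, cursor: int):
        # Everything newer than the client's cursor (its Last-Event-ID).
        with self._lock:
            return [(seq, line) for seq, line in self._buf if seq > cursor]

    def get_run_start_seq(self) -> int:
        with self._lock:
            return self._run_start_seq
```

A client that connects late defaults its cursor to get_run_start_seq(), so lines emitted before the stream opened are still replayed, while lines from earlier runs (seq <= snapshot) can never leak in.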
* fix(studio): reconnect export log SSE on stream drop

useExportLogs launched streamExportLogs once per exporting transition and recorded any drop in .catch(). Long GGUF exports behind a proxy with an idle kill-timeout would silently lose the stream for the rest of the run even though the backend already supports Last-Event-ID resume. The "retry: 3000" directive emitted by the backend is only meaningful to native EventSource; this hook uses a manual fetch + ReadableStream parse, so it had no effect.

Wrap streamExportLogs in a retry loop that tracks lastSeq from ExportLogEvent.id and passes it as since on reconnect. Backoff is exponential with jitter, capped at 5s, and reset on successful open. The loop stops on an explicit backend `complete` event or on effect cleanup.

* fix(studio): register a second command so Typer keeps `export` as a subcommand

The CLI export unpacking tests wrap `unsloth_cli.commands.export.export` in a fresh Typer app with a single registered command. Typer flattens a single-command app into that command, so the test's `runner.invoke(cli_app, ["export", ckpt, out, ...])` treats the leading `"export"` token as an unexpected extra positional argument -- every parametrized case failed with:

    Got unexpected extra argument (.../out)

Register a harmless `noop` second command so Typer preserves subcommand routing and the tests actually exercise the 3-tuple unpack path they were written to guard.

Before: 4 failed
After: 4 passed

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: studio-install <studio@local.install>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
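The reconnect-with-resume pattern from the SSE fix above (the real code lives in the TypeScript useExportLogs hook) can be sketched in Python for consistency with the rest of this file. All names here are illustrative; open_stream stands in for streamExportLogs, and the injectable sleep exists only to make the sketch testable:

```python
import random
import time

def stream_with_retry(open_stream, on_line, max_delay: float = 5.0, sleep=time.sleep):
    """Consume an event stream, resuming from the last seen id on drops.

    open_stream(since=...) must return an iterator of events shaped like
    {"id": int, "event": "log" | "complete", ...} and may raise
    ConnectionError mid-iteration when the stream drops.
    """
    last_seq = 0
    delay = 0.25
    while True:
        try:
            for event in open_stream(since=last_seq):
                delay = 0.25                 # healthy stream: reset backoff
                last_seq = event["id"]
                if event.get("event") == "complete":
                    return last_seq          # backend says the run is over
                on_line(event)
        except ConnectionError:
            # Exponential backoff with jitter, capped at max_delay.
            sleep(delay + random.uniform(0, delay))
            delay = min(delay * 2, max_delay)
```

On reconnect the since=last_seq cursor plays the same role as Last-Event-ID: the server replays only events the client has not yet seen, so a proxy-killed stream loses nothing.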
2026-04-14 15:55:43 +00:00
# SPDX-License-Identifier: AGPL-3.0-only
# Copyright 2026-present the Unsloth AI Inc. team. All rights reserved. See /studio/LICENSE.AGPL-3.0
"""
Regression tests for unsloth_cli.commands.export.
Context: the studio export dialog live-logs work changed
ExportOrchestrator.export_{merged_model,base_model,gguf,lora_adapter}
to return (success, message, output_path) instead of (success, message)
so the frontend can show the on-disk realpath on the success screen.
The CLI at unsloth_cli/commands/export.py still unpacked two values,
so every `unsloth export --format ...` crashed with:
ValueError: too many values to unpack (expected 2)
These tests pin the CLI to the 3-tuple contract by invoking it against
a fake ExportBackend and asserting exit_code == 0 for each --format.
No real ML imports; the fake is installed via sys.modules injection so
the CLI's deferred `from studio.backend.core.export import ExportBackend`
binds to it.
"""
from __future__ import annotations
import sys
import types
from pathlib import Path
import pytest
import typer
from typer.testing import CliRunner
# ---------------------------------------------------------------------------
# Fake ExportBackend
# ---------------------------------------------------------------------------
class _FakeExportBackend:
"""Stand-in for studio.backend.core.export.ExportBackend.
All export_* methods return the new 3-tuple contract. load_checkpoint
keeps its 2-tuple shape (unchanged by the live-logs work).
"""
def __init__(self) -> None:
self.loaded: str | None = None
def load_checkpoint(self, **kwargs):
self.loaded = kwargs.get("checkpoint_path")
return True, f"Loaded {self.loaded}"
def scan_checkpoints(self, **kwargs):
return []
def export_merged_model(self, **kwargs):
return True, "merged ok", str(Path(kwargs["save_directory"]).resolve())
def export_base_model(self, **kwargs):
return True, "base ok", str(Path(kwargs["save_directory"]).resolve())
def export_gguf(self, **kwargs):
return True, "gguf ok", str(Path(kwargs["save_directory"]).resolve())
def export_lora_adapter(self, **kwargs):
return True, "lora ok", str(Path(kwargs["save_directory"]).resolve())
def _install_fake_studio_backend(monkeypatch: pytest.MonkeyPatch) -> None:
"""Inject fake studio.backend.core.export into sys.modules.
The CLI imports ExportBackend lazily inside the command function, so
patching sys.modules before invoking the command is sufficient to
point the `from studio.backend.core.export import ExportBackend`
statement at the fake. Parent packages (studio, studio.backend,
studio.backend.core) are stubbed too so Python's import machinery
doesn't try to resolve the real (structlog-dependent) tree.
"""
for name in ("studio", "studio.backend", "studio.backend.core"):
monkeypatch.setitem(sys.modules, name, types.ModuleType(name))
fake_mod = types.ModuleType("studio.backend.core.export")
fake_mod.ExportBackend = _FakeExportBackend
monkeypatch.setitem(sys.modules, "studio.backend.core.export", fake_mod)
# Drop any cached import of the CLI module so the deferred import
# inside export() re-resolves against our fake module rather than a
# previously cached real one.
monkeypatch.delitem(sys.modules, "unsloth_cli.commands.export", raising = False)
@pytest.fixture
def cli_app(monkeypatch: pytest.MonkeyPatch) -> typer.Typer:
"""Typer app wrapping unsloth_cli.commands.export.export."""
_install_fake_studio_backend(monkeypatch)
from unsloth_cli.commands import export as export_cmd
app = typer.Typer()
app.command("export")(export_cmd.export)
# Typer flattens a single-command app into that command, which would
# make argv[0] ("export") look like an extra positional argument to
# the test invocation. Register a harmless second command so Typer
# keeps "export" as a real subcommand and the tests drive the
# intended code path.
@app.command("noop")
def _noop() -> None: # pragma: no cover - only exists to pin routing
pass
return app
@pytest.fixture
def runner() -> CliRunner:
return CliRunner()
# ---------------------------------------------------------------------------
# The actual regression tests
# ---------------------------------------------------------------------------
@pytest.mark.parametrize(
"format_flag,quant_flag",
[
("merged-16bit", None),
("merged-4bit", None),
("gguf", "q4_k_m"),
("lora", None),
],
)
def test_cli_export_unpacks_three_tuple(
cli_app: typer.Typer,
runner: CliRunner,
tmp_path: Path,
format_flag: str,
quant_flag: str | None,
) -> None:
"""Each --format path must unpack (success, message, output_path)
without raising ValueError. Pre-fix, every parametrized case fails
with 'too many values to unpack (expected 2)'.
"""
ckpt = tmp_path / "ckpt"
ckpt.mkdir()
out = tmp_path / "out"
cli_args = ["export", str(ckpt), str(out), "--format", format_flag]
if quant_flag is not None:
cli_args += ["--quantization", quant_flag]
result = runner.invoke(cli_app, cli_args)
assert result.exit_code == 0, (
f"CLI exited with code {result.exit_code} for --format {format_flag}.\n"
f"Output:\n{result.output}\n"
f"Exception: {result.exception!r}"
)
# Sanity: the success message from the fake backend should reach stdout.
expected_prefix = format_flag.split("-")[0]
assert f"{expected_prefix} ok" in result.output