Commit graph

5035 commits

Author SHA1 Message Date
Daniel Han
1ccfd2e0a5
fix(rocm): tighten gfx regex to ignore generic ISA lines (#5033)
* fix(rocm): tighten gfx regex to ignore generic ISA lines

ROCm 6.1+ rocminfo emits generic ISA names such as
"amdgcn-amd-amdhsa--gfx11-generic" and "amdgcn-amd-amdhsa--gfx9-4-generic"
alongside the real GPU name. The previous `gfx[1-9]` regex used in
`_has_rocm_gpu` matched both, so a host with only a generic ISA entry
would be reported as having a usable AMD GPU.

Tighten the pattern to `gfx[1-9][0-9a-z]{2,3}` so only real gfx ids
match. This covers every documented target from GFX6 (gfx600) through
GFX12 (gfx1201), including letter-suffixed ids like gfx90a (MI250 /
MI250X) and gfx90c. Documented generic ISA names always have 1 or 2
digits before the dash and no longer match.

Applied to both `studio/install_python_stack.py` and
`studio/install_llama_prebuilt.py` so the two detection paths agree.
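
A quick standalone check of the tightened pattern against both kinds of
names (a sketch for illustration only; the surrounding code of
`_has_rocm_gpu` is not reproduced here):

    import re

    # Tightened pattern from this fix: "gfx", a leading digit 1-9, then
    # 2-3 further [0-9a-z] characters.
    GFX_RE = re.compile(r"gfx[1-9][0-9a-z]{2,3}")

    real = ["amdgcn-amd-amdhsa--gfx1100", "gfx600", "gfx90a", "gfx90c", "gfx1201"]
    generic = ["amdgcn-amd-amdhsa--gfx11-generic", "amdgcn-amd-amdhsa--gfx9-4-generic"]

    assert all(GFX_RE.search(name) for name in real)
    assert not any(GFX_RE.search(name) for name in generic)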

Co-authored-by: Martin Hoyer <mhoyer@redhat.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Martin Hoyer <mhoyer@redhat.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 05:24:41 -07:00
Daniel Han
b7a8ff2833
Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027) (#5034)
* Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027)

FastLanguageModel.from_pretrained(..., num_labels=N) crashed with
"NotImplementedError: normal_kernel_cuda not implemented for 'Byte'" on
pre-quantized bnb 4-bit checkpoints (e.g. unsloth/Qwen3-4B-bnb-4bit)
when running on transformers 5.x.

Two pieces were needed to close this out:

1. unsloth_zoo PR: add "score", "classifier", "qa_outputs" to
   SKIP_QUANTIZATION_MODULES so replace_with_bnb_linear leaves task
   heads in the compute dtype.

2. This commit: for pre-quantized checkpoints, transformers reads
   llm_int8_skip_modules from the quantization_config baked into
   config.json and ignores the runtime BitsAndBytesConfig we pass via
   kwargs. Unsloth must merge its skip list into
   model_config.quantization_config.llm_int8_skip_modules before the
   from_pretrained call, or the checkpoint's frozen list
   (e.g. ["lm_head", "multi_modal_projector", "merger",
   "modality_projection"]) wins and the `score` head gets converted to
   Linear4bit with uint8 storage, then _init_weights calls normal_ on
   uint8 and crashes.

Also add a defensive post-load cast on the task head to guard against
any residual path that ends up with a non-floating head dtype.
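
A minimal sketch of the merge in point 2, assuming `quantization_config`
is the plain dict read from the checkpoint's config.json (function and
variable names here are illustrative, not the actual Unsloth code):

    def merge_skip_modules(quantization_config, extra=("score", "classifier", "qa_outputs")):
        # Merge the task-head skip list into the list frozen in config.json so
        # replace_with_bnb_linear leaves those heads in the compute dtype.
        frozen = quantization_config.get("llm_int8_skip_modules") or []
        quantization_config["llm_int8_skip_modules"] = sorted(set(frozen) | set(extra))
        return quantization_config

    merged = merge_skip_modules({"llm_int8_skip_modules": ["lm_head", "multi_modal_projector"]})
    # -> ["classifier", "lm_head", "multi_modal_projector", "qa_outputs", "score"]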

Verified on transformers 4.57.6 and 5.5.0 with:
- unsloth/Qwen3-4B-bnb-4bit + num_labels=3
- unsloth/Qwen3-4B (non-bnb repo, load_in_4bit=True)
- unsloth/Llama-3.2-1B-Instruct + num_labels=3
- unsloth/ModernBERT-large classifier head (bert_classification notebook)
- Regression: causal LM path unchanged, backbone still 4-bit
- 3-step SFT on num_labels=3 confirms gradient flow and weight updates
  on score.weight

Fixes unslothai/unsloth#5027

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 05:16:33 -07:00
David Solanas Sanz
1fcb2502cf
fix: prevent offline freeze by fixing stats retry and forwarding local_files_only (#5016)
Fixes #2393.

- `_utils.py`: `has_internet()` now treats any truthy value of `HF_HUB_OFFLINE` as offline, in addition to `TRANSFORMERS_OFFLINE`.
- `_utils.py`: replace uncontrolled `except Exception: stats_check()` retry (which had no time limit and could freeze on Kaggle offline mode) with a logged skip.
- `loader.py`: forward `local_files_only` from kwargs into all `AutoConfig.from_pretrained` and `PeftConfig.from_pretrained` probes in `FastLanguageModel.from_pretrained` and `FastModel.from_pretrained`, including the PEFT base-model reload paths.
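
A minimal sketch of the truthy-variant check and the forwarding idea (the
exact variant set and call sites in `_utils.py` / `loader.py` may differ):

    import os

    _TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings

    def _offline_requested() -> bool:
        return any(
            os.environ.get(var, "").strip().lower() in _TRUTHY
            for var in ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")
        )

    # In the config probes, forward the caller's flag instead of dropping it:
    #   AutoConfig.from_pretrained(name, local_files_only=kwargs.get("local_files_only", False))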
2026-04-15 04:51:31 -07:00
Lee Jackson
f9ef639dde
Studio: support GGUF variant selection for non-suffixed repos (#5023)
* fix: support GGUF variant selection for non-suffixed repos

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: harden GGUF detection across cached models and picker flows

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* chore: use shared GGUF picker helper for search rows

* fix: avoid mixed cache duplication and preserve GGUF fallback detection

* fix: unify GGUF cache matching and merge picker hints

* fix: normalize local GGUF matching across picker and model config

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: robust cached-gguf classification + hint-aware click routing

- _repo_gguf_size_bytes: treat size_on_disk=None as 0 and dedupe fallback
  by commit_hash so partial/interrupted downloads don't TypeError out of
  sum() and wipe the entire cached list.
- list_cached_gguf / list_cached_models: narrow per-repo try/except so
  one malformed repo no longer poisons the whole response.
- handleModelClick: route through isKnownGgufRepo instead of the
  suffix-only isGgufRepo, so non-suffixed GGUF repos still open the
  variant expander from every call site.
- Replace the modelIsGgufById/resultIsGgufById Maps with Sets of known
  GGUF ids to stop conflating "no hint" with "known not-GGUF".
- Make HfModelResult.isGguf required (it is always set in makeMapModel).
- Add regression tests for the None size case, mixed-repo inclusion in
  cached-gguf, and per-repo error isolation.

* fix: exclude mmproj from GGUF classification and case-normalize hint lookups

- _repo_gguf_size_bytes now filters mmproj vision-adapter files so
  safetensors+mmproj.gguf repos stay on the cached-models path and
  non-GGUF rows no longer show zero pickable variants. A vision-capable
  GGUF repo (main weight + mmproj adapter) still classifies as GGUF and
  reports the main weight size.
- modelGgufIds / resultGgufIds now key on lowercased ids and
  isKnownGgufRepo lowercases its lookup, so store and HF-search ids
  that differ only by casing still match the same GGUF hint.
- New regression tests: mmproj-only repo excluded from cached-gguf,
  same repo included in cached-models, vision-capable repo still
  classified as GGUF with correct size.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-15 15:32:01 +04:00
Roland Tannous
13928b5f0e
Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024)
* Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var

When set, UNSLOTH_PYTORCH_MIRROR overrides the default
https://download.pytorch.org/whl base URL in all four install scripts
(install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py).
When unset or empty, the official URL is used. This lets users behind
corporate proxies or in regions with poor connectivity to pytorch.org
point at a local mirror without patching scripts.
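
A sketch of the override logic, shown as a small helper (the actual
`_PYTORCH_WHL_BASE` in install_python_stack.py may be a module-level
constant rather than a function; the shell scripts do the same thing in
their own syntax):

    import os

    _OFFICIAL_PYTORCH_WHL_BASE = "https://download.pytorch.org/whl"

    def pytorch_whl_base() -> str:
        # Unset or empty -> official URL; trailing slash stripped to avoid
        # double-slash URLs when an index suffix is appended.
        mirror = os.environ.get("UNSLOTH_PYTORCH_MIRROR", "").strip()
        return mirror.rstrip("/") or _OFFICIAL_PYTORCH_WHL_BASE

    # e.g. f"{pytorch_whl_base()}/cu124"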

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py

Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back
to the official URL when unset or empty, and preserves the value as-is
(including trailing slashes).

* Remove stale test assertions for missing install.sh messages

* Fix GPU mocking in test_get_torch_index_url.sh

Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside
get_torch_index_url so the GPU-presence checks work in tests.
Add -L flag handling to mock nvidia-smi so it passes the GPU listing
check. All 26 tests now pass on CPU-only machines.

* Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 11:39:11 +04:00
Datta Nimmaturi
826c98f3c0
[moe][gemma4] Target MoE for gemma4 (#4913)
* Target MoE for gemma4

* refactor attention impl determine

* Revert "refactor attention impl determine"

This reverts commit 888fca08110a9a74278dc1ebc14d0da043bbd11d.

* Remove attention policy changes from gemma4 MoE fix
2026-04-14 16:53:07 -05:00
Daniel Han
5aa8c15246
Studio: hard-stop at n_ctx with a 'Context limit reached' toast (#5021)
* Studio: hard-stop at n_ctx with a dedicated 'Context limit reached' toast

llama-server's default behavior when the KV cache fills is to silently
drop the oldest non-``n_keep`` tokens and keep generating. The UI has
no way to tell the user that earlier turns were evicted -- they just
see degraded continuity and a confusing ``5,361 / 4,096`` on the
context usage bar.

Launch llama-server with ``--no-context-shift`` so it returns a clean
error once the request would exceed ``n_ctx``. In the chat adapter,
catch the error, identify it as a context-limit error via
``isContextLimitError()``, and surface a dedicated toast that names
the exact control to adjust: the ``Context Length`` field in the chat
Settings panel.

Also add a lightweight tooltip hint on ``ContextUsageBar`` when usage
crosses 85%, so users see the "raise Context Length in Settings"
suggestion before they hit the hard stop.

Tests:

  * ``test_llama_cpp_no_context_shift.py`` pins the ``--no-context-shift``
    flag in the static launch-command template, and pins it inside the
    unconditional ``cmd = [ ... ]`` block so a future refactor can't
    hide it behind a branch.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Shorten --no-context-shift comment to 1 line

* Match backend _friendly_error rewrite in isContextLimitError

Codex review on PR caught that ``backend/routes/inference.py::_friendly_error``
rewrites the raw llama-server text
  "request (X tokens) exceeds the available context size (Y tokens)"
into
  "Message too long: X tokens exceeds the Y-token context window. ..."
on the main streaming GGUF path. The heuristic only looked for
"context size" / "exceeds the available context" / "context shift",
none of which survive the rewrite, so the new "Context limit reached"
toast would never fire for the most common case. Add matches for
"message too long" and "context window" so both wordings hit.

Also addresses Gemini feedback on the launch-flag test:
  * Use ``inspect.getsource(LlamaCppBackend.load_model)`` instead of
    reading ``__file__`` directly; scopes the assertions to the
    function that actually launches llama-server.
  * Replace the hardcoded ``"            ]"`` indent search with a
    line-at-a-time scan for a line that is just ``]``, so the test
    survives reformatting.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 10:58:20 -07:00
Daniel Han
5861a7ce15
Studio: split model-load progress label across two rows (#5020)
* Studio: split model-load progress label across two rows

The chat flow and training overlay both compose a progress label like
"112.6 of 122.3 GB • 331.0 MB/s • 30s left" and render it next to the
percent badge in a single flex row. Once the rate + ETA part shows up,
the label outgrows the row width and wraps mid-phrase, orphaning the
percent ("19 left %") onto a second ragged line.

Fix in model-load-status.tsx: split the label on the first " • " into
a primary (size) chunk that stays on row 1 with the percent, and a
secondary (rate/ETA) chunk that renders on its own muted row below.
Labels without a bullet (e.g. "22.8 GB downloaded") collapse cleanly
to one row. The inline-status variant keeps only the primary and
surfaces the full label via the tooltip.

Also extracts the rate/ETA math out of useTransferStats into a pure
``transfer-stats.ts`` module (appendSample + computeTransferStats) so
it can be reasoned about and tested without React. The hook is now a
thin wrapper that feeds sample history through the pure functions.

Backend: adds two companion test files for load_progress():

  * test_llama_cpp_load_progress_matrix.py (21 tests) -- platform
    matrix (Linux /proc, macOS/Windows absence), VmRSS parsing
    variants (tab/space/missing/malformed), filesystem edges (HF-cache
    symlinks, broken symlinks, nonexistent paths, relative paths),
    shard aggregation (partial multi-shard, two series in same dir,
    mmproj-* exclusion, single-file), lifecycle races, concurrent
    sampling (10 threads x 50 iters against real /proc), fraction
    bounds.
  * test_llama_cpp_load_progress_live.py (5 tests) -- no-mock live
    integration: real subprocess allocating 100 MB to match VmRSS,
    real ready phase, real dead-pid degradation, real shard
    aggregation, repeated polling. Skipped on non-Linux.

Both complement the existing test_llama_cpp_load_progress.py.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Hoist splitProgressLabel out of JSX IIFE (review feedback)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 10:58:16 -07:00
Eda Z
5b8dbdc3c2
Fix bitsandbytes ROCm install by using pip instead of uv (#4966)
* Fix bitsandbytes ROCm install by using pip instead of uv

* Also use pip for PyPI fallback path in _install_bnb_rocm

The original fix correctly switched the pre-release wheel install from
uv to pip, but left the PyPI fallback path on uv. If uv breaks bnb
on ROCm, the fallback would hit the same issue. Move pip bootstrap
before the branch so both paths use pip consistently.

* Harden pip bootstrap: try ensurepip first, warn on failure

- Try ensurepip --upgrade before falling back to uv pip install pip.
  ensurepip works offline and does not need PyPI, making the bootstrap
  robust when the network or index is unavailable.
- If both ensurepip and uv fail, emit a visible warning instead of
  silently swallowing the error (which previously led to a cryptic
  "No module named pip" downstream).
- Use run_maybe_quiet so --verbose users see bootstrap output.
- Update comment to document the actual root cause: uv rejects the
  wheel because filename version and metadata version disagree.
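
A condensed sketch of the bootstrap order (the real code routes through
run_maybe_quiet so --verbose users see output; plain subprocess is used
here to keep the example self-contained):

    import subprocess, sys, warnings

    def _bootstrap_pip() -> None:
        # ensurepip works offline and needs no index, so it is tried first.
        for cmd in (
            [sys.executable, "-m", "ensurepip", "--upgrade"],
            ["uv", "pip", "install", "pip"],
        ):
            try:
                if subprocess.run(cmd, capture_output=True).returncode == 0:
                    return
            except OSError:
                pass
        warnings.warn(
            "Could not bootstrap pip; the bitsandbytes ROCm install may fail "
            "with 'No module named pip'."
        )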

* Add --isolated to pip install calls in _install_bnb_rocm

uv pip install ignores pip.conf and PIP_* env vars, but python -m pip
reads them. Without --isolated, users with PIP_INDEX_URL pointing to a
private mirror that does not carry bitsandbytes would see the PyPI
fallback fail where it previously worked under uv. --isolated restores
parity with the old uv behavior.

* Drop --isolated from PyPI fallback in _install_bnb_rocm

--isolated suppresses PIP_INDEX_URL, PIP_EXTRA_INDEX_URL, and pip.conf.
This is correct for the pre-release path (hardcoded GitHub URL, no index
consulted), but breaks the PyPI fallback for users in corporate or
air-gapped environments whose only route to bitsandbytes is a private
mirror configured via those mechanisms. Keep --isolated on the direct-URL
pre-release install; drop it from the index-dependent fallback.

* Drop --isolated from pre-release pip install, fix warning wording

--isolated suppresses pip.conf cert/proxy/CA settings in addition to
index config. For the direct GitHub URL, index config is irrelevant but
cert/proxy settings matter in corporate SSL-inspection environments.
Without this fix, users with pip.conf-based CA bundles get a TLS error
on the pre-release download and silently fall back to the broken PyPI
version -- the exact outcome the PR is trying to prevent.

Also fix the fallback warning: "unreachable" is too specific since the
pre-release install can fail for reasons other than network reachability.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-14 10:23:40 -07:00
pre-commit-ci[bot]
a0b9d14081
[pre-commit.ci] pre-commit autoupdate (#5004)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.9 → v0.15.10](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.9...v0.15.10)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 09:49:18 -07:00
Daniel Han
bb14ab144a
Studio: live model-load progress + rate/ETA on download and load (#5017)
* Studio: live model-load progress + rate/ETA on download and load

Two UX fixes for the opaque multi-minute wait between clicking Load
and being able to chat, visible most clearly on large MoE GGUFs like
MiniMax-M2.7 (131 GB of weights on a 97 GB GPU):

1. **Model-load phase is now observable.** The existing chat flow
   transitions the toast to "Starting model..." as soon as the
   download hits 100%, then shows a spinner with no other feedback
   until llama-server reports healthy. For a 130 GB model that spinner
   freezes for five-plus minutes while the kernel pages shards into
   the page cache. A new `GET /api/inference/load-progress` endpoint
   samples `/proc/<pid>/status VmRSS` on the llama-server subprocess
   against the sum of shard file sizes on disk, so the UI can render
   a real bar plus rate / ETA during that window.

2. **Rate and ETA on downloads and loads.** Both the chat toast and
   the training-start overlay used to show a static pair of numbers
   (for example "15.4 of 140.8 GB"). A rolling 15-second window over
   the existing byte-series now surfaces "85.3 MB/s, 24m 23s left"
   beside that pair. The estimator is shared between the download
   and load phases so the numbers don't reset when the phase flips.

Also fixes a pre-existing assignment bug uncovered while wiring this
up: `load_model` was storing the caller's `gguf_path` kwarg into
`self._gguf_path`, which is `None` on the HF-download code path. The
resolved on-disk path (`model_path`) is what llama-server actually
mmaps; downstream consumers need that. No existing reader used
`_gguf_path`, so this is a correctness fix for the new endpoint.

- Backend: `LlamaCppBackend.load_progress()`, `GET /api/inference/load-progress`, `LoadProgressResponse` Pydantic model.
- Frontend: `useTransferStats` hook, `formatRate` / `formatEta` helpers, `getLoadProgress` client, rewired chat toast and `DownloadRow` in the training overlay.
- Tests: `studio/backend/tests/test_llama_cpp_load_progress.py` covers empty states, mmap phase, ready phase, sharded total aggregation, missing gguf_path, and unreadable /proc (7 cases). `tsc -b` and `vite build` on the frontend both clean.
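
A minimal Linux-only sketch of the VmRSS sampling behind the new endpoint
(point 1 above); the real load_progress() also handles the ready phase,
mmproj exclusion, and unreadable /proc:

    import os, re
    from pathlib import Path

    def load_fraction(pid: int, shard_paths: list) -> float:
        status = Path(f"/proc/{pid}/status").read_text()
        m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status, re.MULTILINE)
        resident = int(m.group(1)) * 1024 if m else 0
        # Bytes that will eventually be paged in: sum of shard sizes on disk.
        total = sum(os.path.getsize(p) for p in shard_paths)
        return min(resident / total, 1.0) if total else 0.0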

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 09:46:22 -07:00
Roland Tannous
514bb3a20e
studio: pin peft to 0.18.1 to fix export subprocess issues (#5015)
* studio: pin peft to 0.18.1 to fix export subprocess issues

peft 0.19.0 causes export subprocess shutdown failures in Studio.
Reverting to 0.18.1 resolves the issue.

* studio: move peft pin to extras-no-deps to prevent torch upgrade

Installing peft via overrides.txt would resolve its deps and pull in
torch>=0.11.0, breaking other pinned packages. Moving the pin to
extras-no-deps.txt ensures --no-deps is used during install.
2026-04-14 20:16:30 +04:00
Datta Nimmaturi
4328d0b4f6
Fix num_items_in_batch GA for Gemma4 (#4998)
* Fix num_items_in_batch GA for Gemma4

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 09:01:10 -07:00
Daniel Han
7252410ccc
studio: stream export worker output into the export dialog (#4897)
* studio: stream export worker output into the export dialog

The Export Model dialog only showed a spinner on the "Exporting..."
button while the worker subprocess was doing the actual heavy lifting.
For Merged to 16bit and GGUF / Llama.cpp exports this meant several
minutes (or more, for large models) of opaque silence, with no way to
tell whether save_pretrained_merged, convert_hf_to_gguf.py, or
llama-quantize was making progress.

This adds a live terminal-style output panel inside the export dialog,
rendered just above the Cancel / Start Export buttons and scrollable
with auto-follow-tail. It shows stdout and stderr from both the worker
process itself and any child process it spawns (GGUF converter,
llama-quantize), coloured by stream.

Backend

- core/export/worker.py: new _setup_log_capture(resp_queue) installed
  before LogConfig.setup_logging. It saves the original stdout/stderr
  fds, creates pipes, os.dup2's the write ends onto fds 1 and 2 (so
  every child process inherits the redirected fds), and spins up two
  daemon reader threads. Each thread reads bytes from a pipe, echoes
  them back to the original fd (so the server console keeps working),
  splits on \n and \r, and forwards each line to the resp queue as
  {"type":"log","stream":"stdout|stderr","line":...,"ts":...}.
  PYTHONUNBUFFERED=1 is set so nested Python converters flush
  immediately.
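  (A condensed sketch of this fd-capture scheme follows the Backend list
  below.)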

- core/export/orchestrator.py:
  - Thread-safe ring buffer (collections.deque, maxlen 4000) with a
    monotonically increasing seq counter. clear_logs(),
    get_logs_since(cursor), get_current_log_seq(), is_export_active().
  - _wait_response handles rtype == "log" by appending to the buffer
    and continuing the wait loop. Status messages are also surfaced as
    a "status" stream so users see high level progress alongside raw
    subprocess output.
  - load_checkpoint, _run_export, and cleanup_memory now wrap their
    bodies with the existing self._lock (previously unused), clear the
    log buffer at the start of each op, and flip _export_active in a
    try/finally so the SSE endpoint can detect idle.

- routes/export.py:
  - Wrapped every sync orchestrator call (load_checkpoint,
    cleanup_memory, export_merged_model, export_base_model,
    export_gguf, export_lora_adapter) in asyncio.to_thread so the
    FastAPI event loop stays free during long exports. Without this
    the new SSE endpoint could not be served concurrently with the
    blocking export POST.
  - New GET /api/export/logs/stream SSE endpoint. Honors
    Last-Event-ID and a since query param for reconnect, emits log /
    heartbeat / complete / error events, uses the id field to carry
    the log seq so clients can resume cleanly. On first connect
    without an explicit cursor it starts from the current seq so old
    lines from a previous run are not replayed.
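
As referenced above, a condensed, hypothetical sketch of the fd-capture
scheme in _setup_log_capture (simplified: it reads whole lines and does
not split on \r the way the real reader threads do):

    import os, threading, time

    def setup_log_capture(resp_queue):
        for fd, stream in ((1, "stdout"), (2, "stderr")):
            saved = os.dup(fd)              # keep the original console fd
            read_end, write_end = os.pipe()
            os.dup2(write_end, fd)          # children inherit the redirected fd
            os.close(write_end)

            def reader(read_end=read_end, saved=saved, stream=stream):
                with os.fdopen(read_end, "rb", buffering=0) as pipe:
                    for raw in iter(pipe.readline, b""):
                        os.write(saved, raw)  # echo so the console keeps working
                        resp_queue.put({
                            "type": "log",
                            "stream": stream,
                            "line": raw.decode(errors="replace").rstrip("\r\n"),
                            "ts": time.time(),
                        })

            threading.Thread(target=reader, daemon=True).start()
        os.environ["PYTHONUNBUFFERED"] = "1"  # nested converters flush per line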

Frontend

- features/export/api/export-api.ts: streamExportLogs() helper that
  authFetches the SSE endpoint and parses id / event / data fields
  manually (same pattern as streamTrainingProgress in train-api.ts).

- features/export/components/export-dialog.tsx:
  - Local useExportLogs(exporting) hook that opens the SSE stream on
    exporting transitions to true, accumulates up to 4000 lines in
    component state, and aborts on cleanup.
  - New scrollable output panel rendered above DialogFooter, only
    shown for Merged to 16bit and GGUF / Llama.cpp (LoRA adapter is
    a fast disk write with nothing to show). Dark terminal styling
    (bg-black/85, emerald text, rose for stderr, sky for status),
    max-height 14rem, auto-scrolls to the bottom on new output but
    stops following if the user scrolls up. A small streaming / idle
    indicator is shown next to the panel title.
  - DialogContent widens from sm:max-w-lg to sm:max-w-2xl when the
    output panel is visible so the logs have room to breathe.

Verified

- Python smoke test (tests/smoke_export_log_capture.py): spawns a
  real mp.get_context("spawn") process, installs _setup_log_capture,
  confirms that parent stdout prints, parent stderr prints, AND a
  child subprocess invoked via subprocess.run (both its stdout and
  stderr) are all captured in the resp queue. Passes.
- Orchestrator log helpers tested in isolation: _append_log,
  get_logs_since (with and without a cursor), clear_logs not
  resetting seq so reconnecting clients still progress. Passes.
- routes.export imports cleanly in the studio venv and /logs/stream
  shows up in router.routes.
- bun run build: tsc -b plus vite build, no TypeScript errors.

No existing export behavior is changed. If the subprocess, the SSE
endpoint, or the frontend hook fails, the export itself still runs to
completion the same way it did before, with or without logs visible.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* export dialog: trim bootstrap noise, scope logs per screen, show realpath

Several follow-ups to the live export log work:

1. Worker bootstrap noise (transformers venv activation, Unsloth banner,
   "Top GGUF/hub models" lists, vision detection, 2k-step weight load
   bar) is dropped from the export-dialog stream. A threading.Event
   gate in worker.py defaults closed and only opens once _handle_export
   actually starts; until then the reader thread still echoes lines to
   the saved console fd for debugging but does not push them onto the
   resp_queue. The orchestrator already spawns a fresh subprocess for
   every checkpoint load, so the gate is naturally reset between runs.

2. tqdm in non-tty mode defaults to a 10s mininterval, which makes
   multi-step bars look frozen in the panel. Set TQDM_MININTERVAL=0.5
   in the worker env so any tqdm-driven progress emits more often.

3. The dialog's useExportLogs hook now also clears its line buffer
   when exportMethod or open changes, so re-opening the dialog into a
   different action's screen no longer shows the previous action's
   saved output. A useElapsedSeconds tick + "Working Xs" badge in the
   log header gives users a visible sign that long single-step phases
   (cache copies, GGUF conversion) are still running when no new lines
   are arriving.

4. ExportBackend.export_{merged,base,gguf,lora} now return
   (success, message, output_path); the worker forwards output_path on
   each export_*_done response, the orchestrator's _run_export passes
   it to routes/export.py, which surfaces it via
   ExportOperationResponse.details.output_path. The dialog's Export
   Complete screen renders the resolved on-disk realpath under "Saved
   to" so users can find their exported model directly.

* fix(cli): unpack 3-tuple return from export backend

ExportOrchestrator.export_{merged,base,gguf,lora} now return
(success, message, output_path) so the studio dialog can show
the on-disk realpath. The CLI still unpacked 2 values, so every
`unsloth export --format ...` crashed with ValueError before
reporting completion. Update the four call sites and surface
output_path via a "Saved to:" echo.

* fix(studio): anchor export log SSE cursor at run start

The export dialog SSE defaulted its cursor to get_current_log_seq()
at connect time, so any line emitted between the POST that kicks
off the export and the client opening the stream was buffered with
seqs 1..k and then skipped (seq <= cursor). Long-running exports
looked silent during their first seconds.

Snapshot _log_seq into _run_start_seq inside clear_logs() and
expose it via get_run_start_seq(). The SSE default cursor now uses
that snapshot, so every line emitted since the current run began
is reachable regardless of when the client connects. Old runs
still can't leak in because their seqs are <= the snapshot.
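
The buffer is roughly this shape (a sketch assembled from the descriptions
in this PR, not the actual orchestrator class):

    import collections, threading

    class ExportLogBuffer:
        def __init__(self, maxlen=4000):
            self._lock = threading.Lock()
            self._lines = collections.deque(maxlen=maxlen)
            self._seq = 0            # monotonically increasing, never reset
            self._run_start_seq = 0  # snapshot taken when a run begins

        def clear_logs(self):
            with self._lock:
                self._lines.clear()
                self._run_start_seq = self._seq  # old runs stay <= the snapshot

        def get_run_start_seq(self):
            return self._run_start_seq

        def append(self, line):
            with self._lock:
                self._seq += 1
                self._lines.append((self._seq, line))

        def get_logs_since(self, cursor):
            with self._lock:
                return [(seq, line) for seq, line in self._lines if seq > cursor]

    # SSE default cursor: get_run_start_seq() instead of the current seq, so
    # every line emitted since the run began is replayed to late connectors.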

* fix(studio): reconnect export log SSE on stream drop

useExportLogs launched streamExportLogs once per exporting
transition and recorded any drop in .catch(). Long GGUF exports
behind a proxy with an idle kill-timeout would silently lose the
stream for the rest of the run even though the backend already
supports Last-Event-ID resume. The "retry: 3000" directive emitted
by the backend is only meaningful to native EventSource; this
hook uses a manual fetch + ReadableStream parse so it had no
effect.

Wrap streamExportLogs in a retry loop that tracks lastSeq from
ExportLogEvent.id and passes it as since on reconnect. Backoff is
exponential with jitter, capped at 5s, reset on successful open.
The loop stops on explicit backend `complete` event or on effect
cleanup.

* fix(studio): register a second command so Typer keeps `export` as a subcommand

The CLI export unpacking tests wrap `unsloth_cli.commands.export.export`
in a fresh Typer app with a single registered command. Typer flattens a
single-command app into that command, so the test's
`runner.invoke(cli_app, ["export", ckpt, out, ...])` treats the leading
`"export"` token as an unexpected extra positional argument -- every
parametrized case failed with:

    Got unexpected extra argument (.../out)

Register a harmless `noop` second command so Typer preserves subcommand
routing and the tests actually exercise the 3-tuple unpack path they
were written to guard.

Before: 4 failed
After:  4 passed
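
For context, the Typer quirk looks roughly like this self-contained
example (the real tests wrap unsloth_cli.commands.export.export; the
`export` command below is a stand-in):

    import typer
    from typer.testing import CliRunner

    app = typer.Typer()

    @app.command("export")
    def export(checkpoint: str, output: str):
        typer.echo(f"exporting {checkpoint} -> {output}")

    # With only one registered command, Typer flattens the app and "export"
    # would be parsed as the first positional argument. A second command
    # restores subcommand routing.
    @app.command("noop")
    def noop():
        pass

    result = CliRunner().invoke(app, ["export", "ckpt_dir", "out_dir"])
    assert result.exit_code == 0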

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: studio-install <studio@local.install>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-14 08:55:43 -07:00
Daniel Han
eca592effe
studio: show HF model download progress in training start overlay (#4894)
* studio: show HF model download progress in training start overlay

During the training setup phase, the overlay only displayed a static
"Loading model..." line while model weights were being downloaded from
Hugging Face. On slow connections this looked like the app had frozen.

This adds a small self-contained progress block inside the existing
TrainingStartOverlay that polls the existing
GET /api/models/download-progress endpoint and renders a Progress bar
with bytes downloaded, total bytes, and percent complete.

Notes:

- Frontend only change. No backend, worker, SSE, or runtime store edits.
- Reuses the existing getDownloadProgress client wrapper and the
  existing /api/models/download-progress endpoint that already scans
  the HF blob cache for completed and .incomplete files.
- selectedModel is read directly from useTrainingConfigStore inside the
  overlay, so no prop drilling and live-training-view.tsx is unchanged.
- Polling runs at 1500 ms and is gated on the HF repo regex
  (^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$), the same regex the backend uses,
  so local paths and empty form state never hit the endpoint.
- Polling stops once progress reaches 1.0 so the bar can stay at 100
  until the overlay hides on the first training step.
- Network errors are silently swallowed, matching the chat side flow
  (the bar simply freezes at the last value).
- When downloadedBytes is 0 the block is hidden entirely, so cached
  models do not flash a progress bar.
- When the HF API cannot determine the total size, the block falls
  back to "X downloaded" with no percent and no bar.

Verified with bun run build (tsc -b plus vite build, no TypeScript
errors).

* training overlay: track dataset download + show on-disk realpath

Adds a dedicated "Downloading dataset..." section to the training-start
overlay alongside the existing model-weights one, so an HF dataset that
is downloading mid-startup is no longer mislabeled as model weights or
hidden entirely. The new GET /api/datasets/download-progress endpoint
mirrors /api/models/download-progress against the datasets-- prefix in
HF_HUB_CACHE.

Both endpoints now also return cache_path, the resolved on-disk
realpath of the snapshot directory (or the cache repo root if no
snapshot is materialized yet). The overlay surfaces this under each
download row so users can immediately see where the model and dataset
landed without digging through server logs.

The frontend's existing useModelDownloadProgress hook is generalized
to a single useHfDownloadProgress(repoId, fetcher) hook that the
model and dataset variants both delegate to, keeping polling, gating,
and completion semantics in one place.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: Polish training start overlay download progress UI (#4957)

* studio: polish training start overlay download progress visuals

* Fix formatCachePath cross-platform support and redundant sizeLabel

- Extend formatCachePath regex to also shorten macOS /Users/<user> paths to ~
- Suppress sizeLabel when no byte info is available (cachePath-only state),
  since the "Preparing" badge already conveys the status

* Fix misleading status badge when download total is unknown

- Hide badge when totalBytes is 0 but downloadedBytes > 0, since we cannot
  determine if the download is still in progress or already complete (happens
  when HF size metadata lookup fails for gated/private repos)
- Keep "Preparing" badge for the zero-bytes cachePath-only state
- Add Windows native path shortening to formatCachePath (C:\Users\<name>)

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

---------

Co-authored-by: studio-install <studio@local.install>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
2026-04-14 08:54:01 -07:00
Daniel Han
44082cf88e
Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM (#5014)
* Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM

The chat settings sheet's ctx slider reads `max_context_length` from
`/api/inference/status` and renders

    Exceeds estimated VRAM capacity (N tokens). The model may use
    system RAM.

when the user drags the slider above that value. For models whose
weights fit on some GPU subset, `_max_context_length` was already set
to the binary-search cap and the warning fired correctly.

For models whose weights exceed 90% of every GPU subset's free memory
(e.g. MiniMax-M2.7-GGUF at 131 GB on a 97 GB GPU), the ceiling-probe
loop never matched a subset, so `max_available_ctx` stayed at the
native context (e.g. 196608). The slider ran all the way to native
with no indication that any value above the 4096 spec default would
trigger `--fit on` and degrade performance.

Anchor `max_available_ctx` at `min(4096, native_context_length)` when
no subset fits, so the warning fires at the right threshold and the
user sees the correct safe-zone / warning-zone split:

    Before (MiniMax-M2.7 on 97 GB GPU):
      slider 0 .. 196608, warning threshold = 196608  (never fires)

    After:
      slider 0 .. 196608, warning threshold = 4096    (fires correctly)

No frontend changes required: `chat-settings-sheet.tsx` already
consumes `ggufMaxContextLength` (= status.max_context_length) as the
warning threshold and `ggufNativeContextLength` as the slider max.

Adds tests/test_llama_cpp_max_context_threshold.py covering
weights-exceed-VRAM (single / multi-GPU), a native-ctx below the 4096
fallback case (don't lie about supported ctx), fittable-model
regressions (small / multi-GPU / tiny on huge GPU), and the
`max_context_length` property's fallback semantics.
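
A condensed sketch of the fallback (argument names are illustrative; the
real logic sits in the ceiling-probe loop in LlamaCppBackend):

    def pick_max_available_ctx(fit_cap, native_context_length):
        # fit_cap: binary-search cap when some GPU subset holds the weights,
        # or None when weights exceed 90% of every subset's free memory.
        if fit_cap is not None:
            return fit_cap
        # Anchor the warning threshold at the 4096 spec default, but never
        # claim more context than the model natively supports.
        return min(4096, native_context_length)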

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 08:53:49 -07:00
Daniel Han
b2f80f210e
Studio: make GGUF disk-space preflight cache-aware (#5012)
* Studio: make GGUF disk-space preflight cache-aware

The pre-download disk check in LlamaCppBackend.load_model compared the
repo's total GGUF size against free disk without crediting bytes
already present in the Hugging Face cache. Re-loading a large cached
model (e.g. MiniMax-M2.7-GGUF at 131 GB) then failed cold with
"Not enough disk space to download any variant" whenever free disk
was below the full weight footprint, even though nothing actually
needed to be downloaded.

Subtract bytes already on disk via try_to_load_from_cache before
comparing against free space. A partial blob (interrupted download) is
not credited, so a second attempt still allocates room to finish the
download. The log line now also surfaces how much is already cached.
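
Roughly the crediting logic described above (a sketch; the actual check in
load_model also logs how much is already cached):

    import os, shutil
    from huggingface_hub import try_to_load_from_cache

    def bytes_still_needed(repo_id, filenames_and_sizes):
        needed = 0
        for filename, size in filenames_and_sizes.items():
            cached = try_to_load_from_cache(repo_id, filename)
            # Partial (.incomplete) blobs are not returned by the cache lookup,
            # so they are not credited and a retry still budgets the full file.
            if isinstance(cached, str) and os.path.getsize(cached) == size:
                continue
            needed += size
        return needed

    # preflight: shutil.disk_usage(cache_dir).free > bytes_still_needed(repo, sizes)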

Adds tests/test_llama_cpp_cache_aware_disk_check.py covering the
fully-cached, partial-cache-insufficient-disk, partial-cache-enough-disk,
cold-cache, incomplete-blob, and zero-size-path-info cases. Sparse
tempfiles keep the GB-scale scenarios cheap to simulate.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 08:53:37 -07:00
Daniel Han
767fa8cade
Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM (#5011)
* Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM

The load-time auto-fit in LlamaCppBackend.load_model had two issues for
models whose weights do not fit on any GPU subset (the common case for
large MoE GGUFs such as MiniMax-M2.7, Qwen3.5-397B-A17B, etc.):

1. Auto mode (max_seq_length=0) left effective_ctx at the model's native
   context when no subset passed the 90% fit check. The UI slider then
   landed on e.g. 196608 for MiniMax-M2.7, far above anything usable.
   Default the auto-pick to 4096 so the UI starts at a sane value; the
   slider ceiling stays at the native context so the user can still
   opt in to longer contexts and receive the "might be slower" warning.

2. Explicit ctx was silently shrunk when weights fit but the requested
   KV overflowed the 90% budget. The shrink loop emitted -c <capped>
   -ngl -1 without informing the caller, so a user who had opted into
   a longer context via the UI never actually got it. Drop the shrink
   loop on the explicit path and emit -c <user_ctx> --fit on instead,
   letting llama-server flex -ngl (CPU layer offload).

Adds tests/test_llama_cpp_context_fit.py covering both paths, the
file-size-only fallback when KV metadata is missing, non-regression on
fittable auto-pick, and platform-agnostic input shape.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 08:53:25 -07:00
TF-MTGE
a31c82a640
fix(studio): remove 300s cap on load_checkpoint (inherits 3600s default) (#4922)
* fix: increase wait response timeout to 900 sec instead of 300 sec. #4845

* Apply suggestion from @gemini-code-assist[bot]

good catch

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-14 08:53:14 -07:00
Datta Nimmaturi
da78c6be71
[Studio] Install flash attn at setup time for linux (#4979)
* [Studio] Install flash attn at setup time for linux

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup changes

Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Test cases

* wheel_utils: narrow url_exists exceptions and log at debug level

---------

Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-14 16:40:17 +04:00
Datta Nimmaturi
dccc0ebada
[Studio] Show non exported models in chat UI (#4892)
* Show non exported models in chat UI

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Distinguish between LoRA and full fine-tune saves. Cleanup

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-14 15:03:58 +04:00
Bharath Kumar Adinarayan
a50f61009b
fix(studio): default chart view to full training history (#5007)
* fix(studio): default chart view to full training history instead of last 80 steps

Fixes #5003

* chore: windowsize as null code comment

---------

Co-authored-by: imagineer99 <samleejackson0@gmail.com>
Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>
2026-04-14 03:29:27 -07:00
Lee Jackson
bfa17330bd
Studio: Polish API key copy button and harden async clipboard fallback (#5006)
* fix: polish clipboard style and fix async clipboard path

* Use copyToClipboardAsync in CopyButton for Safari fallback

CopyButton was calling navigator.clipboard.writeText directly,
bypassing the execCommand fallback added in this same PR. Switch
to copyToClipboardAsync which tries execCommand first (Safari
user-gesture requirement) then falls back to the async clipboard API.

* Fix copyToClipboard sync contract regression and improve async path

- Restore copyToClipboard() to return only the execCommand result,
  preserving the boolean contract that 7 existing callers depend on
  to gate their "Copied!" UI state. The fire-and-forget async fallback
  was returning true before the promise resolved, causing false success.

- Add document.body null guard to copyWithExecCommand for SSR safety.

- Reorder copyToClipboardAsync to try the async Clipboard API first,
  avoiding unnecessary DOM/focus overhead in Radix focus-trapped dialogs
  where execCommand always fails anyway.

* Restore queryCommandSupported guard and fix async catch path

- Restore the queryCommandSupported("copy") guard in copyToClipboard()
  to match the original contract exactly: when execCommand is entirely
  unsupported, fall through to fire-and-forget async clipboard write.

- Fix copyToClipboardAsync catch block: after navigator.clipboard.writeText
  rejects, the user-gesture frame is gone, so execCommand will also fail.
  Return false from catch instead of falling through. The execCommand
  fallback at the bottom only runs when the Clipboard API is absent
  (still in user-gesture frame).

* Restore execCommand fallback in copyToClipboardAsync catch path

The catch block was returning false after clipboard API rejection,
based on the incorrect premise that the user-gesture frame is lost
after an await. Per the HTML spec, transient user activation IS
preserved through promise microtask chains. The real reason
execCommand fails in the Radix dialog is the focus trap intercepting
textarea.focus(), not gesture loss.

For non-dialog callers, execCommand can still succeed after a
clipboard rejection. Inside a Radix modal, execCommand returns
false harmlessly (focus trap blocks it).

* Harden textarea fallback for mobile and continue to async path on failure

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-14 14:22:14 +04:00
Wasim Yousef Said
97eafd999e
studio: fix api-keys access + refresh (#5005)
* studio: fix api-keys access + refresh

* studio: guard v1 in spa fallback
2026-04-13 23:48:51 +04:00
AdamPlatin123
d2fc582840
studio: skip training status/metrics polling when idle (#4988)
* fix(studio): skip training status/metrics polling when idle

Add an early return in the status and metrics setInterval callbacks when
the runtime store reports phase === "idle" and hasHydrated is true.
Previously these polls fired unconditionally every 3s/5s, generating
unnecessary network traffic and console errors when no training was
running.

* fix(studio): reduce idle polling to 30s instead of stopping entirely

Review feedback (PR #4988): completely stopping polling when idle risks
permanent UI desync if hydration fails, and misses out-of-band state
changes from other clients. Add a 30s background poll that only fires
when idle to recover gracefully.

* fix: harden idle status polling around hydration and runtime reset

---------

Co-authored-by: AdamPlatin123 <AdamPlatin123@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: imagineer99 <samleejackson0@gmail.com>
2026-04-13 12:02:12 -07:00
Daniel Han
9a261aec5f
Studio: Expose openai and anthropic compatible external API end points (#4956)
* Studio: add API key authentication for programmatic access

External users want to hit the Studio API (chat completions with tool
calling, training, export, etc.) without going through the browser
login flow. This adds sk-unsloth- prefixed API keys that work as a
drop-in replacement for JWTs in the Authorization: Bearer header.

Backend:
- New api_keys table in SQLite (storage.py)
- create/list/revoke/validate functions with SHA-256 hashed storage
- API key detection in _get_current_subject before the JWT path
- POST/GET/DELETE /api/auth/api-keys endpoints on the auth router

Frontend:
- /api-keys page with create form, one-time key reveal, keys table
- API Keys link in desktop and mobile navbar
- Route registered with requireAuth guard

Zero changes to any existing route handler -- every endpoint that uses
Depends(get_current_subject) automatically works with API keys.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use actual origin in API key usage examples

The examples on /api-keys were hardcoded to localhost:8888 which is
wrong for remote users. Use window.location.origin so the examples
show the correct URL regardless of where the user is connecting from.

* Add `unsloth studio run` CLI command for one-liner model serving

Adds a `run` subcommand that starts Studio, loads a model, creates an
API key, and prints a ready-to-use curl command -- similar to
`ollama run` or `vllm serve`.

Usage: unsloth studio run -m unsloth/Qwen3-1.7B-GGUF --gguf-variant UD-Q4_K_XL

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add end-to-end tests for `unsloth studio run` and API key usage

Tests the 4 usage examples from the API Keys page:
1. curl basic (non-streaming) chat completions
2. curl streaming (SSE) chat completions
3. OpenAI Python SDK streaming completions
4. curl with tools (web_search + python)

Also tests --help output, invalid key rejection, and no-key rejection.
All 7 tests pass against Qwen3-1.7B-GGUF.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add /v1/completions, /v1/embeddings, /v1/responses endpoints and --parallel support

- llama_cpp.py: accept n_parallel param, pass to llama-server --parallel
- run.py: plumb llama_parallel_slots through to app.state
- inference.py: add /completions and /embeddings as transparent proxies to
  llama-server, add /responses as application-level endpoint that converts
  to ChatCompletionRequest; thread n_parallel through load_model
- studio.py: set llama_parallel_slots=4 for `unsloth studio run` path

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Make /v1/responses endpoint match OpenAI Responses API format

The existing /v1/responses shim returned Chat Completions format, which
broke OpenAI SDK clients using openai.responses.create(). This commit
replaces the endpoint with a proper implementation that:

- Returns `output` array with `output_text` content parts instead of
  `choices` with `message`
- Uses `input_tokens`/`output_tokens` instead of `prompt_tokens`/
  `completion_tokens` in usage
- Sets `object: "response"` and `id: "resp_..."`
- Emits named SSE events for streaming (response.created,
  response.output_text.delta, response.completed, etc.)
- Accepts all OpenAI Responses API fields (tools, store, metadata,
  previous_response_id) without erroring -- silently ignored
- Maps `developer` role to `system` and `input_text`/`input_image`
  content parts to the internal Chat format

Adds Pydantic schemas for request/response models and 23 unit tests
covering schema validation, input normalisation, and response format.
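
For reference, the non-streaming shape the endpoint now returns looks
roughly like this (field values are illustrative):

    response = {
        "id": "resp_abc123",
        "object": "response",
        "model": "unsloth/Qwen3-1.7B-GGUF",
        "output": [
            {
                "type": "message",
                "role": "assistant",
                "content": [{"type": "output_text", "text": "Hello!"}],
            }
        ],
        "usage": {"input_tokens": 12, "output_tokens": 4, "total_tokens": 16},
    }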

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: add Anthropic-compatible /v1/messages endpoint (#4981)

* Add Anthropic-compatible /v1/messages endpoint with tool support

Translate Anthropic Messages API format to/from internal OpenAI format
and reuse the existing server-side agentic tool loop. Supports streaming
SSE (message_start, content_block_delta, etc.) and non-streaming JSON.
Includes offline unit tests and e2e tests in test_studio_run.py.

* Add enable_tools, enabled_tools, session_id to /v1/messages endpoint

Support the same shorthand as /v1/chat/completions: enable_tools=true
with an optional enabled_tools list uses built-in server tools without
requiring full Anthropic tool definitions. session_id is passed through
for sandbox isolation. max_tokens is now optional.

* Strip leaked tool-call XML from Anthropic endpoint content

Apply _TOOL_XML_RE to content events in both streaming and
non-streaming tool paths, matching the OpenAI endpoint behavior.

* Emit custom tool_result SSE event in Anthropic stream

Adds a non-standard tool_result event between the tool_use block close
and the next text block, so clients can see server-side tool execution
results. Anthropic SDKs ignore unknown event types.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Split /v1/messages into server-side and client-side tool paths

enable_tools=true runs the existing server-side agentic loop with
built-in tools (web_search/python/terminal). A bare tools=[...] field
now triggers a client-side pass-through: client-provided tools are
forwarded to llama-server and any tool_use output is returned to the
caller with stop_reason=tool_use for client execution.

This fixes Claude Code (and any Anthropic SDK client), which sends
tools=[...] expecting client-side execution but was previously routed
through execute_tool() and failed with 'Unknown tool'.

Adds AnthropicPassthroughEmitter to convert llama-server OpenAI SSE
chunks into Anthropic SSE events, plus unit tests covering text
blocks, tool_use blocks, mixed, stop reasons, and usage.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix httpcore GeneratorExit in /v1/messages passthrough stream

Explicitly aclose aiter_lines() before the surrounding async with
blocks unwind, mirroring the prior fix in external_provider.py
(a41160d3) and cc757b78's RuntimeError suppression.

* Wire stop_sequences through /v1/messages; warn on tool_choice

Plumb payload.stop_sequences to all three code paths (server-side
tool loop, no-tool plain, client-side passthrough) so Anthropic SDK
clients setting stop_sequences get the behavior they expect. The
llama_cpp backend already accepted `stop` on both generate_chat_
completion and generate_chat_completion_with_tools; the Anthropic
handler simply wasn't passing it.

tool_choice remains declared on the request model for Anthropic SDK
compatibility (the SDK often sets it by default) but is not yet
honored. Log a structured warning on each request carrying a non-
null tool_choice so the silent drop is visible to operators.

* Wire min_p / repetition_penalty / presence_penalty through /v1/messages

Align the Anthropic endpoint's sampling surface with /v1/chat/completions.
Adds the three fields as x-unsloth extensions on AnthropicMessagesRequest
and threads them through all three code paths: server-side tool loop,
no-tool plain, and client-side passthrough.

The passthrough builder emits "repeat_penalty" (not "repetition_penalty")
because that is llama-server's field name; the backend methods already
apply the same rename internally.

* Fix block ordering and prev_text reset in non-streaming tool path

_anthropic_tool_non_streaming was building the response by appending
all tool_use blocks first, then a single concatenated text block at
the end — losing generation order and merging pre-tool and post-tool
text into one block. It also never reset prev_text between synthesis
turns, so the first N characters of each post-tool turn were dropped
(where N = length of the prior turn's final cumulative text).

Rewrite to build content_blocks incrementally in generation order,
matching the streaming emitter's behavior: deltas within a turn are
merged into the trailing text block, tool_use blocks interrupt the
text sequence, and prev_text is reset on tool_end so turn N+1 diffs
against an empty baseline.

Caught by gemini-code-assist[bot] review on #4981.

* Make test_studio_run.py e2e tests pytest-compatible

Add a hybrid session-scoped studio_server fixture in conftest.py that
feeds base_url / api_key into the existing e2e test functions. Three
invocation modes are now supported:

1. Script mode (unchanged) — python tests/test_studio_run.py
2. Pytest + external server — point at a running instance via
   UNSLOTH_E2E_BASE_URL / UNSLOTH_E2E_API_KEY env vars, no per-run
   GGUF load cost
3. Pytest + fixture-managed server — pytest drives _start_server /
   _kill_server itself via --unsloth-model / --unsloth-gguf-variant,
   CI-friendly

The existing _start_server / _kill_server helpers and main() stay
untouched so the script entry point keeps working exactly as before.
Test function signatures are unchanged — the (base_url, api_key)
parameters now resolve via the new fixtures when running under
pytest.

* Rename test_studio_run.py -> test_studio_api.py

The file is entirely about HTTP API endpoint testing (OpenAI-compatible
/v1/chat/completions, Anthropic-compatible /v1/messages, API key auth,
plus a CLI --help sanity check on the command that runs the API). None
of its tests cover training, export, chat-UI, or internal-Python-API
concerns.

The old name misleadingly suggested "tests for the unsloth studio run
CLI subcommand" — the new name reflects the actual scope.

Updates:
- git mv the file (rename tracked, history preserved)
- Rewrite opening docstring to state the API surface focus and call
  out what is explicitly out of scope
- Update all 4 Usage-block path references to the new filename
- LOG_FILE renamed to test_studio_api.log
- conftest.py fixture import rewritten from test_studio_run to
  test_studio_api, plus 7 docstring/comment references updated

No functional changes to test logic, signatures, or main().

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix httpcore asyncgen cleanup in /v1/messages and /v1/completions

The earlier fix in 985e92a9 was incomplete: it closed aiter_lines()
explicitly but still used `async with httpx.AsyncClient()` /
`async with client.stream()` inside the generator. When the generator
is orphaned (e.g. client disconnects mid-stream and Starlette drops
the StreamingResponse iterator without explicitly calling aclose()),
Python's asyncgen finalizer runs the cleanup in a DIFFERENT task than
the one that originally entered the httpx context managers. The
`async with` exits then trigger httpcore's HTTP11ConnectionByteStream
.aclose(), which enters anyio.CancelScope.__exit__ with a mismatched
task and raises RuntimeError("Attempted to exit cancel scope in a
different task"). That error escapes any user-owned try/except
because it happens during GC finalization.

Replace `async with` with manual client/response lifecycle in both
/v1/messages passthrough and /v1/completions proxy. Close the
response and client in a finally block wrapped in
`try: ... except Exception: pass`. This suppresses RuntimeError (and
other Exception subclasses) from the anyio cleanup noise while
letting GeneratorExit (a BaseException, not Exception) propagate
cleanly so the generator terminates as Python expects.

Traceback observed in user report:
  File ".../httpcore/_async/connection_pool.py", line 404, in __aiter__
      yield part
  RuntimeError: async generator ignored GeneratorExit
...
  File ".../anyio/_backends/_asyncio.py", line 455, in __exit__
      raise RuntimeError(
  RuntimeError: Attempted to exit cancel scope in a different task
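
A minimal sketch of the manual-lifecycle pattern described above (assuming
httpx; function and argument names are illustrative):

    import httpx

    async def proxy_stream(url, payload):
        # Manual lifecycle instead of `async with`: cleanup must tolerate
        # running in a different task during asyncgen finalization.
        client = httpx.AsyncClient(timeout=None)
        resp = None
        try:
            resp = await client.send(
                client.build_request("POST", url, json=payload), stream=True
            )
            async for line in resp.aiter_lines():
                yield line
        finally:
            try:
                if resp is not None:
                    await resp.aclose()
                await client.aclose()
            except Exception:
                # Swallow anyio cancel-scope noise; GeneratorExit (a
                # BaseException) still propagates so the generator closes.
                pass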

* Expand unsloth studio run banner with SDK base URL and more curl examples

Add an explicit "OpenAI / Anthropic SDK base URL" line inside the info
box so SDK users don't accidentally copy the bare server URL (without
/v1) into their OpenAI/Anthropic SDK constructors and hit 404s.

Replace the single /v1/chat/completions curl example with three
labeled blocks: chat/completions, Anthropic /messages, and OpenAI
Responses. The Anthropic example includes max_tokens (Anthropic SDKs
require it even though Studio accepts None).

All examples derived from a computed sdk_base_url so the /v1 prefix
stays in sync if the public path ever changes.
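Roughly, the banner code amounts to something like this (names are
illustrative; the point is only that every example reuses one computed prefix):

  host, port = "127.0.0.1", 8000             # whatever the server bound to
  server_url = f"http://{host}:{port}"
  sdk_base_url = f"{server_url}/v1"           # what OpenAI/Anthropic SDKs expect

  chat_example = (
      f"curl {sdk_base_url}/chat/completions "
      "-H 'Content-Type: application/json' "
      "-d '{\"model\": \"local\", \"messages\": [...]}'"
  )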

* Hash API keys with HMAC-SHA256 + persistent server secret

Stores the HMAC secret in a new app_secrets singleton table. Fixes
CodeQL py/weak-sensitive-data-hashing alert on storage.py:74-76,
394-395. Refresh tokens stay on plain SHA-256 (unchanged _hash_token)
so existing user sessions survive upgrade — API keys are new on this
branch so there is no migration.

* Use PBKDF2 for API key hashing per CodeQL recommendation

HMAC-SHA256 was still flagged by py/weak-sensitive-data-hashing.
Switch to hashlib.pbkdf2_hmac, which is in CodeQL's recommended
allowlist (Argon2/scrypt/bcrypt/PBKDF2). Persistent server-side
salt stays in app_secrets for defense-in-depth. 100k iterations to
match auth/hashing.py's password hasher.
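A minimal sketch of the scheme, assuming hypothetical helper names (the
persistent salt stands in for the value stored in the app_secrets table):

  import hashlib
  import hmac
  import secrets

  PBKDF2_ITERATIONS = 100_000   # matches auth/hashing.py per the commit

  def new_salt() -> bytes:
      # Generated once and persisted in the app_secrets singleton table.
      return secrets.token_bytes(32)

  def hash_api_key(api_key: str, salt: bytes) -> bytes:
      return hashlib.pbkdf2_hmac("sha256", api_key.encode("utf-8"), salt,
                                 PBKDF2_ITERATIONS)

  def verify_api_key(candidate: str, stored: bytes, salt: bytes) -> bool:
      # Constant-time comparison so verification does not leak timing info.
      return hmac.compare_digest(hash_api_key(candidate, salt), stored)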

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-13 21:08:11 +04:00
Roland Tannous
3bb72a557f
Pin kernels==0.12.1 to avoid huggingface_hub dataclass conflict (#5000) 2026-04-13 20:42:02 +04:00
Lee Jackson
21a7895959
Studio: Prompt manager, message deletion, and chat UI improvements (#4938)
* feat(chat): code block styling, delete with Dexie sync, settings sheet polish

* style: config save/delete padding fix

* fix(studio): centralize dark code-block surface and optimize message sync writes

* style: config padding/alignment polish

* fix(studio): upsert custom presets without implicit rename-delete

* fix settings sheet save state polish

* fix settings sheet button widths

* fix chat settings presets

* fix chat delete sync

* fix chat trust remote code flow

---------

Co-authored-by: shine1i <wasimysdev@gmail.com>
2026-04-13 16:42:33 +02:00
AdamPlatin123
3b092bcd46
fix(studio): prevent route transition DOM duplication via AnimatePresence (#4987)
Add mode="wait" and exit={{ opacity: 0 }} to the root AnimatePresence
wrapper so outgoing routes fully unmount before incoming routes render.
Without this, rapid navigation between Studio/Export/Recipes/Chat caused
pages to stack (2x–3x duplication).

Co-authored-by: AdamPlatin123 <AdamPlatin123@users.noreply.github.com>
Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>
2026-04-13 01:38:00 -07:00
Manan Shah
80c12ff1a6
Move gemma4 script (#4994)
* updating gemma4 script

* moving gemma4 script to scripts folder
2026-04-12 23:41:15 -07:00
Manan Shah
db3b3a4d9b
updating gemma4 script (#4992)
* updating gemma4 script

* show errors
2026-04-12 23:11:32 -07:00
Daniel Han
93a24f6698
Add ROCm test suite for PR #4720 (#4824)
95 Python tests and 23 shell tests covering ROCm detection,
torch index URL selection, hardware flags, prebuilt asset selection,
and install pathway logic. All tests use mocks -- no AMD hardware required.

Companion to #4720 (AMD ROCm/HIP support).
2026-04-11 04:44:13 -07:00
Daniel Han
53af4a1b3e
Fix Gemma-4 GRPO catastrophic KL divergence with TRL 1.0.0+ (#4934)
* Fix Gemma-4 GRPO catastrophic KL divergence with TRL 1.0.0+

Two compounding bugs caused Gemma-4 GRPO training to diverge with KL ~10^12
at step 1 against TRL 1.0.0+. Both fixes are runtime patches in the existing
TRL/model patch flow and are no-ops for models and TRL versions that are not
affected.

Fix 1 (rl.py): replace trl.models.utils.disable_gradient_checkpointing with
a no-op context manager. TRL 1.0.0+ wraps generation in
`with torch.no_grad(), disable_gradient_checkpointing(self.model, ...):`
purely to suppress a cosmetic PyTorch warning ("None of the inputs have
requires_grad=True"). Inside torch.no_grad() the gradient checkpointing
state has no functional effect on the forward pass. On context exit, TRL
calls model.gradient_checkpointing_enable() which dispatches to HF's
generic implementation and overwrites Unsloth's custom
`use_gradient_checkpointing="unsloth"` wrapper, corrupting Gemma-4 forward
numerics. Replacing the toggle with a no-op preserves Unsloth's custom GC
wrapper across generation passes. The patch walks sys.modules dynamically
to also rebind the symbol on every trl.* module that already imported it
(grpo_trainer, dpo_trainer, rloo_trainer, dppo_trainer, gfpo_trainer,
grpo_with_replay_buffer_trainer, and any future trainer module).
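In outline, the patch amounts to the following (a condensed sketch; the real
version also logs a warning_once and narrows its exception handling, as
described in the review-fix commits below):

  import sys
  from contextlib import contextmanager

  @contextmanager
  def _noop_disable_gradient_checkpointing(model, *args, **kwargs):
      # Intentionally do nothing: Unsloth's custom gradient-checkpointing
      # wrapper stays in place across TRL's generation pass.
      yield model

  def patch_trl_disable_gradient_checkpointing():
      import trl.models.utils as trl_utils
      if not hasattr(trl_utils, "disable_gradient_checkpointing"):
          return  # older TRL (e.g. 0.22.2): nothing to patch
      trl_utils.disable_gradient_checkpointing = _noop_disable_gradient_checkpointing
      # Rebind the symbol on every trl.* module that already imported it.
      for name, module in list(sys.modules.items()):
          if name.startswith("trl") and hasattr(module, "disable_gradient_checkpointing"):
              module.disable_gradient_checkpointing = _noop_disable_gradient_checkpointing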

Fix 2 (vision.py): inject `final_logit_softcapping` from `config.text_config`
into the top-level `model.config` for multimodal models. Unsloth's GRPO
trainer reads `getattr(model.config, "final_logit_softcapping", 0)` but
for Gemma-4 the attribute lives only on the nested `Gemma4TextConfig`,
so the lookup silently defaults to 0 instead of 30.

Backwards compatibility:
- trl 0.22.2: no `disable_gradient_checkpointing` symbol exists, the patch
  early-returns via `hasattr` guard.
- trl 0.27.1: same broken pattern as 1.0.0, the noop replacement is correct.
- trl 1.0.0+: end-to-end verified on `unsloth/gemma-4-E2B-it` GRPO with TRL
  1.0.0 and transformers 5.5.0. Step 1 loss=2.46e-08, kl=2.92e-05 (machine
  zero) vs broken baseline loss=1.37e+06, kl=1.76e+09.
- Llama / non-VLM text models: Fix 2 is a no-op (no `text_config`); Fix 1
  is functionally identical (Unsloth's GC wrapper is preserved).
- Qwen3-VL and other VLMs without final_logit_softcapping: Fix 2 is a no-op
  (text_config.final_logit_softcapping is None).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply loop 1 review fixes for PR #4934

- Move Fix 2 from vision.py to rl_replacements.py:858 and :1110 at the
  actual consumer sites. This avoids mutating model.config (which could
  leak into save_pretrained output) and covers text-only Gemma-4 paths
  that do not flow through FastBaseModel.from_pretrained.
- Revert the vision.py injection block entirely.
- Narrow the bare except blocks in patch_trl_disable_gradient_checkpointing
  from `except Exception:` to `(AttributeError, ImportError)` and
  `(AttributeError, TypeError)` to avoid masking unrelated bugs.
- Add logger.warning_once when the noop patch is installed, matching
  patch_trl_openenv and patch_trl_vllm_generation convention.
- Remove the dead per-module `_unsloth_noop_patched` sentinel check inside
  the sys.modules walk. The function-level early return already covers
  this case.
- Move `import sys` and `from contextlib import contextmanager` to the
  module-level imports instead of inside the function body.
- Rewrite the ordering comment in PatchFastRL to accurately describe
  why patch_trl_disable_gradient_checkpointing must run before
  patch_trl_rl_trainers.
- Fix keyword default spacing to match surrounding rl.py style.

End-to-end verified: Gemma-4-E2B GRPO on TRL 1.0.0 + transformers 5.5.0
step 1 loss=2.464e-08 kl=2.921e-05, all 5 steps succeed.

* Apply loop 2 review fix for PR #4934

Extract the final_logit_softcapping fallback logic into a shared helper
`_unsloth_get_final_logit_softcapping(config)` defined in rl_replacements.py
and injected into the compiled cache via RL_PRE_ITEMS["grpo_trainer"]. Both
call sites (`grpo_trainer__generate_and_score_completions` and
`grpo_trainer_compute_loss`) now use the helper instead of inlining the
same text_config fallback block twice.

Verified: compiled cache file lists the helper at module scope and both
consumer sites call it. Gemma-4-E2B GRPO step 1 loss=2.464e-08 kl=2.921e-05
(unchanged), all 5 steps pass.

* Apply loop 3 review fix for PR #4934

Extend _unsloth_get_final_logit_softcapping to also fall back to
config.get_text_config() for composite configs such as T5GemmaConfig
where the text sub-config is not exposed via the text_config attribute
but only via the get_text_config() method. Guard against (TypeError,
ValueError) raised by ambiguous composite configs, and skip the
self-referential case where get_text_config() returns self.

This addresses the 6/7 reviewer consensus from the third review loop.
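Putting the three loops together, the helper's fallback chain looks roughly
like this (condensed; the compiled-cache injection via RL_PRE_ITEMS is
omitted):

  def _unsloth_get_final_logit_softcapping(config):
      # 1) Top-level attribute (plain text models, Gemma 1/2).
      value = getattr(config, "final_logit_softcapping", None)
      if value is not None:
          return value
      # 2) Nested text_config (multimodal models such as Gemma-4).
      text_config = getattr(config, "text_config", None)
      if text_config is not None:
          value = getattr(text_config, "final_logit_softcapping", None)
          if value is not None:
              return value
      # 3) get_text_config() for composite configs (e.g. T5Gemma); skip the
      #    self-referential case and tolerate ambiguous composites.
      try:
          sub_config = config.get_text_config()
          if sub_config is not None and sub_config is not config:
              value = getattr(sub_config, "final_logit_softcapping", None)
              if value is not None:
                  return value
      except (AttributeError, TypeError, ValueError):
          pass
      return 0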

Verified:
- Helper returns 30.0 for Gemma-4, T5Gemma, and Gemma 1/2 configs.
- Helper returns 0 for Llama, Qwen, Mistral, Cohere, Granite, and
  ambiguous configs raising ValueError.
- Gemma-4-E2B GRPO step 1 loss=2.464e-08 kl=2.921e-05 (unchanged).
- Llama-3.2-1B GRPO all 5 steps loss=0 kl=0 (no regression).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-10 07:58:15 -07:00
Daniel Han
65b4028560
Pin bitsandbytes to continuous-release_main on ROCm (4-bit decode fix) (#4954)
* Pin bitsandbytes to continuous-release_main on ROCm for 4-bit decode fix

bitsandbytes 0.49.2 on PyPI ships with a broken 4-bit GEMV kernel on
every ROCm target:

  - CDNA (gfx90a / gfx942 / gfx950 = MI210 / MI300X / MI350) via a
    broken blocksize=32/64 warp64 GEMV kernel whose tests were
    explicitly skipped with ROCM_WARP_SIZE_64 guards because the
    code was known to be broken.
  - RDNA3 / RDNA3.5 (gfx1100-1103 / gfx1150-1152) via a compile-time
    BNB_WARP_SIZE macro in the host-side dispatch that resolves to
    64 when the multi-arch wheel is compiled with CDNA as the
    primary target, so num_blocks is wrong on RDNA and half the GEMV
    output is never written.

At decode shape (1, 1, hidden) both bugs produce NaN. Training is
unaffected because training shapes are (batch, seq_len > 1, hidden)
and never touch the GEMV path. The crash during autoregressive
inference surfaces as _assert_async_cuda_kernel in torch.multinomial
which on HIP becomes a hard HSA_STATUS_ERROR_EXCEPTION instead of
a clean Python error.

Both bugs are fixed by bitsandbytes commit 713a3b8 ("[ROCm] Enable
blocksize 32 4-bit quantization and GEMV kernels on AMD CDNA",
PR #1887, merged 2026-03-09) which replaces BNB_WARP_SIZE with a
runtime hipDeviceGetAttribute query and ships a working CDNA warp64
kernel. That commit has not shipped to PyPI yet, but
continuous-release_main wheels are published on every push to bnb
main via GitHub Releases.

Point the ROCm install path at the continuous-release_main x86_64 and
aarch64 wheels and fall back to PyPI >=0.49.1 when the pre-release is
unreachable (offline installs, firewalled hosts, or architectures not
covered by the pre-release wheels). Drop the pin once bnb cuts a
0.50+ tag on PyPI.

Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1): direct
bnb GEMV shape test now returns 0.0078 max abs error at seq_len=1
(no NaN) vs NaN on 0.49.2, and full Unsloth + for_inference + 4-bit
sampling generation works end-to-end.

NVIDIA / CPU / Mac / Windows paths are unaffected -- the helper is
gated on the ROCm torch index and on platform.machine().

* Drop Studio ROCm 16-bit fallback now that bnb 0.50+ fixes 4-bit decode

The 16-bit fallback in studio/backend/core/inference/inference.py was
added as a workaround for a bug that this PR already fixes at the
install layer: bitsandbytes <= 0.49.2 has a broken 4-bit GEMV kernel
on every ROCm target, which NaNs at decode shape (seq_len=1) and
crashes autoregressive inference. bnb PR #1887 (commit 713a3b8, in
0.50.0.dev0+, pinned by install.sh / install_python_stack.py in this
PR) restores correct 4-bit decode on MI300X and verified working
end-to-end with full Unsloth + for_inference + sampling.

Revert the dual code path so ROCm and NVIDIA both go through the
normal FastLanguageModel.from_pretrained + for_inference flow:

  - Remove the conditional `from unsloth import` that skipped the
    import on ROCm. The monkey-patches it was trying to avoid were
    never the cause of the crash; bnb 4-bit GEMV was.
  - Remove the `if _hw_module.IS_ROCM:` branch in load_model that
    loaded with plain transformers + PEFT + bfloat16, and the
    `_resolve_fp16_base` helper it relied on.
  - Remove the `get_chat_template is not None` fallback in
    _load_chat_template_info -- get_chat_template is now always
    imported.
  - Refactor the audio/vision ROCm guard to check _hw_module.IS_ROCM
    directly instead of the removed _IS_ROCM_ENV global. Audio and
    vision on ROCm still need separate validation (FastVisionModel
    and the CSM audio codecs were never tested on HIP) so the guard
    stays for now.

Add _bnb_rocm_4bit_ok() as a runtime safety net for users who
install from this PR before the install.sh bnb pin kicks in, or
whose installer fell back to the PyPI pin because the continuous-
release wheel was unreachable. When the installed bnb is < 0.50 on
ROCm, force load_in_4bit=False and strip any -unsloth-bnb-4bit /
-bnb-4bit suffix from the model path so a pre-quantized repo
resolves to its FP16 sibling instead of pulling bnb back in via
the repo's quantization_config. LoRA adapters whose base is a
pre-quantized repo on old bnb will still fail inside Unsloth's
loader -- the only real fix there is `unsloth studio update`.

Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1):

  - HAPPY path (bnb 0.50.0.dev0, load_in_4bit=True, pre-quantized
    repo): loads in 4-bit via the fixed GEMV, generation returns
    "Paris." for greedy and sampling.
  - SAFETY-NET path (simulated old bnb, suffix-stripped to the
    FP16 sibling, load_in_4bit=False): loads in bf16, generation
    returns "Paris." for greedy and sampling.

Net diff is ~45 lines smaller than the pre-revert state because
the entire plain-transformers 16-bit branch is gone.

* Cache _bnb_rocm_4bit_ok() with functools.cache

load_model() can be called many times in a single session but the bnb
version and hardware state cannot change at runtime, so memoise the
check. First call is ~1.9 ms (dominated by the lazy `import bitsandbytes`
inside the try block), subsequent calls drop to sub-microsecond dict
lookups. Zero behavioral change.
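The pattern, in miniature (the version threshold follows the commit text; the
body is abbreviated):

  import functools

  @functools.cache
  def _bnb_rocm_4bit_ok() -> bool:
      # bnb version and hardware cannot change within a process, so the first
      # (slow) check is reused by every later load_model() call.
      try:
          import bitsandbytes
          from packaging.version import Version
          return Version(bitsandbytes.__version__) >= Version("0.50.0.dev0")
      except Exception:
          return False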

* Shorten verbose bnb/ROCm comments

Comment-only cleanup across install.sh, studio/install_python_stack.py,
and studio/backend/core/inference/inference.py. No behavioral change.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove _bnb_rocm_4bit_ok safety net from inference.py

Studio's ROCm support is brand new (PR #4720, merged today) and every
fresh install pulls the bnb continuous-release_main wheel via
install.sh / install_python_stack.py in this same PR. There are no
existing ROCm Studio installs carrying bnb < 0.50, so the defensive
version-check fallback is guarding against a scenario that cannot
actually occur. Delete the helper, the functools import, and the
safety-net block -- inference.py now calls FastLanguageModel.from_pretrained
directly with no ROCm branching.

* Drop audio/vision ROCm guard in inference.py — verified unblocked by bnb fix

Vision inference was blocked by the same bnb 4-bit GEMV bug that affected
text inference (vision models use bnb 4-bit for the LM backbone). With
bnb 0.50+ pinned in install.sh / install_python_stack.py, vision works
end-to-end on MI300X: Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit
loaded in 4-bit via FastVisionModel + for_inference returns a correct
answer to a multimodal prompt.

Audio (CSM) was never actually blocked by HIP — on this hardware CSM
loads and runs its backbone forward pass fine with bnb 0.50, then fails
during generate() with a transformers-level kwarg validation mismatch
in generation_csm.py (`backbone_last_hidden_state` rejected). That's a
pre-existing transformers/CSM integration bug that reproduces identically
on NVIDIA, so the ROCm-gated guard was never actually protecting users
from anything HIP-specific.

Remove the combined audio/vision guard and the now-unused _hw_module
import. Also restore the one-word "Can be" in an inline comment that
drifted during the earlier comment-shortening pass, so the inference.py
delta vs pre-#4720 is exactly the max_seq_length<=0 crash fix and
nothing else.

* Shorten max_seq_length=0 guard comment to one line

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-10 06:25:39 -07:00
Daniel Han
cad8c6ad05
Add AMD ROCm/HIP support across installer and hardware detection (#4720)
* Add ROCm detection to install.sh and expand shell tests

Add AMD ROCm GPU detection to get_torch_index_url() in install.sh.
When nvidia-smi is not found, probe for ROCm via amd-smi, /opt/rocm
version file, hipconfig, dpkg-query, and rpm.

Includes validation guard for malformed _rocm_tag, Debian epoch prefix
stripping, ROCm 7.2+ cap to rocm7.1 index, bitsandbytes AMD install,
and status messaging. Shell tests expanded to 23 cases.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add ROCm torch reinstall support to install_python_stack.py

Add _detect_rocm_version() and _ensure_rocm_torch() to detect when a
Linux host has ROCm but the venv received CPU-only torch, and reinstall
with the correct ROCm wheels. Covers ROCm 6.0 through 7.1 with a
30-second timeout on the torch GPU probe subprocess.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add ROCm support to llama.cpp prebuilt installer

Add has_rocm field to HostInfo, extend detect_host() to probe for ROCm
via hipcc/amd-smi/rocm-smi/ROCM_PATH, and route ROCm hosts to upstream
prebuilts (Linux ROCm 7.2 prebuilt with source fallback, Windows HIP
prebuilt with CPU fallback). Add linux-rocm and windows-hip install
kinds to runtime_patterns_for_choice().

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add IS_ROCM hardware flag and fix AMD error message

Add IS_ROCM flag to hardware.py detect_hardware() (set when
torch.version.hip is present, DeviceType stays CUDA). Export IS_ROCM
from __init__.py. Add "rocm" key to get_package_versions().

Replace "We do not support AMD" error in tokenizer_utils.py with a
helpful message pointing to ROCm installation docs.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add comprehensive ROCm support test suite (68 tests)

Add tests/studio/install/test_rocm_support.py covering all ROCm code
paths across install_llama_prebuilt.py, install_python_stack.py,
hardware.py, tokenizer_utils.py, and install.sh. All tests use mocks
and run without AMD hardware.

Covers: asset selection (11), runtime patterns (5), HostInfo (4),
ROCm version detection (9), torch reinstall (9), index mapping (8),
hardware flag (8), tokenizer message (2), install.sh structure (10),
and live regression (1).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden ROCm support: probe error handling, version cap, validation

Address review findings from 8 independent reviewers:

- Wrap _ensure_rocm_torch() torch probe in try/except for
  TimeoutExpired and OSError so a hung or broken torch import does not
  crash the installer (8/8 reviewers flagged this)
- Add torch>=2.4,<2.11.0 version cap to the ROCm reinstall path to
  prevent installing unsupported torch 2.11.0 from the rocm7.1 index
- Use with-statement for file reads in _detect_rocm_version() to avoid
  resource leaks
- Handle ROCM_PATH="" correctly (use `or "/opt/rocm"` instead of
  default parameter to avoid relative path resolution)
- Strengthen shell validation guard from rocm[0-9] to rocm[1-9] to
  reject rocm0.x tags that would produce nonexistent PyTorch index URLs
- Switch shell version cap from blocklist to allowlist (rocm6.*|rocm7.0*
  |rocm7.1* pass through, everything else caps to rocm7.1) so future
  ROCm 10+ does not fall through to a nonexistent index
- Add sorted() to _ROCM_TORCH_INDEX lookup for defensive ordering
- Fix test_probe_timeout_handled: replace zero-assertion test with
  proper assertions verifying reinstall proceeds after timeout

* Clean up rocm_paths list construction in detect_host()

Filter None from the ROCM_PATH env var lookup at list construction time
instead of relying on the inline `if p` guard in the any() call.

* Require actual AMD GPU presence before selecting ROCm paths

All 8 reviewers across 2 cycles independently flagged that ROCm
detection used toolkit/filesystem hints (hipcc, /opt/rocm, rocm-core)
as a proxy for GPU presence, which would misroute CPU-only or NVIDIA
hosts that happen to have ROCm tools installed.

Now all 3 detection points (install.sh, install_python_stack.py,
install_llama_prebuilt.py) probe for an actual AMD GPU before
entering the ROCm path:

- install.sh: check rocminfo for gfx* GPU names, or amd-smi list
  for device rows, before version detection
- install_python_stack.py: new _has_rocm_gpu() function probes
  rocminfo and amd-smi list before _ensure_rocm_torch() proceeds
- install_llama_prebuilt.py: detect_host() probes rocminfo/amd-smi
  list instead of just checking tool existence or directory paths

Also:
- Shell test mock amd-smi now handles "list" subcommand
- Python tests updated to mock _has_rocm_gpu where needed
- Added test_no_gpu_with_rocm_tools_skips to verify the new guard
- Test index lookups now use sorted() to match production code
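A simplified sketch of the Python-side probe described above (command names
and output markers follow the commit text; the real parsing is stricter and
was tightened further in later commits):

  import re
  import subprocess

  def _has_rocm_gpu() -> bool:
      # rocminfo lists HSA agents; real GPUs report a specific gfx id while
      # gfx000 is the CPU agent, so require gfx[1-9].
      try:
          out = subprocess.run(["rocminfo"], capture_output=True, text=True,
                               timeout=10)
          if re.search(r"gfx[1-9]", out.stdout):
              return True
      except (OSError, subprocess.SubprocessError):
          pass
      # amd-smi list prints one "GPU: <n>" (or "GPU[<n>]") data row per device.
      try:
          out = subprocess.run(["amd-smi", "list"], capture_output=True,
                               text=True, timeout=10)
          if re.search(r"^GPU\s*[:\[]\s*\d", out.stdout, re.MULTILINE):
              return True
      except (OSError, subprocess.SubprocessError):
          pass
      return False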

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden hipconfig version parsing and torch probe compatibility

- Add parts[1].isdigit() check in hipconfig version parsing to handle
  versions like "6.3-HIP" where the minor component has non-numeric
  suffix (strip "-" prefix before int() conversion)
- Use getattr() in torch probe subprocess to safely handle old or
  custom torch builds that may lack torch.version.hip/cuda attributes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Strengthen AMD GPU detection and add NVIDIA precedence guard

- Change amd-smi list detection from any-non-empty-output to requiring
  "gpu" marker in output, matching the shell-side NR>1 check. Prevents
  false positives from header-only amd-smi list output.
- Add nvidia-smi check at the top of _ensure_rocm_torch() so mixed
  AMD+NVIDIA hosts preserve NVIDIA precedence (matching install.sh and
  install_llama_prebuilt.py behavior).
- Apply the same amd-smi marker fix to install_llama_prebuilt.py
  detect_host() for consistency.

* Add Windows-specific ROCm/HIP detection in detect_host()

The previous detect_host() ROCm check used rocminfo and amd-smi list
which are Linux-only tools. On Windows, has_rocm would always be False,
making the Windows HIP prebuilt path at line 1794 unreachable.

Now detect_host() uses platform-specific detection:
- Linux: rocminfo (check for gfx GPU names) or amd-smi list
- Windows: hipinfo.exe, amd-smi, or amdhip64.dll on PATH

This allows Windows AMD users to get the HIP prebuilt binary instead
of silently falling through to the CPU prebuilt.

* Fill AMD ROCm gaps: Mamba/SSM source builds, GPU monitoring, Windows messaging, RDNA expansion

- worker.py: Add HIP detection to causal-conv1d/mamba-ssm probe, check
  for hipcc before ROCm source builds, improve status messages and error
  reporting, add timeout and uv support for the source build fallback
- amd.py: New AMD GPU monitoring module via amd-smi metric --json,
  mirroring nvidia.py structure (utilization, temperature, power, VRAM)
- hardware.py: Branch to amd.py when IS_ROCM is True for GPU utilization,
  visible GPU queries, and physical GPU count
- install_python_stack.py: Detect AMD GPUs on Windows and warn that
  ROCm-enabled PyTorch must be installed manually
- kernels/utils.py: Expand is_rdna() to cover RDNA2 (gfx1030-1032),
  RDNA3 (gfx1102-1103), RDNA3.5 (gfx1150-1152) alongside existing entries
- tests: Add 32 new tests covering all changes (95/95 pass)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden ROCm detection, fix VRAM heuristic, and expand RDNA2 coverage

- Windows ROCm detection: validate actual GPU presence via hipinfo/amd-smi
  output markers instead of just checking tool existence on PATH
- _ensure_rocm_torch: validate nvidia-smi actually reports a GPU before
  giving NVIDIA precedence (fixes AMD-only hosts with stale NVIDIA tools)
- amd.py _parse_numeric: handle dict-shaped metric objects from newer
  amd-smi versions ({"value": 10, "unit": "W"}) and strip MiB/GiB units
- amd.py VRAM heuristic: raise threshold from 100k to 10M to correctly
  handle MI300X (192 GB = 196608 MB) and other high-VRAM GPUs
- amd.py visible GPU: use AMD-reported GPU IDs instead of enumerate index
  so non-dense sets like CUDA_VISIBLE_DEVICES=1,3 report correctly
- install.sh: add ROCm <6.0 minimum version guard (no PyTorch wheels
  exist for older versions); fix rocm7.1* glob to not match rocm7.10+
- is_rdna: add gfx1033-1036 for RDNA2 mobile GPUs (RX 6600M etc.)
- worker.py: increase ROCm source build timeout from 600s to 1800s;
  fix success log message for ROCm source builds
- Tests: update mocks for _has_usable_nvidia_gpu, add RDNA2 target asserts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add HIP_VISIBLE_DEVICES support, unit-aware VRAM parsing, Windows GPU validation

- hardware.py: check HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES on ROCm
  before falling back to CUDA_VISIBLE_DEVICES, so multi-GPU AMD setups with
  HIP-specific env vars report the correct visible device set
- amd.py: add _parse_memory_mb() that reads "unit" from dict-shaped amd-smi
  JSON (e.g. {"value": 192, "unit": "GiB"}) and converts to MB correctly;
  fixes MI300X VRAM misreported as 0.19 GB instead of 192 GB
- install_python_stack.py: Windows AMD warning now validates actual GPU
  presence via hipinfo/amd-smi output markers before printing
- install_llama_prebuilt.py: restore amdhip64.dll fallback for Windows HIP
  detection after tool-based checks, so Windows HIP installs without CLI
  tools on PATH are still detected
- hardware.py: fix IS_ROCM comment to accurately describe its role

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix HIP_VISIBLE_DEVICES empty-string handling in GPU visibility spec

Use explicit None checks instead of Python `or` operator when reading
HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES, so that an empty string
("") is correctly honored as "no visible GPUs" rather than silently
falling through to CUDA_VISIBLE_DEVICES on mixed ROCm+CUDA systems.
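The distinction in miniature (illustrative helper name):

  import os

  def _visible_gpu_spec():
      # `or` would treat HIP_VISIBLE_DEVICES="" like an unset variable and
      # fall through; an explicit None check keeps "" meaning "no GPUs".
      for var in ("HIP_VISIBLE_DEVICES", "ROCR_VISIBLE_DEVICES",
                  "CUDA_VISIBLE_DEVICES"):
          value = os.environ.get(var)
          if value is not None:
              return value
      return None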

* Fix IS_ROCM test assertion for multi-line formatting

* Cap torchvision/torchaudio versions, remove amdhip64.dll fallback, fix visible GPU count

- Cap torchvision<0.26.0 and torchaudio<2.11.0 alongside torch<2.11.0 in
  both install.sh and install_python_stack.py to prevent resolver from
  selecting incompatible companion packages from ROCm wheel index
- Remove amdhip64.dll fallback in Windows ROCm detection (DLL presence
  without hipinfo/amd-smi is not proof of GPU existence)
- Fix get_visible_gpu_count() to use _get_parent_visible_gpu_spec() which
  respects HIP_VISIBLE_DEVICES/ROCR_VISIBLE_DEVICES on ROCm hosts

* Attribute is_rdna() RDNA2/3/3.5/4 expansion to PR #4428

The is_rdna() expansion to cover RDNA2 (gfx1030-1036), RDNA3
(gfx1100-1103), RDNA3.5 (gfx1150-1152), and RDNA4 (gfx1200-1201)
architectures is based on the original work from PR #4428.

Co-authored-by: GoldenGrapeGentleman <yueyuan@amd.com>
Co-authored-by: billishyahao <bill.he@amd.com>

* Support AMD Radeon for studio (#4770)

Co-authored-by: Iswarya Alex <iswarya.alex@amd.com>

* Remove ROCm test files from main PR

Move test_rocm_support.py and shell test additions to a separate PR
to keep the main ROCm support PR focused on implementation changes.

* Fix installer and hardware detection issues for PR #4720

- Fix empty _tri_arg passed to uv pip install in Radeon path (causes
  "Empty field is not allowed for PEP508" error)
- Fix Radeon fallback: use ROCm index instead of CPU-only when
  repo.radeon.com is unreachable (TORCH_INDEX_URL already has ROCm)
- Use $TORCH_CONSTRAINT in fallback paths instead of hardcoded strings
- Fix _pick_radeon_wheel: relax suffix to match manylinux_2_28_x86_64
  wheels (AMD Radeon repo does not use bare linux_x86_64 platform tag)
- Fix IS_ROCM export: use __getattr__ so callers always see the live
  value after detect_hardware() runs
- Fix apply_gpu_ids: set HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES
  on ROCm so _get_parent_visible_gpu_spec picks up narrowed GPU set
- Fix _parse_memory_mb: distinguish GB (1000 MB) from GiB (1024 MiB)
- Add amd-smi version as a fallback in _detect_rocm_version
- Fix trailing whitespace and missing newline at EOF in install.sh

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix GPU detection false positives and add missing health groups

- Fix _has_rocm_gpu() false positive: require "GPU: <number>" data rows
  from amd-smi list, not just header containing "gpu"
- Apply same fix in detect_host() in install_llama_prebuilt.py
- Add runtime_payload_health_groups for linux-rocm and windows-hip so
  partial/corrupt ROCm/HIP prebuilt installs are properly detected
- Add bitsandbytes install to Radeon fallback paths (was only in the
  success path, skipped when repo.radeon.com was unreachable)
- Keep DEVICE/CHAT_ONLY as direct imports in __init__.py (matching main)
  and only use __getattr__ for IS_ROCM

* Fix _ensure_rocm_torch and Windows AMD warning false positives

- _ensure_rocm_torch: only skip when HIP is already present, not for
  CUDA builds (which are unusable on AMD-only hosts). Fixes the case
  where a venv has a stale CUDA wheel and the repair step is skipped.
- Windows AMD warning: use GPU data row check (same as Linux fix) to
  avoid false positives from amd-smi list header-only output.

* Fix amd-smi GPU detection for GPU[N] output format

Older amd-smi versions output "GPU[0] : Card series: ..." instead of
"GPU: 0". The regex now matches both "GPU: <digit>" and "GPU[<digit>"
formats to detect actual GPU data rows.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden AMD GPU detection against false positives

- install.sh: replace weak amd-smi list check (awk 'NR>1 && NF') with
  strict pattern matching GPU data rows (/^GPU[[:space:]]*[:\[]/)
- All files: reject rocminfo gfx000 (CPU HSA agent) by requiring
  gfx[1-9] instead of gfx[0-9] in the rocminfo GPU probe
- Fixes false positives on hosts with ROCm tools but no AMD GPU

* Remove duplicate comment from pre-commit merge

* Refactor: deduplicate AMD detection, consolidate bitsandbytes, clean up imports

- Extract _has_amd_rocm_gpu() shell function to avoid duplicating the
  rocminfo/amd-smi GPU detection logic in get_torch_index_url and
  the Radeon auto-detect block
- Consolidate bitsandbytes install into a single case block after torch
  install (was duplicated 4 times across Radeon success/fallback paths)
- Move math and re imports to top of amd.py (were inline in functions)
- Add _smi_query() helper in hardware.py to centralize IS_ROCM backend
  selection for get_gpu_utilization and get_visible_gpu_utilization

Addresses Gemini code review suggestions.

* Fix VRAM parsing for string values and GB/GiB consistency

- Extract unit from string-valued VRAM fields (e.g. "192 GiB") so
  _parse_memory_mb correctly applies the unit multiplier instead of
  treating the value as bare MB
- Treat GB and GiB identically (both as binary x1024) since GPU tools
  including amd-smi use binary units even when labeling them "GB"
- Fixes incorrect VRAM reporting on MI300-class cards (was showing
  ~0.19 GB instead of 192 GB for string-valued outputs)
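A compact sketch of the resulting parsing behaviour (helper name from the
commits above; the real code also feeds string-valued GPU ids and utilization
fields through a similar numeric parser):

  import re

  def _parse_memory_mb(value):
      # amd-smi may report VRAM as a bare number (MB), a string such as
      # "192 GiB", or a dict such as {"value": 192, "unit": "GiB"}.
      unit = "MB"
      if isinstance(value, dict):
          unit = str(value.get("unit") or "MB")
          value = value.get("value")
      if isinstance(value, str):
          match = re.match(r"\s*([0-9.]+)\s*([A-Za-z]*)", value)
          if not match:
              return None
          value, unit = match.group(1), (match.group(2) or unit)
      try:
          number = float(value)
      except (TypeError, ValueError):
          return None
      if unit.strip().upper() in ("GB", "GIB"):
          return number * 1024.0   # GPU tools use binary sizes under either label
      return number                # bare numbers and MB/MiB pass through as MB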

* Add --no-cache to uv for ROCm HIP source builds

Avoid stale cache artifacts from partial HIP source builds when
uv is used for causal-conv1d/mamba-ssm compilation on ROCm.
The pip path already uses --no-cache-dir; this adds the uv equivalent
(--no-cache) only when is_hip is True.

* Fix critical: initialize _amd_gpu_radeon before case block

_amd_gpu_radeon was only set inside the */rocm*) case arm, so on
NVIDIA/CPU/macOS paths where TORCH_INDEX_URL does not contain "rocm",
the variable was unbound. With set -u (nounset) enabled, this crashes
the installer for every non-AMD user.

Move initialization to before the case block so it is always defined.

* Fix Windows AMD: route has_rocm hosts to HIP prebuilt path

resolve_release_asset_choice was selecting windows-cpu for all Windows
x86_64 hosts including those with has_rocm=True. Windows AMD users
should fall through to resolve_upstream_asset_choice which tries the
HIP prebuilt first. Add "not host.has_rocm" guard to the published
windows-cpu selection.

* Harden ROCm detection, Radeon wheel fallback, and HIP visibility

Addresses review findings from parallel reviewers on PR #4720:

- install.sh: add _has_usable_nvidia_gpu() helper requiring nvidia-smi -L
  to actually list a GPU before treating the host as NVIDIA. Fixes the
  stale-nvidia-smi-on-PATH regression where AMD-only hosts fell into the
  CUDA branch.
- install.sh: fix hipconfig awk blocks to propagate a non-zero exit code
  when the output is not a recognisable version string, so the ||-chain
  continues to dpkg-query / rpm instead of terminating early.
- install.sh: fail-closed on Radeon wheel fallback. When torch,
  torchvision or torchaudio is missing from the Radeon repo for the
  active Python tag, fall back to the standard ROCm index instead of
  silently mixing Radeon wheels with PyPI defaults. Quote all wheel
  arguments individually so wheel filenames cannot be word-split or
  glob-expanded.
- install_llama_prebuilt.py: detect_host() now requires nvidia-smi -L to
  list a GPU before setting has_physical_nvidia. Routes AMD ROCm hosts
  with a broken leftover nvidia-smi to the ROCm path instead of
  misclassifying them as NVIDIA.
- install_llama_prebuilt.py: scan upstream assets for any rocm-<version>
  prebuilt instead of hard-coding rocm-7.2, so ROCm 6.x / 7.0 / 7.1 / 7.3+
  users pick up a matching upstream prebuilt when one exists.
- install_llama_prebuilt.py: validate_server() adds --n-gpu-layers 1 for
  linux-rocm and windows-hip hosts, so new HIP prebuilts are preflighted
  on the GPU path instead of passing validation on CPU only.
- install_llama_prebuilt.py: restore the published windows-cpu fallback
  for AMD Windows hosts without a HIP prebuilt so hash-approved bundles
  are still preferred over the raw upstream CPU asset.
- install_python_stack.py: drop the /opt/rocm / hipcc gate in
  _ensure_rocm_torch() and rely on _has_rocm_gpu(). Runtime-only ROCm
  installs (package-managed minimal installs, Radeon software) that ship
  amd-smi / rocminfo without hipcc can now repair a CPU-only venv via
  "unsloth studio update". Adds an explicit IS_WINDOWS / IS_MACOS guard.
- studio/backend/utils/hardware/amd.py: honour HIP_VISIBLE_DEVICES /
  ROCR_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES in
  get_primary_gpu_utilization(). A process restricted to GPU 2 now
  reports metrics for GPU 2 instead of physical GPU 0. Tighten the plain
  bytes unit detection to an explicit allowlist.
- studio/backend/utils/hardware/hardware.py: route
  get_backend_visible_gpu_info()'s backend_cuda_visible_devices field
  through a helper that reads HIP_VISIBLE_DEVICES on ROCm. Drop the
  unconditional "(rocm=False)" suffix in apply_gpu_ids() logs.

* Fix round 2 regressions: ROCm validate_server and Windows HIP routing

Follow-up to 810b833b addressing review findings on the first round of
hardening commits:

- install_llama_prebuilt.py validate_server: gate --n-gpu-layers on the
  resolved install_kind instead of host.has_rocm. AMD Windows hosts
  without a HIP prebuilt fall back to windows-cpu and must not be
  validated with GPU layers; thread install_kind through from the
  caller.
- install_llama_prebuilt.py resolve_release_asset_choice: reinstate the
  "not has_rocm" guard on the published windows-cpu bundle so AMD
  Windows hosts reach resolve_upstream_asset_choice() where the new
  HIP prebuilt path lives. Prefer a published windows-hip bundle first
  when one exists, fall through to upstream HIP + upstream CPU
  otherwise.
- install_llama_prebuilt.py detect_host: also set has_physical_nvidia
  when the secondary --query-gpu block confirms a working NVIDIA GPU,
  so older nvidia-smi versions without -L support do not silently skip
  the Linux diagnostics that key off has_physical_nvidia.
- install_llama_prebuilt.py: drop redundant "import re as _re" /
  "import re as _re_rocm" local aliases in favour of the existing
  top-level "import re".
- install_python_stack.py _ensure_rocm_torch: run the AMD
  bitsandbytes install unconditionally after the HIP-torch probe so
  "unsloth studio update" on venvs that already have ROCm torch still
  gains the AMD bitsandbytes build.
- install.sh: add a non-x86_64 early-exit to get_torch_index_url() so
  aarch64 / arm64 Linux hosts do not hit the ROCm wheel index
  (PyTorch only publishes ROCm wheels for linux_x86_64).
- install.sh: add bitsandbytes install to the migrated-environment
  branch so upgrades pick it up for ROCm hosts instead of only the
  fresh-install path.
- install.sh: in the Radeon wheel path, pass version constraints +
  --no-index --find-links to uv instead of explicit wheel URLs so a
  version-compatible torch / torchvision / torchaudio triple is
  resolved, rather than picking the highest-version wheel for each
  package independently.
- studio/backend/utils/hardware/amd.py _first_visible_amd_gpu_id: fall
  through to lower-priority visibility env vars when the first entry
  is malformed (leading comma, all-whitespace first token) instead of
  silently returning GPU 0.

* Fix round 3 findings: x86_64 guard, ROCm version clip, Radeon deps

Address issues surfaced by the round 3 reviewers on top of 8636fa63:

- install_python_stack.py _ensure_rocm_torch: add the same `x86_64`
  guard that install.sh already has. Linux aarch64 / arm64 ROCm hosts
  must skip the repair path entirely; PyTorch only publishes ROCm
  wheels for linux_x86_64, and without this guard
  `unsloth studio update` aborts with a missing-wheel error on non
  x86_64 hosts.
- install_llama_prebuilt.py resolve_upstream_asset_choice: add a
  best-effort _detect_host_rocm_version() helper (reading
  /opt/rocm/.info/version, amd-smi version, hipconfig --version) and
  filter rocm_candidates to entries whose major.minor is <= host
  version. Falls back to the newest candidate only when no compatible
  one exists, so a ROCm 6.4 host downloads rocm-6.4 instead of being
  handed the numerically newest rocm-7.2 bundle (which fails preflight
  and forces a source build).
- install.sh: remove the round 2 --no-index switch from the Radeon
  wheel branch. --no-index forced uv to ignore PyPI entirely, which
  broke transitive dependency resolution (filelock, sympy, networkx,
  jinja2, fsspec, setuptools, typing-extensions, ...) on a fresh venv.
  Restore the round 1 explicit wheel URL invocation but add a
  torch / torchvision / torchaudio version-pair sanity check so a
  mismatched trio (e.g. torch 2.9.1 + torchvision 0.23.0 + torchaudio
  2.9.0) falls back to the standard ROCm index instead of installing a
  broken combination.
- install_python_stack.py _ensure_rocm_torch: restructure the
  "tag is None" path so it no longer short-circuits the bitsandbytes
  install. On a ROCm runtime older than anything in
  _ROCM_TORCH_INDEX, print the "no wheel" warning but still run the
  AMD bitsandbytes install.
- studio/backend/core/training/worker.py: restore the pre-PR
  "no timeout" behaviour for non-HIP causal-conv1d / mamba-ssm source
  builds. The round 2 "timeout = 1800 if is_hip else 300" cap aborts
  slow non-HIP builds (Linux aarch64, unsupported torch/CUDA combos)
  after 5 minutes; omit timeout for the non-HIP branch so the cap
  only applies to ROCm source builds.

* Fix round 4 findings: apply_gpu_ids env inheritance, Radeon X.Y, bitsandbytes gate

Address remaining issues surfaced by the round 4 reviewers:

- studio/backend/utils/hardware/hardware.py apply_gpu_ids: mirror the
  selection into HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES whenever
  the caller already had a ROCm visibility env var set, not only when
  IS_ROCM has already been set by detect_hardware(). Training and
  inference workers call apply_gpu_ids() before detect_hardware()
  runs, so the old guard would leave a forked ROCm worker with a
  stale HIP_VISIBLE_DEVICES mask that no longer matched the
  narrowed CUDA_VISIBLE_DEVICES selection.
- install.sh get_radeon_wheel_url: accept X.Y ROCm versions in
  addition to X.Y.Z. The `/opt/rocm/.info/version` file and some
  hipconfig versions report only two components, and the Radeon
  repository publishes both rocm-rel-X.Y.Z/ and rocm-rel-X.Y/
  directories, so treating X.Y as invalid caused Radeon hosts to fall
  back to the generic ROCm index even when a matching AMD wheel set
  existed.
- install_python_stack.py _ensure_rocm_torch: only install the AMD
  bitsandbytes build when the venv actually has a ROCm-compatible
  torch (either already present or just installed by this function).
  Previously the bitsandbytes install ran unconditionally, which
  could leave an AMD bitsandbytes layered on top of a CPU/CUDA torch
  on hosts where the ROCm runtime is older than any entry in
  _ROCM_TORCH_INDEX. Also add --force-reinstall so an existing
  CPU/CUDA bitsandbytes is replaced by the AMD build during upgrades.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix gemini findings: amd-smi metric envelope validation and dict-wrapped GPU id

Two medium-severity defensive fixes from the gemini-code-assist review on
the AMD monitoring backend:

1. _extract_gpu_metrics may return a dict where every value is None when
   amd-smi succeeds (zero exit) but the JSON envelope contains no usable
   fields (error response, unsupported card). The new _has_real_metrics
   helper lets get_primary_gpu_utilization surface available:False and
   lets get_visible_gpu_utilization skip ghost device rows so the UI
   does not render placeholder cards with empty numbers.

2. Newer amd-smi versions wrap scalar fields as {"value": 0, "unit":
   "none"}, including the per-GPU id. The previous int(raw_id) call
   silently fell back to the enumeration index in that case, losing the
   real GPU id. Routing raw_id through the existing _parse_numeric
   helper handles bare ints, floats, strings, and the dict shape
   uniformly, with a debug log on parse failure.

* Fix gemini round 2 findings: explicit length guard on ROCm version file parser

Both _detect_rocm_version (install_python_stack.py) and
_detect_host_rocm_version (install_llama_prebuilt.py) read /opt/rocm/.info/version
or $ROCM_PATH/lib/rocm_version, split on "." and unconditionally accessed
parts[1]. The surrounding broad `except Exception: pass` already swallowed
the resulting IndexError, so a one-component file like "6\n" did fall
through to the next detection source -- but the control flow relied on
exception handling instead of an explicit check.

Add `if len(parts) >= 2:` guards in both helpers so the loop falls through
on its own without raising. Behaviour is unchanged for the common multi-
component case; the previously-silent IndexError path becomes an explicit
no-op.

* Fix gemini round 3: include has_rocm in validate_server fallback path

When validate_server is called without an explicit install_kind (older
call sites that have not been updated), the fallback was only enabling
--n-gpu-layers for NVIDIA and macOS arm64 hosts. AMD ROCm Linux hosts
fell through to the CPU validation path even though the prebuilt being
exercised was a HIP binary.

Add host.has_rocm to the fallback expression so the GPU offload flag is
applied consistently with the install_kind=='linux-rocm' / 'windows-hip'
branches above.

* Fix gemini round 4: remove risky bytes-vs-MB heuristic in _parse_memory_mb

The previous heuristic divided any bare number above 10_000_000 by
1024*1024 on the assumption that large unit-less values were bytes.
This misclassified small VRAM allocations: 5 MB of used VRAM reported
as 5_242_880 bytes without a unit would be taken at face value and
render as 5_242_880 MB (~5 TB) in the monitoring UI.

Modern amd-smi always provides explicit units (MiB/GiB dict form),
and legacy amd-smi returns bare numbers in MB -- the heuristic never
had a real workload to handle. Drop it and default to MB for bare
numeric input, keeping the existing unit-aware branches for dict /
string inputs unchanged.

The unrelated gemini suggestion to "default minor to 0" in the
amd-smi version awk parser was intentionally NOT applied: rocm7.0
and rocm7.1 ship different wheel sets, so silently substituting 0
for a missing minor could install the wrong wheels. The existing
reject-and-fall-through behaviour is safer.

* Fix gemini round 5: POSIX compliance and leading-comma visibility parsing

Three medium findings from gemini-code-assist addressed in this commit:

1. _pick_radeon_wheel used grep -o and sort -V, both GNU extensions
   that are not in POSIX and break on BSD/BusyBox coreutils. install.sh
   has a #!/bin/sh shebang so the whole pipeline was rewritten as a
   single awk script that extracts all href="..." hits on each line,
   filters to wheels matching the package prefix and python tag, and
   picks the newest version via zero-padded lexical comparison. No
   external sort or grep is needed.

2. _first_visible_amd_gpu_id in the AMD monitoring backend treated a
   leading comma (e.g. HIP_VISIBLE_DEVICES=",1") as "fall through to
   the next env var", which is surprising given the clear intent to
   narrow to device 1. Filter empty tokens after the split and return
   the first real one. An all-commas value ("," / ",,,") still falls
   through because no real tokens exist; the empty-string and "-1"
   explicit-zero cases are unchanged.

The unrelated amd-smi version awk parser suggestion was not applied
(see round 4 commit message for rationale: defaulting a missing minor
to 0 could silently install the wrong ROCm wheel set).

* Fix 20-reviewer.py findings: base drift, Radeon %2B, dpkg/rpm fallback, bnb, backend label

Consolidated fix batch from a 20-parallel reviewer.py run on the current
head. Each fix is drawn from a high-consensus finding and addresses a
real bug or feature gap, not a stylistic preference.

1. install.sh: bump `unsloth>=2026.4.2` -> `unsloth>=2026.4.4` at five
   call sites so this branch no longer regresses main's version floor
   (main bumped to 2026.4.4 in #4876). Without this, merging 4720 would
   silently downgrade the minimum version pin for fresh installs.

2. install.sh: URL-decode Radeon wheel names before extracting the
   torch / torchvision / torchaudio version strings. Real wheel URLs
   from repo.radeon.com are percent-encoded ("torch-2.10.0%2Brocm7.2.0...")
   so the previous `[+-]` terminator in the sed regex never matched,
   `_torch_ver` stayed empty, `_radeon_versions_match` stayed false,
   and every Radeon consumer install silently fell back to the generic
   ROCm index. Now decode %2B -> + first, then extract, then validate.

3. install.sh: the two AMD bitsandbytes install lines were running
   `uv pip install "bitsandbytes>=0.49.1"` without `--force-reinstall`,
   so upgrades where the venv already has a CPU/CUDA bitsandbytes
   satisfying the constraint would keep the stale non-AMD wheel. Add
   `--force-reinstall --no-cache-dir` to both call sites, matching the
   pattern already used in install_python_stack.py::_ensure_rocm_torch.

4. install_python_stack.py and install_llama_prebuilt.py: add
   `dpkg-query -W rocm-core` and `rpm -q rocm-core` fallbacks to the
   Python-side ROCm version detectors so they match the chain in
   install.sh::get_torch_index_url. Package-managed ROCm installs
   (Debian/Ubuntu/RHEL/Fedora distro packages) can expose GPUs via
   rocminfo/amd-smi but still lack /opt/rocm/.info/version, hipconfig,
   or amd-smi `version` output -- without these fallbacks, `unsloth
   studio update` on such hosts returned None and skipped the ROCm
   torch repair. Also strip the dpkg epoch prefix ("1:6.3.0-1") before
   parsing so epoch-annotated packages parse correctly.

5. hardware.py: add a `_backend_label(device)` helper that returns
   "rocm" when IS_ROCM is set and the device is DeviceType.CUDA, and
   use it for every `"backend": ...` emission in JSON responses served
   to the Studio frontend. Internally we still represent ROCm hosts as
   DeviceType.CUDA (ROCm torch reuses the whole torch.cuda.* API
   surface), but the user-facing API now correctly reports "rocm" on
   AMD boxes instead of labeling them as "cuda".
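Sketched roughly (DeviceType and IS_ROCM below are stand-ins for the Studio
backend's own symbols):

  from enum import Enum

  class DeviceType(Enum):       # stand-in for the backend's device enum
      CUDA = "cuda"
      CPU = "cpu"

  IS_ROCM = False               # set by detect_hardware() in the real code

  def _backend_label(device: DeviceType) -> str:
      # ROCm torch reuses the CUDA device type internally, but the frontend
      # should see "rocm" on AMD hosts.
      if IS_ROCM and device is DeviceType.CUDA:
          return "rocm"
      return device.value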

All 250 simulation scenarios pass (was 233 before this batch: added 17
new regression tests covering the version pin, %2B decoding, bnb
force-reinstall flags, dpkg/rpm fallback presence, and the
_backend_label helper's four-way truth table).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix gemini round 6 + URL audit: amd.py defensive checks, rocm6.5+ clip to 6.4

Two rounds of fixes in one commit, plus a full URL audit of every PyPI /
download.pytorch.org / repo.radeon.com reference the PR introduces.

amd.py (4 medium gemini findings on commit b3627bc2):

1. _extract_gpu_metrics used `and vram_total_mb` as part of the vram_util
   gate. The follow-up `vram_total_mb > 0` already handles the division
   guard, but the truthiness check was redundant and slightly surprising
   for a 0.0 valid value. Replace with explicit `is not None and > 0`
   for both vram_util and power_util.

2. get_physical_gpu_count called `data.get("gpu", ...)` without guarding
   for non-dict envelopes. A scalar / string JSON response from amd-smi
   would raise AttributeError. Add an isinstance(data, dict) check and
   return None for unexpected shapes.

3. get_visible_gpu_utilization had the same .get() exposure on the outer
   envelope. Rewrite the gpu_list extraction as an explicit
   list/dict/else cascade so a malformed scalar envelope produces
   gpu_list=[data] and continues without raising.

4. The same function's per-entry loop also called gpu_data.get() on
   whatever was inside gpu_list. If a scalar ever leaks into the list
   (directly or via the previous fix's fallback), _extract_gpu_metrics
   would raise on the first .get() inside the helper. Skip non-dict
   entries in the loop before extracting metrics.

install.sh (URL audit finding, previously flagged by 20-reviewer as #13):

5. get_torch_index_url used `rocm6.*` in the rocm tag case statement,
   which matched rocm6.5 and rocm6.6 and emitted
   download.pytorch.org/whl/rocm6.5 -- which returns HTTP 403 because
   PyTorch only publishes rocm 5.7, 6.0-6.4, 7.0-7.2. Enumerate the
   supported 6.x minors explicitly and add a rocm6.* fallback branch
   that clips to rocm6.4 (the last supported 6.x wheel set).

URL audit results (all URLs PR 4720 references):
- 14/14 download.pytorch.org/whl/{cpu,cu118,cu124,cu126,cu128,cu130,
  rocm6.0..6.4,rocm7.0..7.2} return HTTP 200.
- 9/9 repo.radeon.com/rocm/manylinux/rocm-rel-{5.7,6.0,6.1,6.2,6.3,
  6.4,7.0,7.1,7.2}/ return HTTP 200.
- X.Y.Z patch directories exist for 7.0.2, 7.1.1, 7.2.1 but NOT for
  6.3.0, 6.4.0, 6.2.1 -- install.sh already handles this via the X.Y.Z
  -> X.Y fallback sed in the Radeon wheel install block.
- Docs links (rocm.docs.amd.com, docs.unsloth.ai AMD guide) and the
  llama.cpp GitHub releases API endpoint all return 200.

Test suite: 255 -> 258. New regression coverage:
- U17: get_physical_gpu_count tolerates scalar amd-smi envelope
- U18: get_visible_gpu_utilization tolerates scalar envelope
- U19a-c: vram_util / power_util return None on zero total, but
  vram_total_gb still echoes 0.0 (not None)
- A_rocm{6.5,6.6,6.9}_clips_to_rocm64: install.sh clips unsupported
  6.x minors to rocm6.4 instead of producing a 403 index URL

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix reviewer.py round 2: tokenizer AMD multi-GPU, --no-torch bnb, main.py backend label

Three high-confidence findings from a second 20-parallel reviewer.py run
on commit 7effb3ae. Triaged 15 total findings and applied the three that
were confirmed as real bugs; the rest were either false positives (e.g.
"migrated AMD venv not repaired" -- _ensure_rocm_torch runs downstream
via setup.sh regardless), design decisions (e.g. visibility mask env
vars not consulted in installer detection), or edge cases the existing
fallback logic already handles.

1. unsloth/tokenizer_utils.py [6/20]: the multi-GPU guard's shell probe
   runs `nvidia-smi --query-gpu=memory.used`, catches the failure, then
   only raises if `torch.cuda.is_available()` is False. On ROCm torch,
   torch.cuda.is_available() returns True (ROCm reuses the torch.cuda.*
   API), so the guard becomes dead code on AMD hosts and multi-GPU AMD
   setups slip through even though unsloth does not support them yet.
   Add a torch.cuda.device_count() > 1 fallback inside the except so
   AMD multi-visible-device setups are flagged consistently with the
   original CUDA memory check (see the sketch after this list).

2. install.sh [1/20]: the fresh-install bitsandbytes block for AMD ROCm
   ran unconditionally when TORCH_INDEX_URL matched `*/rocm*`, even when
   SKIP_TORCH=true (from --no-torch or Intel Mac auto-detect). A user
   running `install.sh --no-torch` on an AMD host would still pull in
   bitsandbytes despite explicitly asking for GGUF-only mode. Wrap the
   case block in an outer `[ "$SKIP_TORCH" = false ]` guard.

3. studio/backend/main.py [3/20]: the /api/system endpoint returned
   `"device_backend": get_device().value`, which is "cuda" on ROCm
   hosts (because ROCm torch piggybacks on torch.cuda). Other endpoints
   (hardware.py) already use the _backend_label helper which swaps
   "cuda" -> "rocm" when IS_ROCM. Route /api/system through the same
   helper so the Studio UI reports the backend consistently across all
   endpoints.

4. studio/backend/tests/test_utils.py: update test_backend_matches_device
   to call _backend_label(get_device()) instead of raw get_device().value
   so the test matches the new contract and still passes on CUDA hosts.
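A rough sketch of the fallback shape from item 1 (the real guard lives in
tokenizer_utils.py and also inspects per-GPU memory before deciding; the
function name and error text here are illustrative):

  import subprocess
  import torch

  def _check_single_visible_gpu():
      try:
          subprocess.check_output(
              ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader"]
          )
          # ... original CUDA memory-based multi-GPU check runs here ...
      except (OSError, subprocess.CalledProcessError):
          # nvidia-smi is absent on ROCm hosts, but torch.cuda.* still works,
          # so count devices directly to flag multi-GPU AMD setups as well.
          if torch.cuda.is_available() and torch.cuda.device_count() > 1:
              raise RuntimeError(
                  "Multiple visible GPUs detected; Unsloth expects one."
              )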

Tests: 258 -> 261. New regression coverage:
- X08 main.py /api/system uses _backend_label
- X09 tokenizer multi-GPU guard has device_count() fallback
- X10 fresh-install bnb case block gated on SKIP_TORCH=false

* fix: prevent bitsandbytes from overwriting ROCm torch with CUDA wheels

During install, bitsandbytes was installed without --no-deps, causing
uv to resolve torch from PyPI (CUDA build) and silently overwrite the
ROCm wheels that were just installed in the previous step.

This happened in three places:
- install.sh: bitsandbytes install in both migrated and fresh paths
- install_python_stack.py: bitsandbytes install inside _ensure_rocm_torch()

Additionally, multiple install steps in install_python_stack.py (extras,
overrides, studio deps) can pull in CUDA torch via transitive
dependencies. A final _ensure_rocm_torch() call at the end of the
install sequence ensures ROCm torch is always in place at runtime.

All changes are gated behind ROCm-specific conditions and do not affect
NVIDIA, CPU-only, macOS, or Windows install paths.

Tested on AMD Instinct MI300X VF with ROCm 7.2.0 -- confirms
torch==2.10.0+rocm7.1 with HIP 7.1.25424 after install.

* fix: ROCm inference fallback -- skip Unsloth patching and bnb 4-bit on HIP

On AMD ROCm (HIP), two issues prevent the normal Unsloth inference path:

1. Unsloth's global monkey-patching of transformers model classes
   (LlamaRotaryEmbedding, attention modules) triggers
   _assert_async_cuda_kernel crashes on HIP during generation.
   Training uses different code paths and works fine.

2. bitsandbytes 4-bit matmul kernels also trigger HIP assertion
   failures on MI300X (CDNA3 / gfx942), even without Unsloth patching.

This commit adds a ROCm-specific inference fallback that:
- Skips importing Unsloth at module level (prevents global patching)
- Loads models in 16-bit with plain transformers + PEFT instead
- Resolves pre-quantized model names (e.g. "xxx-bnb-4bit" -> "xxx")
  since pre-quantized HF repos still trigger bnb codepaths
- Guards get_chat_template calls (unavailable without Unsloth import)
- Fixes max_seq_length=0 being passed to from_pretrained (GGUF
  semantics don't apply to transformers path)

The NVIDIA path is completely unchanged -- Unsloth import and
for_inference() optimization remain active. GGUF inference (via
llama-server/HIP) is unaffected since it never imports Python model
classes. AMD GPUs typically have large VRAM (e.g. 192GB on MI300X)
so 16-bit loading is practical for inference.

Tested on AMD Instinct MI300X VF (ROCm 7.2, HIP 7.1.25424):
- Simple generation: PASS
- Compare mode (base vs finetuned): PASS
- GGUF inference + tool calling: PASS (unaffected by this change)
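
A minimal sketch of the fallback path described above, assuming plain
transformers + PEFT and a hypothetical resolve_model_name helper; this is the
shape of the fallback, not the actual inference code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

IS_ROCM = torch.version.hip is not None  # HIP build of torch

def resolve_model_name(name: str) -> str:
    # Pre-quantized repos such as "xxx-bnb-4bit" still trigger bitsandbytes
    # codepaths, so fall back to the unquantized repo name on ROCm.
    return name[: -len("-bnb-4bit")] if name.endswith("-bnb-4bit") else name

def load_for_inference(model_name: str, adapter_path: str | None = None):
    assert IS_ROCM, "NVIDIA hosts keep the normal Unsloth path"
    model_name = resolve_model_name(model_name)
    # 16-bit load with plain transformers: no Unsloth import, no global patching.
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto"
    )
    if adapter_path is not None:
        model = PeftModel.from_pretrained(model, adapter_path)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer
```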

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: guard audio/vision inference on ROCm, remove unused import

- Add clear RuntimeError for audio/vision model inference on ROCm
  (these paths use Unsloth's FastModel/FastVisionModel which would
  crash on HIP; GGUF inference is the supported path on AMD)
- Remove unused `import os as _os` from the ROCm changes
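
A minimal sketch of the guard, with an illustrative flag and message:

```python
IS_ROCM = True  # stand-in for the backend's ROCm detection

def check_rocm_inference_support(model_kind: str) -> None:
    # Fail fast with a clear message instead of crashing inside HIP kernels.
    if IS_ROCM and model_kind in ("audio", "vision"):
        raise RuntimeError(
            "Audio/vision model inference is not supported on ROCm; these "
            "paths rely on Unsloth's FastModel/FastVisionModel, which crashes "
            "on HIP. Use GGUF inference on AMD GPUs instead."
        )
```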

* fix: amd-smi parsing for newer output format (gpu_data wrapper, mem_usage, temperature)

amd-smi on recent ROCm versions (7.x) wraps metric output in a
{"gpu_data": [...]} envelope instead of returning a raw list. This
caused get_primary_gpu_utilization() and get_visible_gpu_utilization()
to fail silently (returning available=False) because the GPU data
dict was never unwrapped.

Additionally:
- VRAM data moved from "vram" to "mem_usage" with "total_vram" /
  "used_vram" keys. Added fallback key lookup.
- Temperature "edge" sensor returns "N/A" on MI300X VF; the previous
  dict.get() chain returned the "N/A" string instead of falling
  through to "hotspot". Changed to a loop that checks each key until
  a parseable value is found.

Tested on AMD Instinct MI300X VF (ROCm 7.2, amd-smi 24.x):
- GPU utilization: 0% (idle), up to 100% during training
- Temperature: 40-44C (from hotspot sensor)
- VRAM: 0.28/191.69 GB (idle)
- Power: 158-211W draw
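
A minimal sketch of the parsing changes; the key names follow this message,
while the helper names and the handling around them are assumptions:

```python
from typing import Any

def unwrap_gpu_data(payload: Any) -> list[dict]:
    # ROCm 7.x amd-smi wraps metrics in {"gpu_data": [...]}; older versions
    # return the list directly.
    if isinstance(payload, dict) and "gpu_data" in payload:
        return payload["gpu_data"]
    return payload if isinstance(payload, list) else []

def read_vram(gpu: dict) -> tuple[Any, Any]:
    # Newer amd-smi moved VRAM from "vram" to "mem_usage" with
    # "total_vram"/"used_vram" keys; fall back across both layouts.
    block = gpu.get("mem_usage") or gpu.get("vram") or {}
    total = block.get("total_vram", block.get("total"))
    used = block.get("used_vram", block.get("used"))
    return total, used

def read_temperature(gpu: dict) -> float | None:
    # "edge" reports "N/A" on MI300X VF, so keep trying sensors until one
    # yields a parseable number instead of returning the first value found.
    temps = gpu.get("temperature", {})
    for key in ("edge", "hotspot", "junction"):
        try:
            return float(temps.get(key))
        except (TypeError, ValueError):
            continue
    return None
```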

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Bug fix detecting radeon (#4940)

* Bug fix detecting radeon

* Expanding GPU target for gfx1100*

* Generalize gfx family-prefix filter to cover gfx10/gfx12 as well

rocminfo on ROCm 6.1+ emits LLVM generic-family ISA lines alongside the
specific GPU (e.g. gfx11-generic next to gfx1100). The outer grep captures
the bare family prefix from the generic line, and passing that to
-DGPU_TARGETS breaks the HIP build because clang only accepts specific
gfxNNN ids.

The previous filter only special-cased gfx11. Generalize it so any bare
2-digit family prefix (gfx10, gfx11, gfx12, ...) is dropped whenever a
specific sibling target is present in the same list. No real AMD GPU has
a 2-digit gfx id, so the filter can only ever drop family prefixes and
never a real target.

Covers the existing gfx11 cases unchanged, and extends the same fix to
gfx10-1-generic / gfx10-3-generic (RDNA1/2) and gfx12-generic (RDNA4),
which would otherwise hit the same build failure on newer rocminfo.
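
The same filter sketched in Python rather than the shell pipeline it actually
lands in; the regexes and list handling here are illustrative:

```python
import re

def filter_gpu_targets(targets: list[str]) -> list[str]:
    # Real gfx ids have 3-4 characters after "gfx" (gfx942, gfx90a, gfx1100).
    specific = {t for t in targets if re.fullmatch(r"gfx[0-9a-z]{3,4}", t)}
    kept = []
    for t in targets:
        # Bare 2-digit family prefixes (gfx10, gfx11, gfx12, ...) only come
        # from generic ISA lines; drop them when a specific sibling exists.
        if re.fullmatch(r"gfx[0-9]{2}", t) and any(s.startswith(t) for s in specific):
            continue
        kept.append(t)
    return kept

print(filter_gpu_targets(["gfx11", "gfx1100"]))  # -> ['gfx1100']
print(filter_gpu_targets(["gfx942"]))            # -> ['gfx942']
```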

---------

Co-authored-by: Iswarya Alex <iswarya.alex@amd.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>

---------

Co-authored-by: Eda Z <eda.zhou@amd.com>
Co-authored-by: GoldenGrapeGentleman <yueyuan@amd.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: billishyahao <bill.he@amd.com>
Co-authored-by: Iswarya Alex <47045679+iswaryaalex@users.noreply.github.com>
Co-authored-by: Iswarya Alex <iswarya.alex@amd.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-10 01:56:12 -07:00
Roland Tannous
33503ea248
Revert "updated models template mappers. added lfm2.5vl450m to transformers 5…" (#4945)
This reverts commit bcf4fd6bd3.
2026-04-09 23:14:57 -07:00
Roland Tannous
bcf4fd6bd3
updated models template mappers. added lfm2.5vl450m to transformers 5… (#4939)
* updated models template mappers. added lfm2.5vl450m to transformers 5.3.0 whitelist

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-09 23:36:42 +04:00
Ricardo-M-L
d5525e8bbb
fix: check find() return value before adding offset in try_fix_tokenizer (#4923)
* fix: check find() return value before adding offset in try_fix_tokenizer

The `str.find()` result was checked for -1 only after adding
`len(find_text)`, turning the guard into dead code. When the substring
is absent, `start` becomes `len(find_text) - 1` (a positive number),
so the `if start == -1: continue` never triggers and the subsequent
slice extracts garbage from the tokenizer string.

Split the find and offset into two steps so the -1 check works correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add defensive guards for token_id None and end find() returning -1

- Skip loop iteration early when token_id is None to avoid constructing
  a find_text that can never match valid JSON
- Guard end = tokenizer_string.find('",', start) against -1 to prevent
  silent garbage extraction from malformed tokenizer strings
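
A minimal sketch of the corrected find()/offset ordering; the helper name and
the surrounding extraction are illustrative:

```python
def extract_token_field(tokenizer_string: str, find_text: str) -> str | None:
    # Buggy version: start = tokenizer_string.find(find_text) + len(find_text)
    # turns a miss (-1) into len(find_text) - 1, so "if start == -1" is dead
    # code and the slice below extracts garbage.
    start = tokenizer_string.find(find_text)
    if start == -1:          # check the raw find() result first ...
        return None
    start += len(find_text)  # ... then apply the offset
    end = tokenizer_string.find('",', start)
    if end == -1:            # guard the second find() as well
        return None
    return tokenizer_string[start:end]
```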

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-09 06:15:46 -07:00
Lee Jackson
dc16e0c65b
Studio: keep chat input visible and fix compare pane clipping (#4924)
* fix(chat): sticky composer bar in thread

* fix(chat): fix compare pane clipping

* fix(chat): tighten scroll-to-bottom placement and compare footer spacing

* Fix TypeScript build break and clean up ViewportFooter classes

- Remove unused `compact` prop from ThreadScrollToBottom call site
  (component is FC with no props, passing it caused TS2322)
- Extract shared classes (sticky, bottom-0, z-20, bg-transparent) from
  ternary branches into the unconditional className string
- Restore `relative` on normal-mode footer so the inner absolute
  bg-background strip has a positioning context
- Remove redundant md:pb-3 / md:pb-4 (same value as base pb-3 / pb-4)
- Remove no-op `sticky bottom-0` from SharedComposer wrapper in both
  LoraCompareContent and GeneralCompareContent (flex layout with
  shrink-0 already pins it at the bottom; parent has no scrollable
  overflow for sticky to bind to)
- Fix truncated comment on pointer-events rationale

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-09 06:00:56 -07:00
kiankyars
ad5972492d
Fix raw text paragraph break normalization (#4884)
* Fix raw text paragraph break normalization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalize horizontal whitespace before stripping non-ASCII and collapse leftover doubles

Run the [^\S\n]+ horizontal-whitespace collapse before the non-ASCII strip
so that Unicode whitespace (\u00A0, \u202F, \u2009, \u3000, \v, \f, etc.)
becomes a single ASCII space instead of being deleted outright. The prior
ordering silently merged adjacent words on HTML/PDF/OCR-sourced text:
with the earlier commits in this PR, "hello\u00a0world" came out as
"helloworld"; it now produces "hello world".

Also drop \t from the allow-list since the horizontal-whitespace collapse
already normalizes tabs to a single space, and add a targeted [ ]{2,} pass
right after the non-ASCII strip so that a non-whitespace non-ASCII character
sitting between two spaces ("word1 (c) word2") does not leave an interior
double space. Without this extra pass, clean_text was not idempotent on
such inputs: the first call produced "word1  word2" and only the second
call collapsed it to "word1 word2". Fuzz testing over 10000 random inputs
now shows the idempotence invariant holding in every case.
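
A minimal sketch of the three passes in the order described above; this is not
the actual clean_text, just the ordering:

```python
import re

def normalize(text: str) -> str:
    # 1. Collapse horizontal whitespace (Unicode spaces, \t, \v, \f, ...) to a
    #    single ASCII space BEFORE the non-ASCII strip, so NBSP & friends
    #    become separators instead of vanishing and merging words.
    text = re.sub(r"[^\S\n]+", " ", text)
    # 2. Strip remaining non-ASCII characters.
    text = re.sub(r"[^\x00-\x7F]", "", text)
    # 3. A non-ASCII character that sat between two spaces leaves a double
    #    space behind; collapse it so a single call is already idempotent.
    text = re.sub(r" {2,}", " ", text)
    return text

assert normalize("hello\u00a0world") == "hello world"
out = normalize("word1 \u00a9 word2")
assert out == "word1 word2"
assert normalize(out) == out  # idempotent on this input
```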

* Add regression tests for Unicode/control whitespace and non-ASCII edge cases

Cover:
- Unicode horizontal whitespace separators (NBSP, narrow NBSP, thin space,
  en/em space, ideographic space, vertical tab, form feed) normalizing to
  a single ASCII space instead of being deleted.
- Mixed paragraph + Unicode whitespace realistic input ("Section\u00a01\r\n\r\nBody\ftext\u202Fhere").
- Tab collapsing and space trimming around newlines.
- Non-whitespace non-ASCII characters (copyright, accented letters, emoji)
  sitting between spaces: must not leave an interior double space, and
  clean_text must be idempotent on these inputs.
- Non-ASCII characters adjacent to a newline: stripping must not leave
  stray leading or trailing spaces on the neighbouring line, and must not
  swallow an adjacent paragraph break.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-09 04:45:43 -07:00
cheehook
7aa442289b
Fix Mistral DPO/preference training crash on non-xformers platforms (e.g. Intel XPU) (#4889)
* Fix Mistral training crash when xformers is unavailable

* Fix/adjust Mistral DPO training crash fix for PR #4889

- Clarify comment in MistralForCausalLM_fast_forward: the DPO embed-masking
  block runs BEFORE attention_mask is nulled out, and it is the consumer that
  requires a 2D mask.
- Add defensive attention_mask.ndim == 2 guard to the LlamaModel_fast_forward
  DPO embed-masking block so it self-protects if a 4D mask ever reaches it.
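
A hedged sketch of what the guarded embed-masking block might look like; the
actual masking in LlamaModel_fast_forward may differ, the ndim == 2 check is
the point:

```python
import torch

def mask_inputs_embeds(inputs_embeds: torch.Tensor,
                       attention_mask: torch.Tensor | None) -> torch.Tensor:
    # Only act on the 2D (batch, seq_len) mask the DPO path provides; step
    # aside if a 4D attention mask ever reaches this block.
    if attention_mask is not None and attention_mask.ndim == 2:
        mask = attention_mask.unsqueeze(-1).to(inputs_embeds.dtype)
        inputs_embeds = inputs_embeds * mask
    return inputs_embeds
```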

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-09 04:38:44 -07:00
Daniel Han
da2ef6dce6
Only run ldconfig CUDA-linking recovery when we have permission (#4930)
* Only run ldconfig CUDA-linking recovery when we have permission

When `import unsloth` runs on a non-root environment (shared HPC,
locked-down container, CI runner, etc.) the CUDA-linking recovery path
shells out to `os.system("ldconfig /usr/lib64-nvidia")`, which fails
loudly with "Permission denied". It's especially noisy for users who
don't even have bitsandbytes installed - they're doing 16bit or full
finetuning and the line immediately above told them "16bit and full
finetuning works!". The reason the recovery runs at all in that case
is that `bnb.functional.lib.cdequantize_blockwise_fp32` raises
AttributeError when `bnb` is None, the bare `except:` swallows it, and
the code drops into the recovery unconditionally.

Fix: gate the recovery body on `os.geteuid() == 0`. When we don't
have permission to run ldconfig, silently skip the recovery. When we
do, the recovery runs UNCHANGED - same `os.system()` calls, same
reload + retry, same warnings. `libcuda_dirs()` is used by both triton
and bitsandbytes, so we still want to run the recovery whenever we
have permission, regardless of whether bnb is installed.

For non-root users who DO have bitsandbytes installed and broken,
emit a single remediation warning telling them how to fix it manually
(`sudo ldconfig /usr/lib64-nvidia`). This preserves the diagnostic
guidance from the original code without the Permission denied noise.

Scope:
- Only the `DEVICE_TYPE == "cuda"` branch is touched.
- The `hip` (AMD ROCm) and `xpu` (Intel) branches are unchanged.
- On a real CUDA box running as root, behavior is byte-identical to
  main: same os.system() calls, same reload, same retry, same warnings.
  AST-verified by /tmp/verify_minimal/verify.py.
- `hasattr(os, "geteuid")` guards against Windows where `os.geteuid`
  doesn't exist.
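
A minimal sketch of the permission gate; the recovery body and warning text
are abbreviated:

```python
import os
import warnings

def run_cuda_linking_recovery(bnb_installed: bool) -> None:
    # geteuid does not exist on Windows, hence the hasattr guard.
    is_root = hasattr(os, "geteuid") and os.geteuid() == 0
    if is_root:
        # With permission the recovery runs as before: same ldconfig call,
        # then reload + retry (elided here).
        os.system("ldconfig /usr/lib64-nvidia")
    elif bnb_installed:
        # Non-root with a broken bitsandbytes: give manual remediation
        # guidance instead of spamming "Permission denied".
        warnings.warn(
            "Running `sudo ldconfig /usr/lib64-nvidia` may fix CUDA linking "
            "for bitsandbytes."
        )
    # Non-root without bitsandbytes: 16bit/full finetuning works, stay silent.
```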

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <info@unsloth.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-09 00:07:25 -07:00
dependabot[bot]
5fa8683b27
build(deps): bump the bun-frontend group across 1 directory with 16 updates (#4586)
* build(deps): bump the bun-frontend group across 1 directory with 16 updates

Bumps the bun-frontend group with 16 updates in the /studio/frontend directory:

| Package | From | To |
| --- | --- | --- |
| [@dagrejs/dagre](https://github.com/dagrejs/dagre) | `2.0.4` | `3.0.0` |
| [@dagrejs/graphlib](https://github.com/dagrejs/graphlib) | `3.0.4` | `4.0.1` |
| @hugeicons/core-free-icons | `3.3.0` | `4.0.0` |
| [@streamdown/cjk](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown-cjk) | `1.0.2` | `1.0.3` |
| [@streamdown/code](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown-code) | `1.0.2` | `1.1.1` |
| [lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react) | `0.577.0` | `1.6.0` |
| [recharts](https://github.com/recharts/recharts) | `3.7.0` | `3.8.0` |
| [shadcn](https://github.com/shadcn-ui/ui/tree/HEAD/packages/shadcn) | `3.8.5` | `4.1.0` |
| [streamdown](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown) | `2.3.0` | `2.5.0` |
| [@biomejs/biome](https://github.com/biomejs/biome/tree/HEAD/packages/@biomejs/biome) | `1.9.4` | `2.4.8` |
| [@eslint/js](https://github.com/eslint/eslint/tree/HEAD/packages/js) | `9.39.4` | `10.0.1` |
| [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node) | `24.12.0` | `25.5.0` |
| [eslint](https://github.com/eslint/eslint) | `9.39.4` | `10.1.0` |
| [eslint-plugin-react-refresh](https://github.com/ArnaudBarre/eslint-plugin-react-refresh) | `0.4.26` | `0.5.2` |
| [globals](https://github.com/sindresorhus/globals) | `16.5.0` | `17.4.0` |
| [typescript](https://github.com/microsoft/TypeScript) | `5.9.3` | `6.0.2` |



Updates `@dagrejs/dagre` from 2.0.4 to 3.0.0
- [Release notes](https://github.com/dagrejs/dagre/releases)
- [Changelog](https://github.com/dagrejs/dagre/blob/master/changelog.md)
- [Commits](https://github.com/dagrejs/dagre/compare/v2.0.4...v3.0.0)

Updates `@dagrejs/graphlib` from 3.0.4 to 4.0.1
- [Release notes](https://github.com/dagrejs/graphlib/releases)
- [Changelog](https://github.com/dagrejs/graphlib/blob/master/changelog.md)
- [Commits](https://github.com/dagrejs/graphlib/compare/v3.0.4...v4.0.1)

Updates `@hugeicons/core-free-icons` from 3.3.0 to 4.0.0

Updates `@streamdown/cjk` from 1.0.2 to 1.0.3
- [Release notes](https://github.com/vercel/streamdown/releases)
- [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown-cjk/CHANGELOG.md)
- [Commits](https://github.com/vercel/streamdown/commits/@streamdown/cjk@1.0.3/packages/streamdown-cjk)

Updates `@streamdown/code` from 1.0.2 to 1.1.1
- [Release notes](https://github.com/vercel/streamdown/releases)
- [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown-code/CHANGELOG.md)
- [Commits](https://github.com/vercel/streamdown/commits/@streamdown/code@1.1.1/packages/streamdown-code)

Updates `lucide-react` from 0.577.0 to 1.6.0
- [Release notes](https://github.com/lucide-icons/lucide/releases)
- [Commits](https://github.com/lucide-icons/lucide/commits/1.6.0/packages/lucide-react)

Updates `recharts` from 3.7.0 to 3.8.0
- [Release notes](https://github.com/recharts/recharts/releases)
- [Changelog](https://github.com/recharts/recharts/blob/main/CHANGELOG.md)
- [Commits](https://github.com/recharts/recharts/compare/v3.7.0...v3.8.0)

Updates `shadcn` from 3.8.5 to 4.1.0
- [Release notes](https://github.com/shadcn-ui/ui/releases)
- [Changelog](https://github.com/shadcn-ui/ui/blob/main/packages/shadcn/CHANGELOG.md)
- [Commits](https://github.com/shadcn-ui/ui/commits/shadcn@4.1.0/packages/shadcn)

Updates `streamdown` from 2.3.0 to 2.5.0
- [Release notes](https://github.com/vercel/streamdown/releases)
- [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown/CHANGELOG.md)
- [Commits](https://github.com/vercel/streamdown/commits/streamdown@2.5.0/packages/streamdown)

Updates `@biomejs/biome` from 1.9.4 to 2.4.8
- [Release notes](https://github.com/biomejs/biome/releases)
- [Changelog](https://github.com/biomejs/biome/blob/main/packages/@biomejs/biome/CHANGELOG.md)
- [Commits](https://github.com/biomejs/biome/commits/@biomejs/biome@2.4.8/packages/@biomejs/biome)

Updates `@eslint/js` from 9.39.4 to 10.0.1
- [Release notes](https://github.com/eslint/eslint/releases)
- [Commits](https://github.com/eslint/eslint/commits/v10.0.1/packages/js)

Updates `@types/node` from 24.12.0 to 25.5.0
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node)

Updates `eslint` from 9.39.4 to 10.1.0
- [Release notes](https://github.com/eslint/eslint/releases)
- [Commits](https://github.com/eslint/eslint/compare/v9.39.4...v10.1.0)

Updates `eslint-plugin-react-refresh` from 0.4.26 to 0.5.2
- [Release notes](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/releases)
- [Changelog](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/blob/main/CHANGELOG.md)
- [Commits](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/compare/v0.4.26...v0.5.2)

Updates `globals` from 16.5.0 to 17.4.0
- [Release notes](https://github.com/sindresorhus/globals/releases)
- [Commits](https://github.com/sindresorhus/globals/compare/v16.5.0...v17.4.0)

Updates `typescript` from 5.9.3 to 6.0.2
- [Release notes](https://github.com/microsoft/TypeScript/releases)
- [Commits](https://github.com/microsoft/TypeScript/compare/v5.9.3...v6.0.2)

---
updated-dependencies:
- dependency-name: "@dagrejs/dagre"
  dependency-version: 3.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@dagrejs/graphlib"
  dependency-version: 4.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@hugeicons/core-free-icons"
  dependency-version: 4.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@streamdown/cjk"
  dependency-version: 1.0.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: bun-frontend
- dependency-name: "@streamdown/code"
  dependency-version: 1.1.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: bun-frontend
- dependency-name: lucide-react
  dependency-version: 1.6.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: recharts
  dependency-version: 3.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: bun-frontend
- dependency-name: shadcn
  dependency-version: 4.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: streamdown
  dependency-version: 2.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: bun-frontend
- dependency-name: "@biomejs/biome"
  dependency-version: 2.4.8
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@eslint/js"
  dependency-version: 10.0.1
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@types/node"
  dependency-version: 25.5.0
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: eslint
  dependency-version: 10.1.0
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: eslint-plugin-react-refresh
  dependency-version: 0.5.2
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: bun-frontend
- dependency-name: globals
  dependency-version: 17.4.0
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: typescript
  dependency-version: 6.0.2
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
...

Signed-off-by: dependabot[bot] <support@github.com>

* Revert dagrejs upgrades

Keep @dagrejs/dagre at ^2.0.4 and @dagrejs/graphlib at ^3.0.4.

* Revert biome, eslint, typescript, and recharts upgrades

These upgrades break studio/frontend locally:

- @biomejs/biome 2.4.10 fails to parse the existing biome.json
  (files.ignore and organizeImports keys removed in v2; schema
  version mismatch).
- typescript 6.0.2 emits TS5101 on tsconfig.app.json baseUrl
  ("Option 'baseUrl' is deprecated and will stop functioning in
  TypeScript 7.0"), so tsc -b exits 2.
- eslint 10.2.0 conflicts with eslint-plugin-react-hooks@7.0.1,
  which peers on eslint ^9; npm install fails with ERESOLVE.
- recharts 3.8.1 widened LegendPayload.dataKey to include a
  function type, which breaks the React key={item.dataKey} usage
  in src/components/ui/chart.tsx (TS2322).

Hold these at their current pinned versions until the upstream
peer deps and config migrations are ready.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-08 04:34:33 -07:00
Wasim Yousef Said
8e977445d4
Let recipes use the model loaded in Chat (#4840)
* feat: inject local model provider into recipe jobs via JWT

* feat: auto-generate JWT for local model providers in recipes

* feat: add is_local flag to model provider config types and utils

* fix(studio): skip endpoint validation for local providers

* feat(studio): add local/external model source toggle to provider dialog

* feat(studio): thread localProviderNames through model config dialog chain

* feat(studio): show 'Local model (Chat)' label for local model_provider configs

* fix: hardcode loopback for local endpoint, clear stale creds on toggle

* fix: document TOCTOU/JWT rotation, add deferred import comments, fix is_local serialization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(studio): clear stale local model state on provider toggle and validation

* fix(studio): override empty local endpoint in validation and skip model gate for unused providers

* fix(studio): resolve loopback port from app.state, clear stale local provider fields, sync model id on toggle

Address review feedback on the local-model-provider flow:

- Backend (jobs.py): _resolve_local_v1_endpoint now reads the actual bound
  port from app.state.server_port (set in run.py after binding) instead of
  parsing it out of request.base_url, which is wrong behind any reverse
  proxy or non-default port. The two duplicated urlparse blocks are gone.
- Backend (jobs.py): defensively pop api_key_env, extra_headers, extra_body
  from local providers so a previously external provider that flipped to
  local cannot leak invalid JSON or rogue auth headers into the local /v1
  call. Also dedupe the post-loop assignment and tighten the local-name
  intersection so empty names cannot match.
- Backend (jobs.py): hoist datetime and urllib.parse imports to the top
  import block for consistency with the rest of the file.
- Backend (run.py): expose the bound port on app.state.server_port after
  the uvicorn server is constructed.
- Frontend (model-provider-dialog.tsx): clear extra_headers and extra_body
  when toggling to local mode. Hidden inputs would otherwise keep stale
  JSON blocking validate/run.
- Frontend (model-config-dialog.tsx): factor the local-aware provider
  selection logic into applyProviderChange and call it from both
  onValueChange and onBlur, so manually typing a provider name and tabbing
  away keeps the model field consistent.
- Frontend (recipe-studio.ts store): handle both directions of the
  is_local toggle in the cascade. external -> local now backfills
  model: "local" on already-linked model_configs so they pass validation
  immediately, mirroring the existing local -> external clear path.
- Frontend (validate.ts + build-payload.ts): thread localProviderNames
  into validateModelConfigProviders and skip the "model is required"
  check for local-linked configs. Local providers do not need a real
  model id since the inference endpoint uses the loaded Chat model.

* fix(studio): narrow store cascade types, sync model placeholder on graph relink and node removal, harden ephemeral port path

Loop 2 review fixes:

- recipe-studio.ts: type-narrow next.is_local by also checking
  next.kind === "model_provider". TS otherwise raised TS2339 because
  next was typed as the union NodeConfig after the spread. The behavior
  is unchanged but the code now compiles cleanly.
- model-config-dialog.tsx: convert the lastProviderRef / providerInputRef
  ref-during-render pattern (pre-existing react-hooks/refs lint error)
  to a useEffect that syncs providerInputRef from config.provider. The
  combobox blur path still uses applyProviderChange and remains stable.
- recipe-graph-connection.ts: when a graph drag links a model_provider
  to a model_config, mirror the dialog applyProviderChange behavior:
  fill model: "local" if the new provider is local and the model field
  is blank, clear model when relinking from a local placeholder to an
  external provider, otherwise leave the model alone.
- reference-sync.ts: when a referenced provider node is removed, clear
  the synthetic model: "local" placeholder along with the provider
  field, so a future relink to an external provider does not pass
  validation with a stale value that fails at runtime.
- run.py: only publish app.state.server_port when the bound port is a
  real positive integer; for ephemeral binds (port==0) leave it unset
  and let request handlers fall back to request.base_url.
- jobs.py: _resolve_local_v1_endpoint also falls back when
  app.state.server_port is non-positive, and uses `is None` instead of
  the truthy fallback so a literal 0 is handled correctly.

* fix(studio): strict is_local check, narrow loaded-model gate to LLM-reachable configs, add scope-server port fallback

Loop 3 review fixes:

- jobs.py, validate.py: require `is_local is True` instead of truthy
  check. Malformed payloads such as is_local: "false" or is_local: 1
  would otherwise be treated as local and silently rewritten to the
  loopback endpoint.
- jobs.py: _resolve_local_v1_endpoint now tries request.scope["server"]
  (the actual uvicorn-assigned (host, port) tuple) as a second
  resolution step before falling back to parsing request.base_url.
  This covers direct-uvicorn startup paths and ephemeral binds that
  never publish app.state.server_port.
- jobs.py: new _used_llm_model_aliases helper collects the set of
  model_aliases that an LLM column actually references, and the
  "Chat model loaded" gate is now only triggered when a local
  provider is reachable from that set. Orphan model_config nodes on
  the canvas no longer block unrelated recipe runs.
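
A sketch of the endpoint resolution order and the strict is_local check
described in loops 2-3; the function names and Request handling below are
illustrative, not the actual jobs.py code:

```python
from urllib.parse import urlparse

def resolve_local_v1_endpoint(request) -> str:
    # 1. Port published by run.py after binding (only real, positive ports).
    port = getattr(request.app.state, "server_port", None)
    if isinstance(port, int) and port > 0:
        return f"http://127.0.0.1:{port}/v1"
    # 2. The (host, port) tuple uvicorn puts in the ASGI scope; covers
    #    direct-uvicorn startups and ephemeral binds.
    server = request.scope.get("server")
    if server and isinstance(server[1], int) and server[1] > 0:
        return f"http://127.0.0.1:{server[1]}/v1"
    # 3. Last resort: parse request.base_url (wrong behind reverse proxies,
    #    hence only a fallback).
    parsed = urlparse(str(request.base_url))
    return f"http://127.0.0.1:{parsed.port or 80}/v1"

def is_local_provider(provider: dict) -> bool:
    # Require a literal True; is_local: "false" or 1 must not flip a provider
    # onto the loopback endpoint.
    return provider.get("is_local") is True
```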

* fix(studio): force skip_health_check on local-linked configs, skip JSON parsing for local providers, local-aware inline editor

Loop 4 review fixes:

- jobs.py: after rewriting local providers, also force
  skip_health_check: true on any model_config linked to a local
  provider. The /v1/models endpoint only advertises the real loaded
  model id, so data_designer's default model-availability health check
  would otherwise fail against the placeholder "local" id before the
  first chat completion call. The inference route already ignores the
  model id in chat completions, so skipping the check is safe.
- builders-model.ts: buildModelProvider now short-circuits for local
  providers and emits only { name, endpoint: "", provider_type, is_local }
  without running parseJsonObject on the hidden extra_headers/extra_body
  inputs. Imported or hydrated recipes with stale invalid JSON in those
  fields no longer block client-side validate/run.
- inline-model.tsx: the model_config branch now accepts an optional
  localProviderNames prop and mirrors the dialog applyProviderChange
  behavior. Changing provider to/from a local one auto-fills or clears
  the "local" placeholder consistently with the other edit paths.
- recipe-graph-node.tsx: derive localProviderNames from the store via
  useMemo (stable identity) and pass it through renderNodeBody to
  <InlineModel>. Hooks order is preserved by declaring them above the
  early return for markdown_note nodes.
- run.py: minor comment tweak - loop 3 already added the scope-server
  fallback path, note that in the comment.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: danielhanchen <info@unsloth.ai>
2026-04-08 03:48:22 -07:00
Daniel Han
c3d2d58046
Update dependabot.yml (#4915) 2026-04-08 03:39:50 -07:00
dependabot[bot]
0087515d5c
build(deps): bump oxc-parser (#4776)
Bumps the npm-oxc-validator group in /studio/backend/core/data_recipe/oxc-validator with 1 update: [oxc-parser](https://github.com/oxc-project/oxc/tree/HEAD/napi/parser).


Updates `oxc-parser` from 0.121.0 to 0.123.0
- [Release notes](https://github.com/oxc-project/oxc/releases)
- [Changelog](https://github.com/oxc-project/oxc/blob/main/napi/parser/CHANGELOG.md)
- [Commits](https://github.com/oxc-project/oxc/commits/crates_v0.123.0/napi/parser)

---
updated-dependencies:
- dependency-name: oxc-parser
  dependency-version: 0.123.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: npm-oxc-validator
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 03:35:40 -07:00
dependabot[bot]
67e9db4921
build(deps): bump oxc-parser (#4776)
Bumps the npm-oxc-validator group in /studio/backend/core/data_recipe/oxc-validator with 1 update: [oxc-parser](https://github.com/oxc-project/oxc/tree/HEAD/napi/parser).


Updates `oxc-parser` from 0.121.0 to 0.123.0
- [Release notes](https://github.com/oxc-project/oxc/releases)
- [Changelog](https://github.com/oxc-project/oxc/blob/main/napi/parser/CHANGELOG.md)
- [Commits](https://github.com/oxc-project/oxc/commits/crates_v0.123.0/napi/parser)

---
updated-dependencies:
- dependency-name: oxc-parser
  dependency-version: 0.123.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: npm-oxc-validator
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 03:35:33 -07:00
pre-commit-ci[bot]
c2184af079
[pre-commit.ci] pre-commit autoupdate (#4879)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.8 → v0.15.9](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.8...v0.15.9)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-07 22:50:48 -07:00
Roland Tannous
f801e59c29
split venv_t5 into tiered 5.3.0/5.5.0 and fix trust_remote_code (#4878)
* split venv_t5 into venv_t5_530 and venv_t5_550 for tiered transformers 5.x support

* fix bfloat16 crash on T4 for FORCE_FLOAT32 models and disable trust_remote_code auto-enable for native t5 models

* revert FORCE_FLOAT32 dtype change

* restrict trust_remote_code auto-enable to Nemotron models only

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use config.json model_type for tier detection, add unsloth/nvidia namespace guard

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

This reverts commit fb43d468e2.

* Revert "use config.json model_type for tier detection, add unsloth/nvidia namespace guard"

This reverts commit fc49ae2453.

* add unsloth/nvidia namespace guard to Nemotron trust_remote_code auto-enable

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* reorder tier checks: all substring matches before config.json fetches

* extract shared activate_transformers_for_subprocess into transformers_version.py

* narrow Nemotron trust_remote_code to nemotron_h/nemotron-3-nano, add to export worker

* clean venv_t5 dirs before re-install in setup.sh, clarify version alias comment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* run venv_t5 migration outside deps fast-path gate in both setup scripts

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-07 20:05:01 +04:00
Daniel Han
1d8160376e
Bump minimum unsloth version to 2026.4.4 in install scripts (#4876) 2026-04-06 09:46:35 -07:00