Commit graph

5035 commits

Author SHA1 Message Date
Daniel Han
1ccfd2e0a5
fix(rocm): tighten gfx regex to ignore generic ISA lines (#5033)
* fix(rocm): tighten gfx regex to ignore generic ISA lines

ROCm 6.1+ rocminfo emits generic ISA names such as
"amdgcn-amd-amdhsa--gfx11-generic" and "amdgcn-amd-amdhsa--gfx9-4-generic"
alongside the real GPU name. The previous `gfx[1-9]` regex used in
`_has_rocm_gpu` matched both, so a host with only a generic ISA entry
would be reported as having a usable AMD GPU.

Tighten the pattern to `gfx[1-9][0-9a-z]{2,3}` so only real gfx ids
match. This covers every documented target from GFX6 (gfx600) through
GFX12 (gfx1201), including letter-suffixed ids like gfx90a (MI250 /
MI250X) and gfx90c. Documented generic ISA names always have 1 or 2
digits before the dash and no longer match.

Applied to both `studio/install_python_stack.py` and
`studio/install_llama_prebuilt.py` so the two detection paths agree.
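
A quick standalone check of the tightened pattern against both kinds of
names (a sketch for illustration only; the surrounding code of
`_has_rocm_gpu` is not reproduced here):

    import re

    # Tightened pattern from this fix: "gfx", a leading digit 1-9, then
    # 2-3 further [0-9a-z] characters.
    GFX_RE = re.compile(r"gfx[1-9][0-9a-z]{2,3}")

    real = ["amdgcn-amd-amdhsa--gfx1100", "gfx600", "gfx90a", "gfx90c", "gfx1201"]
    generic = ["amdgcn-amd-amdhsa--gfx11-generic", "amdgcn-amd-amdhsa--gfx9-4-generic"]

    assert all(GFX_RE.search(name) for name in real)
    assert not any(GFX_RE.search(name) for name in generic)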

Co-authored-by: Martin Hoyer <mhoyer@redhat.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Martin Hoyer <mhoyer@redhat.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 05:24:41 -07:00
Daniel Han
b7a8ff2833
Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027) (#5034)
* Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027)

FastLanguageModel.from_pretrained(..., num_labels=N) crashed with
"NotImplementedError: normal_kernel_cuda not implemented for 'Byte'" on
pre-quantized bnb 4-bit checkpoints (e.g. unsloth/Qwen3-4B-bnb-4bit)
when running on transformers 5.x.

Two pieces were needed to close this out:

1. unsloth_zoo PR: add "score", "classifier", "qa_outputs" to
   SKIP_QUANTIZATION_MODULES so replace_with_bnb_linear leaves task
   heads in the compute dtype.

2. This commit: for pre-quantized checkpoints, transformers reads
   llm_int8_skip_modules from the quantization_config baked into
   config.json and ignores the runtime BitsAndBytesConfig we pass via
   kwargs. Unsloth must merge its skip list into
   model_config.quantization_config.llm_int8_skip_modules before the
   from_pretrained call, or the checkpoint's frozen list
   (e.g. ["lm_head", "multi_modal_projector", "merger",
   "modality_projection"]) wins and the `score` head gets converted to
   Linear4bit with uint8 storage, then _init_weights calls normal_ on
   uint8 and crashes.

Also add a defensive post-load cast on the task head to guard against
any residual path that ends up with a non-floating head dtype.
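
A minimal sketch of the merge in point 2, assuming `quantization_config`
is the plain dict read from the checkpoint's config.json (function and
variable names here are illustrative, not the actual Unsloth code):

    def merge_skip_modules(quantization_config, extra=("score", "classifier", "qa_outputs")):
        # Merge the task-head skip list into the list frozen in config.json so
        # replace_with_bnb_linear leaves those heads in the compute dtype.
        frozen = quantization_config.get("llm_int8_skip_modules") or []
        quantization_config["llm_int8_skip_modules"] = sorted(set(frozen) | set(extra))
        return quantization_config

    merged = merge_skip_modules({"llm_int8_skip_modules": ["lm_head", "multi_modal_projector"]})
    # -> ["classifier", "lm_head", "multi_modal_projector", "qa_outputs", "score"]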

Verified on transformers 4.57.6 and 5.5.0 with:
- unsloth/Qwen3-4B-bnb-4bit + num_labels=3
- unsloth/Qwen3-4B (non-bnb repo, load_in_4bit=True)
- unsloth/Llama-3.2-1B-Instruct + num_labels=3
- unsloth/ModernBERT-large classifier head (bert_classification notebook)
- Regression: causal LM path unchanged, backbone still 4-bit
- 3-step SFT on num_labels=3 confirms gradient flow and weight updates
  on score.weight

Fixes unslothai/unsloth#5027

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 05:16:33 -07:00
David Solanas Sanz
1fcb2502cf
fix: prevent offline freeze by fixing stats retry and forwarding local_files_only (#5016)
Fixes #2393.

- `_utils.py`: `has_internet()` now treats any truthy value of `HF_HUB_OFFLINE` as offline, in addition to `TRANSFORMERS_OFFLINE`.
- `_utils.py`: replace uncontrolled `except Exception: stats_check()` retry (which had no time limit and could freeze on Kaggle offline mode) with a logged skip.
- `loader.py`: forward `local_files_only` from kwargs into all `AutoConfig.from_pretrained` and `PeftConfig.from_pretrained` probes in `FastLanguageModel.from_pretrained` and `FastModel.from_pretrained`, including the PEFT base-model reload paths.
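
A minimal sketch of the truthy-variant check and the forwarding idea (the
exact variant set and call sites in `_utils.py` / `loader.py` may differ):

    import os

    _TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings

    def _offline_requested() -> bool:
        return any(
            os.environ.get(var, "").strip().lower() in _TRUTHY
            for var in ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")
        )

    # In the config probes, forward the caller's flag instead of dropping it:
    #   AutoConfig.from_pretrained(name, local_files_only=kwargs.get("local_files_only", False))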
2026-04-15 04:51:31 -07:00
Lee Jackson
f9ef639dde
Studio: support GGUF variant selection for non-suffixed repos (#5023)
* fix: support GGUF variant selection for non-suffixed repos

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: harden GGUF detection across cached models and picker flows

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* chore: use shared GGUF picker helper for search rows

* fix: avoid mixed cache duplication and preserve GGUF fallback detection

* fix: unify GGUF cache matching and merge picker hints

* fix: normalize local GGUF matching across picker and model config

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: robust cached-gguf classification + hint-aware click routing

- _repo_gguf_size_bytes: treat size_on_disk=None as 0 and dedupe fallback
  by commit_hash so partial/interrupted downloads don't TypeError out of
  sum() and wipe the entire cached list.
- list_cached_gguf / list_cached_models: narrow per-repo try/except so
  one malformed repo no longer poisons the whole response.
- handleModelClick: route through isKnownGgufRepo instead of the
  suffix-only isGgufRepo, so non-suffixed GGUF repos still open the
  variant expander from every call site.
- Replace the modelIsGgufById/resultIsGgufById Maps with Sets of known
  GGUF ids to stop conflating "no hint" with "known not-GGUF".
- Make HfModelResult.isGguf required (it is always set in makeMapModel).
- Add regression tests for the None size case, mixed-repo inclusion in
  cached-gguf, and per-repo error isolation.

* fix: exclude mmproj from GGUF classification and case-normalize hint lookups

- _repo_gguf_size_bytes now filters mmproj vision-adapter files so
  safetensors+mmproj.gguf repos stay on the cached-models path and
  non-GGUF rows no longer show zero pickable variants. A vision-capable
  GGUF repo (main weight + mmproj adapter) still classifies as GGUF and
  reports the main weight size.
- modelGgufIds / resultGgufIds now key on lowercased ids and
  isKnownGgufRepo lowercases its lookup, so store and HF-search ids
  that differ only by casing still match the same GGUF hint.
- New regression tests: mmproj-only repo excluded from cached-gguf,
  same repo included in cached-models, vision-capable repo still
  classified as GGUF with correct size.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-15 15:32:01 +04:00
Roland Tannous
13928b5f0e
Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024)
* Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var

When set, UNSLOTH_PYTORCH_MIRROR overrides the default
https://download.pytorch.org/whl base URL in all four install scripts
(install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py).
When unset or empty, the official URL is used. This lets users behind
corporate proxies or in regions with poor connectivity to pytorch.org
point at a local mirror without patching scripts.
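
A sketch of the override logic, shown as a small helper (the actual
`_PYTORCH_WHL_BASE` in install_python_stack.py may be a module-level
constant rather than a function; the shell scripts do the same thing in
their own syntax):

    import os

    _OFFICIAL_PYTORCH_WHL_BASE = "https://download.pytorch.org/whl"

    def pytorch_whl_base() -> str:
        # Unset or empty -> official URL; trailing slash stripped to avoid
        # double-slash URLs when an index suffix is appended.
        mirror = os.environ.get("UNSLOTH_PYTORCH_MIRROR", "").strip()
        return mirror.rstrip("/") or _OFFICIAL_PYTORCH_WHL_BASE

    # e.g. f"{pytorch_whl_base()}/cu124"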

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py

Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back
to the official URL when unset or empty, and preserves the value as-is
(including trailing slashes).

* Remove stale test assertions for missing install.sh messages

* Fix GPU mocking in test_get_torch_index_url.sh

Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside
get_torch_index_url so the GPU-presence checks work in tests.
Add -L flag handling to mock nvidia-smi so it passes the GPU listing
check. All 26 tests now pass on CPU-only machines.

* Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 11:39:11 +04:00
Datta Nimmaturi
826c98f3c0
[moe][gemma4] Target MoE for gemma4 (#4913)
* Target MoE for gemma4

* refactor attention impl determine

* Revert "refactor attention impl determine"

This reverts commit 888fca08110a9a74278dc1ebc14d0da043bbd11d.

* Remove attention policy changes from gemma4 MoE fix
2026-04-14 16:53:07 -05:00
Daniel Han
5aa8c15246
Studio: hard-stop at n_ctx with a 'Context limit reached' toast (#5021)
* Studio: hard-stop at n_ctx with a dedicated 'Context limit reached' toast

llama-server's default behavior when the KV cache fills is to silently
drop the oldest non-``n_keep`` tokens and keep generating. The UI has
no way to tell the user that earlier turns were evicted -- they just
see degraded continuity and a confusing ``5,361 / 4,096`` on the
context usage bar.

Launch llama-server with ``--no-context-shift`` so it returns a clean
error once the request would exceed ``n_ctx``. In the chat adapter,
catch the error, identify it as a context-limit error via
``isContextLimitError()``, and surface a dedicated toast that names
the exact control to adjust: the ``Context Length`` field in the chat
Settings panel.

Also add a lightweight tooltip hint on ``ContextUsageBar`` when usage
crosses 85%, so users see the "raise Context Length in Settings"
suggestion before they hit the hard stop.

Tests:

  * ``test_llama_cpp_no_context_shift.py`` pins the ``--no-context-shift``
    flag in the static launch-command template, and pins it inside the
    unconditional ``cmd = [ ... ]`` block so a future refactor can't
    hide it behind a branch.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Shorten --no-context-shift comment to 1 line

* Match backend _friendly_error rewrite in isContextLimitError

Codex review on PR caught that ``backend/routes/inference.py::_friendly_error``
rewrites the raw llama-server text
  "request (X tokens) exceeds the available context size (Y tokens)"
into
  "Message too long: X tokens exceeds the Y-token context window. ..."
on the main streaming GGUF path. The heuristic only looked for
"context size" / "exceeds the available context" / "context shift",
none of which survive the rewrite, so the new "Context limit reached"
toast would never fire for the most common case. Add matches for
"message too long" and "context window" so both wordings hit.

Also addresses Gemini feedback on the launch-flag test:
  * Use ``inspect.getsource(LlamaCppBackend.load_model)`` instead of
    reading ``__file__`` directly; scopes the assertions to the
    function that actually launches llama-server.
  * Replace the hardcoded ``"            ]"`` indent search with a
    line-at-a-time scan for a line that is just ``]``, so the test
    survives reformatting.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 10:58:20 -07:00
Daniel Han
5861a7ce15
Studio: split model-load progress label across two rows (#5020)
* Studio: split model-load progress label across two rows

The chat flow and training overlay both compose a progress label like
"112.6 of 122.3 GB • 331.0 MB/s • 30s left" and render it next to the
percent badge in a single flex row. Once the rate + ETA part shows up,
the label outgrows the row width and wraps mid-phrase, orphaning the
percent ("19 left %") onto a second ragged line.

Fix in model-load-status.tsx: split the label on the first " • " into
a primary (size) chunk that stays on row 1 with the percent, and a
secondary (rate/ETA) chunk that renders on its own muted row below.
Labels without a bullet (e.g. "22.8 GB downloaded") collapse cleanly
to one row. The inline-status variant keeps only the primary and
surfaces the full label via the tooltip.

Also extracts the rate/ETA math out of useTransferStats into a pure
``transfer-stats.ts`` module (appendSample + computeTransferStats) so
it can be reasoned about and tested without React. The hook is now a
thin wrapper that feeds sample history through the pure functions.

Backend: adds two companion test files for load_progress():

  * test_llama_cpp_load_progress_matrix.py (21 tests) -- platform
    matrix (Linux /proc, macOS/Windows absence), VmRSS parsing
    variants (tab/space/missing/malformed), filesystem edges (HF-cache
    symlinks, broken symlinks, nonexistent paths, relative paths),
    shard aggregation (partial multi-shard, two series in same dir,
    mmproj-* exclusion, single-file), lifecycle races, concurrent
    sampling (10 threads x 50 iters against real /proc), fraction
    bounds.
  * test_llama_cpp_load_progress_live.py (5 tests) -- no-mock live
    integration: real subprocess allocating 100 MB to match VmRSS,
    real ready phase, real dead-pid degradation, real shard
    aggregation, repeated polling. Skipped on non-Linux.

Both complement the existing test_llama_cpp_load_progress.py.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Hoist splitProgressLabel out of JSX IIFE (review feedback)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 10:58:16 -07:00
Eda Z
5b8dbdc3c2
Fix bitsandbytes ROCm install by using pip instead of uv (#4966)
* Fix bitsandbytes ROCm install by using pip instead of uv

* Also use pip for PyPI fallback path in _install_bnb_rocm

The original fix correctly switched the pre-release wheel install from
uv to pip, but left the PyPI fallback path on uv. If uv breaks bnb
on ROCm, the fallback would hit the same issue. Move pip bootstrap
before the branch so both paths use pip consistently.

* Harden pip bootstrap: try ensurepip first, warn on failure

- Try ensurepip --upgrade before falling back to uv pip install pip.
  ensurepip works offline and does not need PyPI, making the bootstrap
  robust when the network or index is unavailable.
- If both ensurepip and uv fail, emit a visible warning instead of
  silently swallowing the error (which previously led to a cryptic
  "No module named pip" downstream).
- Use run_maybe_quiet so --verbose users see bootstrap output.
- Update comment to document the actual root cause: uv rejects the
  wheel because filename version and metadata version disagree.
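
A condensed sketch of the bootstrap order (the real code routes through
run_maybe_quiet so --verbose users see output; plain subprocess is used
here to keep the example self-contained):

    import subprocess, sys, warnings

    def _bootstrap_pip() -> None:
        # ensurepip works offline and needs no index, so it is tried first.
        for cmd in (
            [sys.executable, "-m", "ensurepip", "--upgrade"],
            ["uv", "pip", "install", "pip"],
        ):
            try:
                if subprocess.run(cmd, capture_output=True).returncode == 0:
                    return
            except OSError:
                pass
        warnings.warn(
            "Could not bootstrap pip; the bitsandbytes ROCm install may fail "
            "with 'No module named pip'."
        )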

* Add --isolated to pip install calls in _install_bnb_rocm

uv pip install ignores pip.conf and PIP_* env vars, but python -m pip
reads them. Without --isolated, users with PIP_INDEX_URL pointing to a
private mirror that does not carry bitsandbytes would see the PyPI
fallback fail where it previously worked under uv. --isolated restores
parity with the old uv behavior.

* Drop --isolated from PyPI fallback in _install_bnb_rocm

--isolated suppresses PIP_INDEX_URL, PIP_EXTRA_INDEX_URL, and pip.conf.
This is correct for the pre-release path (hardcoded GitHub URL, no index
consulted), but breaks the PyPI fallback for users in corporate or
air-gapped environments whose only route to bitsandbytes is a private
mirror configured via those mechanisms. Keep --isolated on the direct-URL
pre-release install; drop it from the index-dependent fallback.

* Drop --isolated from pre-release pip install, fix warning wording

--isolated suppresses pip.conf cert/proxy/CA settings in addition to
index config. For the direct GitHub URL, index config is irrelevant but
cert/proxy settings matter in corporate SSL-inspection environments.
Without this fix, users with pip.conf-based CA bundles get a TLS error
on the pre-release download and silently fall back to the broken PyPI
version -- the exact outcome the PR is trying to prevent.

Also fix the fallback warning: "unreachable" is too specific since the
pre-release install can fail for reasons other than network reachability.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-14 10:23:40 -07:00
pre-commit-ci[bot]
a0b9d14081
[pre-commit.ci] pre-commit autoupdate (#5004)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.9 → v0.15.10](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.9...v0.15.10)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 09:49:18 -07:00
Daniel Han
bb14ab144a
Studio: live model-load progress + rate/ETA on download and load (#5017)
* Studio: live model-load progress + rate/ETA on download and load

Two UX fixes for the opaque multi-minute wait between clicking Load
and being able to chat, visible most clearly on large MoE GGUFs like
MiniMax-M2.7 (131 GB of weights on a 97 GB GPU):

1. **Model-load phase is now observable.** The existing chat flow
   transitions the toast to "Starting model..." as soon as the
   download hits 100%, then shows a spinner with no other feedback
   until llama-server reports healthy. For a 130 GB model that spinner
   freezes for five-plus minutes while the kernel pages shards into
   the page cache. A new `GET /api/inference/load-progress` endpoint
   samples `/proc/<pid>/status VmRSS` on the llama-server subprocess
   against the sum of shard file sizes on disk, so the UI can render
   a real bar plus rate / ETA during that window.

2. **Rate and ETA on downloads and loads.** Both the chat toast and
   the training-start overlay used to show a static pair of numbers
   (for example "15.4 of 140.8 GB"). A rolling 15-second window over
   the existing byte-series now surfaces "85.3 MB/s, 24m 23s left"
   beside that pair. The estimator is shared between the download
   and load phases so the numbers don't reset when the phase flips.

Also fixes a pre-existing assignment bug uncovered while wiring this
up: `load_model` was storing the caller's `gguf_path` kwarg into
`self._gguf_path`, which is `None` on the HF-download code path. The
resolved on-disk path (`model_path`) is what llama-server actually
mmaps; downstream consumers need that. No existing reader used
`_gguf_path`, so this is a correctness fix for the new endpoint.

- Backend: `LlamaCppBackend.load_progress()`, `GET /api/inference/load-progress`, `LoadProgressResponse` Pydantic model.
- Frontend: `useTransferStats` hook, `formatRate` / `formatEta` helpers, `getLoadProgress` client, rewired chat toast and `DownloadRow` in the training overlay.
- Tests: `studio/backend/tests/test_llama_cpp_load_progress.py` covers empty states, mmap phase, ready phase, sharded total aggregation, missing gguf_path, and unreadable /proc (7 cases). `tsc -b` and `vite build` on the frontend both clean.
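
A minimal Linux-only sketch of the VmRSS sampling behind the new endpoint
(point 1 above); the real load_progress() also handles the ready phase,
mmproj exclusion, and unreadable /proc:

    import os, re
    from pathlib import Path

    def load_fraction(pid: int, shard_paths: list) -> float:
        status = Path(f"/proc/{pid}/status").read_text()
        m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status, re.MULTILINE)
        resident = int(m.group(1)) * 1024 if m else 0
        # Bytes that will eventually be paged in: sum of shard sizes on disk.
        total = sum(os.path.getsize(p) for p in shard_paths)
        return min(resident / total, 1.0) if total else 0.0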

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 09:46:22 -07:00
Roland Tannous
514bb3a20e
studio: pin peft to 0.18.1 to fix export subprocess issues (#5015)
* studio: pin peft to 0.18.1 to fix export subprocess issues

peft 0.19.0 causes export subprocess shutdown failures in Studio.
Reverting to 0.18.1 resolves the issue.

* studio: move peft pin to extras-no-deps to prevent torch upgrade

Installing peft via overrides.txt would resolve its deps and pull in
torch>=0.11.0, breaking other pinned packages. Moving the pin to
extras-no-deps.txt ensures --no-deps is used during install.
2026-04-14 20:16:30 +04:00
Datta Nimmaturi
4328d0b4f6
Fix num_items_in_batch GA for Gemma4 (#4998)
* Fix num_items_in_batch GA for Gemma4

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 09:01:10 -07:00
Daniel Han
7252410ccc
studio: stream export worker output into the export dialog (#4897)
* studio: stream export worker output into the export dialog

The Export Model dialog only showed a spinner on the "Exporting..."
button while the worker subprocess was doing the actual heavy lifting.
For Merged to 16bit and GGUF / Llama.cpp exports this meant several
minutes (or more, for large models) of opaque silence, with no way to
tell whether save_pretrained_merged, convert_hf_to_gguf.py, or
llama-quantize was making progress.

This adds a live terminal-style output panel inside the export dialog,
rendered just above the Cancel / Start Export buttons and scrollable
with auto-follow-tail. It shows stdout and stderr from both the worker
process itself and any child process it spawns (GGUF converter,
llama-quantize), coloured by stream.

Backend

- core/export/worker.py: new _setup_log_capture(resp_queue) installed
  before LogConfig.setup_logging. It saves the original stdout/stderr
  fds, creates pipes, os.dup2's the write ends onto fds 1 and 2 (so
  every child process inherits the redirected fds), and spins up two
  daemon reader threads. Each thread reads bytes from a pipe, echoes
  them back to the original fd (so the server console keeps working),
  splits on \n and \r, and forwards each line to the resp queue as
  {"type":"log","stream":"stdout|stderr","line":...,"ts":...}.
  PYTHONUNBUFFERED=1 is set so nested Python converters flush
  immediately.
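  (A condensed sketch of this fd-capture scheme follows the Backend list
  below.)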

- core/export/orchestrator.py:
  - Thread-safe ring buffer (collections.deque, maxlen 4000) with a
    monotonically increasing seq counter. clear_logs(),
    get_logs_since(cursor), get_current_log_seq(), is_export_active().
  - _wait_response handles rtype == "log" by appending to the buffer
    and continuing the wait loop. Status messages are also surfaced as
    a "status" stream so users see high level progress alongside raw
    subprocess output.
  - load_checkpoint, _run_export, and cleanup_memory now wrap their
    bodies with the existing self._lock (previously unused), clear the
    log buffer at the start of each op, and flip _export_active in a
    try/finally so the SSE endpoint can detect idle.

- routes/export.py:
  - Wrapped every sync orchestrator call (load_checkpoint,
    cleanup_memory, export_merged_model, export_base_model,
    export_gguf, export_lora_adapter) in asyncio.to_thread so the
    FastAPI event loop stays free during long exports. Without this
    the new SSE endpoint could not be served concurrently with the
    blocking export POST.
  - New GET /api/export/logs/stream SSE endpoint. Honors
    Last-Event-ID and a since query param for reconnect, emits log /
    heartbeat / complete / error events, uses the id field to carry
    the log seq so clients can resume cleanly. On first connect
    without an explicit cursor it starts from the current seq so old
    lines from a previous run are not replayed.
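
As referenced above, a condensed, hypothetical sketch of the fd-capture
scheme in _setup_log_capture (simplified: it reads whole lines and does
not split on \r the way the real reader threads do):

    import os, threading, time

    def setup_log_capture(resp_queue):
        for fd, stream in ((1, "stdout"), (2, "stderr")):
            saved = os.dup(fd)              # keep the original console fd
            read_end, write_end = os.pipe()
            os.dup2(write_end, fd)          # children inherit the redirected fd
            os.close(write_end)

            def reader(read_end=read_end, saved=saved, stream=stream):
                with os.fdopen(read_end, "rb", buffering=0) as pipe:
                    for raw in iter(pipe.readline, b""):
                        os.write(saved, raw)  # echo so the console keeps working
                        resp_queue.put({
                            "type": "log",
                            "stream": stream,
                            "line": raw.decode(errors="replace").rstrip("\r\n"),
                            "ts": time.time(),
                        })

            threading.Thread(target=reader, daemon=True).start()
        os.environ["PYTHONUNBUFFERED"] = "1"  # nested converters flush per line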

Frontend

- features/export/api/export-api.ts: streamExportLogs() helper that
  authFetches the SSE endpoint and parses id / event / data fields
  manually (same pattern as streamTrainingProgress in train-api.ts).

- features/export/components/export-dialog.tsx:
  - Local useExportLogs(exporting) hook that opens the SSE stream on
    exporting transitions to true, accumulates up to 4000 lines in
    component state, and aborts on cleanup.
  - New scrollable output panel rendered above DialogFooter, only
    shown for Merged to 16bit and GGUF / Llama.cpp (LoRA adapter is
    a fast disk write with nothing to show). Dark terminal styling
    (bg-black/85, emerald text, rose for stderr, sky for status),
    max-height 14rem, auto-scrolls to the bottom on new output but
    stops following if the user scrolls up. A small streaming / idle
    indicator is shown next to the panel title.
  - DialogContent widens from sm:max-w-lg to sm:max-w-2xl when the
    output panel is visible so the logs have room to breathe.

Verified

- Python smoke test (tests/smoke_export_log_capture.py): spawns a
  real mp.get_context("spawn") process, installs _setup_log_capture,
  confirms that parent stdout prints, parent stderr prints, AND a
  child subprocess invoked via subprocess.run (both its stdout and
  stderr) are all captured in the resp queue. Passes.
- Orchestrator log helpers tested in isolation: _append_log,
  get_logs_since (with and without a cursor), clear_logs not
  resetting seq so reconnecting clients still progress. Passes.
- routes.export imports cleanly in the studio venv and /logs/stream
  shows up in router.routes.
- bun run build: tsc -b plus vite build, no TypeScript errors.

No existing export behavior is changed. If the subprocess, the SSE
endpoint, or the frontend hook fails, the export itself still runs to
completion the same way it did before, with or without logs visible.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* export dialog: trim bootstrap noise, scope logs per screen, show realpath

Several follow-ups to the live export log work:

1. Worker bootstrap noise (transformers venv activation, Unsloth banner,
   "Top GGUF/hub models" lists, vision detection, 2k-step weight load
   bar) is dropped from the export-dialog stream. A threading.Event
   gate in worker.py defaults closed and only opens once _handle_export
   actually starts; until then the reader thread still echoes lines to
   the saved console fd for debugging but does not push them onto the
   resp_queue. The orchestrator already spawns a fresh subprocess for
   every checkpoint load, so the gate is naturally reset between runs.

2. tqdm in non-tty mode defaults to a 10s mininterval, which makes
   multi-step bars look frozen in the panel. Set TQDM_MININTERVAL=0.5
   in the worker env so any tqdm-driven progress emits more often.

3. The dialog's useExportLogs hook now also clears its line buffer
   when exportMethod or open changes, so re-opening the dialog into a
   different action's screen no longer shows the previous action's
   saved output. A useElapsedSeconds tick + "Working Xs" badge in the
   log header gives users a visible sign that long single-step phases
   (cache copies, GGUF conversion) are still running when no new lines
   are arriving.

4. ExportBackend.export_{merged,base,gguf,lora} now return
   (success, message, output_path); the worker forwards output_path on
   each export_*_done response, the orchestrator's _run_export passes
   it to routes/export.py, which surfaces it via
   ExportOperationResponse.details.output_path. The dialog's Export
   Complete screen renders the resolved on-disk realpath under "Saved
   to" so users can find their exported model directly.

* fix(cli): unpack 3-tuple return from export backend

ExportOrchestrator.export_{merged,base,gguf,lora} now return
(success, message, output_path) so the studio dialog can show
the on-disk realpath. The CLI still unpacked 2 values, so every
`unsloth export --format ...` crashed with ValueError before
reporting completion. Update the four call sites and surface
output_path via a "Saved to:" echo.

* fix(studio): anchor export log SSE cursor at run start

The export dialog SSE defaulted its cursor to get_current_log_seq()
at connect time, so any line emitted between the POST that kicks
off the export and the client opening the stream was buffered with
seqs 1..k and then skipped (seq <= cursor). Long-running exports
looked silent during their first seconds.

Snapshot _log_seq into _run_start_seq inside clear_logs() and
expose it via get_run_start_seq(). The SSE default cursor now uses
that snapshot, so every line emitted since the current run began
is reachable regardless of when the client connects. Old runs
still can't leak in because their seqs are <= the snapshot.
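
The buffer is roughly this shape (a sketch assembled from the descriptions
in this PR, not the actual orchestrator class):

    import collections, threading

    class ExportLogBuffer:
        def __init__(self, maxlen=4000):
            self._lock = threading.Lock()
            self._lines = collections.deque(maxlen=maxlen)
            self._seq = 0            # monotonically increasing, never reset
            self._run_start_seq = 0  # snapshot taken when a run begins

        def clear_logs(self):
            with self._lock:
                self._lines.clear()
                self._run_start_seq = self._seq  # old runs stay <= the snapshot

        def get_run_start_seq(self):
            return self._run_start_seq

        def append(self, line):
            with self._lock:
                self._seq += 1
                self._lines.append((self._seq, line))

        def get_logs_since(self, cursor):
            with self._lock:
                return [(seq, line) for seq, line in self._lines if seq > cursor]

    # SSE default cursor: get_run_start_seq() instead of the current seq, so
    # every line emitted since the run began is replayed to late connectors.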

* fix(studio): reconnect export log SSE on stream drop

useExportLogs launched streamExportLogs once per exporting
transition and recorded any drop in .catch(). Long GGUF exports
behind a proxy with an idle kill-timeout would silently lose the
stream for the rest of the run even though the backend already
supports Last-Event-ID resume. The "retry: 3000" directive emitted
by the backend is only meaningful to native EventSource; this
hook uses a manual fetch + ReadableStream parse so it had no
effect.

Wrap streamExportLogs in a retry loop that tracks lastSeq from
ExportLogEvent.id and passes it as since on reconnect. Backoff is
exponential with jitter, capped at 5s, reset on successful open.
The loop stops on explicit backend `complete` event or on effect
cleanup.

* fix(studio): register a second command so Typer keeps `export` as a subcommand

The CLI export unpacking tests wrap `unsloth_cli.commands.export.export`
in a fresh Typer app with a single registered command. Typer flattens a
single-command app into that command, so the test's
`runner.invoke(cli_app, ["export", ckpt, out, ...])` treats the leading
`"export"` token as an unexpected extra positional argument -- every
parametrized case failed with:

    Got unexpected extra argument (.../out)

Register a harmless `noop` second command so Typer preserves subcommand
routing and the tests actually exercise the 3-tuple unpack path they
were written to guard.

Before: 4 failed
After:  4 passed
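
For context, the Typer quirk looks roughly like this self-contained
example (the real tests wrap unsloth_cli.commands.export.export; the
`export` command below is a stand-in):

    import typer
    from typer.testing import CliRunner

    app = typer.Typer()

    @app.command("export")
    def export(checkpoint: str, output: str):
        typer.echo(f"exporting {checkpoint} -> {output}")

    # With only one registered command, Typer flattens the app and "export"
    # would be parsed as the first positional argument. A second command
    # restores subcommand routing.
    @app.command("noop")
    def noop():
        pass

    result = CliRunner().invoke(app, ["export", "ckpt_dir", "out_dir"])
    assert result.exit_code == 0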

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: studio-install <studio@local.install>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-14 08:55:43 -07:00
Daniel Han
eca592effe
studio: show HF model download progress in training start overlay (#4894)
* studio: show HF model download progress in training start overlay

During the training setup phase, the overlay only displayed a static
"Loading model..." line while model weights were being downloaded from
Hugging Face. On slow connections this looked like the app had frozen.

This adds a small self-contained progress block inside the existing
TrainingStartOverlay that polls the existing
GET /api/models/download-progress endpoint and renders a Progress bar
with bytes downloaded, total bytes, and percent complete.

Notes:

- Frontend only change. No backend, worker, SSE, or runtime store edits.
- Reuses the existing getDownloadProgress client wrapper and the
  existing /api/models/download-progress endpoint that already scans
  the HF blob cache for completed and .incomplete files.
- selectedModel is read directly from useTrainingConfigStore inside the
  overlay, so no prop drilling and live-training-view.tsx is unchanged.
- Polling runs at 1500 ms and is gated on the HF repo regex
  (^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$), the same regex the backend uses,
  so local paths and empty form state never hit the endpoint.
- Polling stops once progress reaches 1.0 so the bar can stay at 100
  until the overlay hides on the first training step.
- Network errors are silently swallowed, matching the chat side flow
  (the bar simply freezes at the last value).
- When downloadedBytes is 0 the block is hidden entirely, so cached
  models do not flash a progress bar.
- When the HF API cannot determine the total size, the block falls
  back to "X downloaded" with no percent and no bar.

Verified with bun run build (tsc -b plus vite build, no TypeScript
errors).

* training overlay: track dataset download + show on-disk realpath

Adds a dedicated "Downloading dataset..." section to the training-start
overlay alongside the existing model-weights one, so an HF dataset that
is downloading mid-startup is no longer mislabeled as model weights or
hidden entirely. The new GET /api/datasets/download-progress endpoint
mirrors /api/models/download-progress against the datasets-- prefix in
HF_HUB_CACHE.

Both endpoints now also return cache_path, the resolved on-disk
realpath of the snapshot directory (or the cache repo root if no
snapshot is materialized yet). The overlay surfaces this under each
download row so users can immediately see where the model and dataset
landed without digging through server logs.

The frontend's existing useModelDownloadProgress hook is generalized
to a single useHfDownloadProgress(repoId, fetcher) hook that the
model and dataset variants both delegate to, keeping polling, gating,
and completion semantics in one place.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: Polish training start overlay download progress UI (#4957)

* studio: polish training start overlay download progress visuals

* Fix formatCachePath cross-platform support and redundant sizeLabel

- Extend formatCachePath regex to also shorten macOS /Users/<user> paths to ~
- Suppress sizeLabel when no byte info is available (cachePath-only state),
  since the "Preparing" badge already conveys the status

* Fix misleading status badge when download total is unknown

- Hide badge when totalBytes is 0 but downloadedBytes > 0, since we cannot
  determine if the download is still in progress or already complete (happens
  when HF size metadata lookup fails for gated/private repos)
- Keep "Preparing" badge for the zero-bytes cachePath-only state
- Add Windows native path shortening to formatCachePath (C:\Users\<name>)

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

---------

Co-authored-by: studio-install <studio@local.install>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
2026-04-14 08:54:01 -07:00
Daniel Han
44082cf88e
Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM (#5014)
* Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM

The chat settings sheet's ctx slider reads `max_context_length` from
`/api/inference/status` and renders

    Exceeds estimated VRAM capacity (N tokens). The model may use
    system RAM.

when the user drags the slider above that value. For models whose
weights fit on some GPU subset, `_max_context_length` was already set
to the binary-search cap and the warning fired correctly.

For models whose weights exceed 90% of every GPU subset's free memory
(e.g. MiniMax-M2.7-GGUF at 131 GB on a 97 GB GPU), the ceiling-probe
loop never matched a subset, so `max_available_ctx` stayed at the
native context (e.g. 196608). The slider ran all the way to native
with no indication that any value above the 4096 spec default would
trigger `--fit on` and degrade performance.

Anchor `max_available_ctx` at `min(4096, native_context_length)` when
no subset fits, so the warning fires at the right threshold and the
user sees the correct safe-zone / warning-zone split:

    Before (MiniMax-M2.7 on 97 GB GPU):
      slider 0 .. 196608, warning threshold = 196608  (never fires)

    After:
      slider 0 .. 196608, warning threshold = 4096    (fires correctly)

No frontend changes required: `chat-settings-sheet.tsx` already
consumes `ggufMaxContextLength` (= status.max_context_length) as the
warning threshold and `ggufNativeContextLength` as the slider max.

Adds tests/test_llama_cpp_max_context_threshold.py covering
weights-exceed-VRAM (single / multi-GPU), a native-ctx below the 4096
fallback case (don't lie about supported ctx), fittable-model
regressions (small / multi-GPU / tiny on huge GPU), and the
`max_context_length` property's fallback semantics.
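
A condensed sketch of the fallback (argument names are illustrative; the
real logic sits in the ceiling-probe loop in LlamaCppBackend):

    def pick_max_available_ctx(fit_cap, native_context_length):
        # fit_cap: binary-search cap when some GPU subset holds the weights,
        # or None when weights exceed 90% of every subset's free memory.
        if fit_cap is not None:
            return fit_cap
        # Anchor the warning threshold at the 4096 spec default, but never
        # claim more context than the model natively supports.
        return min(4096, native_context_length)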

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 08:53:49 -07:00
Daniel Han
b2f80f210e
Studio: make GGUF disk-space preflight cache-aware (#5012)
* Studio: make GGUF disk-space preflight cache-aware

The pre-download disk check in LlamaCppBackend.load_model compared the
repo's total GGUF size against free disk without crediting bytes
already present in the Hugging Face cache. Re-loading a large cached
model (e.g. MiniMax-M2.7-GGUF at 131 GB) then failed cold with
"Not enough disk space to download any variant" whenever free disk
was below the full weight footprint, even though nothing actually
needed to be downloaded.

Subtract bytes already on disk via try_to_load_from_cache before
comparing against free space. A partial blob (interrupted download) is
not credited, so a second attempt still allocates room to finish the
download. The log line now also surfaces how much is already cached.
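
Roughly the crediting logic described above (a sketch; the actual check in
load_model also logs how much is already cached):

    import os, shutil
    from huggingface_hub import try_to_load_from_cache

    def bytes_still_needed(repo_id, filenames_and_sizes):
        needed = 0
        for filename, size in filenames_and_sizes.items():
            cached = try_to_load_from_cache(repo_id, filename)
            # Partial (.incomplete) blobs are not returned by the cache lookup,
            # so they are not credited and a retry still budgets the full file.
            if isinstance(cached, str) and os.path.getsize(cached) == size:
                continue
            needed += size
        return needed

    # preflight: shutil.disk_usage(cache_dir).free > bytes_still_needed(repo, sizes)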

Adds tests/test_llama_cpp_cache_aware_disk_check.py covering the
fully-cached, partial-cache-insufficient-disk, partial-cache-enough-disk,
cold-cache, incomplete-blob, and zero-size-path-info cases. Sparse
tempfiles keep the GB-scale scenarios cheap to simulate.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 08:53:37 -07:00
Daniel Han
767fa8cade
Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM (#5011)
* Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM

The load-time auto-fit in LlamaCppBackend.load_model had two issues for
models whose weights do not fit on any GPU subset (the common case for
large MoE GGUFs such as MiniMax-M2.7, Qwen3.5-397B-A17B, etc.):

1. Auto mode (max_seq_length=0) left effective_ctx at the model's native
   context when no subset passed the 90% fit check. The UI slider then
   landed on e.g. 196608 for MiniMax-M2.7, far above anything usable.
   Default the auto-pick to 4096 so the UI starts at a sane value; the
   slider ceiling stays at the native context so the user can still
   opt in to longer contexts and receive the "might be slower" warning.

2. Explicit ctx was silently shrunk when weights fit but the requested
   KV overflowed the 90% budget. The shrink loop emitted -c <capped>
   -ngl -1 without informing the caller, so a user who had opted into
   a longer context via the UI never actually got it. Drop the shrink
   loop on the explicit path and emit -c <user_ctx> --fit on instead,
   letting llama-server flex -ngl (CPU layer offload).

Adds tests/test_llama_cpp_context_fit.py covering both paths, the
file-size-only fallback when KV metadata is missing, non-regression on
fittable auto-pick, and platform-agnostic input shape.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 08:53:25 -07:00
TF-MTGE
a31c82a640
fix(studio): remove 300s cap on load_checkpoint (inherits 3600s default) (#4922)
* fix: increase wait response timeout to 900 sec instead of 300 sec. #4845

* Apply suggestion from @gemini-code-assist[bot]

good catch

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-14 08:53:14 -07:00
Datta Nimmaturi
da78c6be71
[Studio] Install flash attn at setup time for linux (#4979)
* [Studio] Install flash attn at setup time for linux

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup changes

Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Test cases

* wheel_utils: narrow url_exists exceptions and log at debug level

---------

Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-14 16:40:17 +04:00
Datta Nimmaturi
dccc0ebada
[Studio] Show non exported models in chat UI (#4892)
* Show non exported models in chat UI

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Distinguish between LoRA and full fine-tune saves. Cleanup

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-14 15:03:58 +04:00
Bharath Kumar Adinarayan
a50f61009b
fix(studio): default chart view to full training history (#5007)
* fix(studio): default chart view to full training history instead of last 80 steps

Fixes #5003

* chore: windowsize as null code comment

---------

Co-authored-by: imagineer99 <samleejackson0@gmail.com>
Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>
2026-04-14 03:29:27 -07:00
Lee Jackson
bfa17330bd
Studio: Polish API key copy button and harden async clipboard fallback (#5006)
* fix: polish clipboard style and fix async clipboard path

* Use copyToClipboardAsync in CopyButton for Safari fallback

CopyButton was calling navigator.clipboard.writeText directly,
bypassing the execCommand fallback added in this same PR. Switch
to copyToClipboardAsync which tries execCommand first (Safari
user-gesture requirement) then falls back to the async clipboard API.

* Fix copyToClipboard sync contract regression and improve async path

- Restore copyToClipboard() to return only the execCommand result,
  preserving the boolean contract that 7 existing callers depend on
  to gate their "Copied!" UI state. The fire-and-forget async fallback
  was returning true before the promise resolved, causing false success.

- Add document.body null guard to copyWithExecCommand for SSR safety.

- Reorder copyToClipboardAsync to try the async Clipboard API first,
  avoiding unnecessary DOM/focus overhead in Radix focus-trapped dialogs
  where execCommand always fails anyway.

* Restore queryCommandSupported guard and fix async catch path

- Restore the queryCommandSupported("copy") guard in copyToClipboard()
  to match the original contract exactly: when execCommand is entirely
  unsupported, fall through to fire-and-forget async clipboard write.

- Fix copyToClipboardAsync catch block: after navigator.clipboard.writeText
  rejects, the user-gesture frame is gone, so execCommand will also fail.
  Return false from catch instead of falling through. The execCommand
  fallback at the bottom only runs when the Clipboard API is absent
  (still in user-gesture frame).

* Restore execCommand fallback in copyToClipboardAsync catch path

The catch block was returning false after clipboard API rejection,
based on the incorrect premise that the user-gesture frame is lost
after an await. Per the HTML spec, transient user activation IS
preserved through promise microtask chains. The real reason
execCommand fails in the Radix dialog is the focus trap intercepting
textarea.focus(), not gesture loss.

For non-dialog callers, execCommand can still succeed after a
clipboard rejection. Inside a Radix modal, execCommand returns
false harmlessly (focus trap blocks it).

* Harden textarea fallback for mobile and continue to async path on failure

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-14 14:22:14 +04:00
Wasim Yousef Said
97eafd999e
studio: fix api-keys access + refresh (#5005)
* studio: fix api-keys access + refresh

* studio: guard v1 in spa fallback
2026-04-13 23:48:51 +04:00
AdamPlatin123
d2fc582840
studio: skip training status/metrics polling when idle (#4988)
* fix(studio): skip training status/metrics polling when idle

Add an early return in the status and metrics setInterval callbacks when
the runtime store reports phase === "idle" and hasHydrated is true.
Previously these polls fired unconditionally every 3s/5s, generating
unnecessary network traffic and console errors when no training was
running.

* fix(studio): reduce idle polling to 30s instead of stopping entirely

Review feedback (PR #4988): completely stopping polling when idle risks
permanent UI desync if hydration fails, and misses out-of-band state
changes from other clients. Add a 30s background poll that only fires
when idle to recover gracefully.

* fix: harden idle status polling around hydration and runtime reset

---------

Co-authored-by: AdamPlatin123 <AdamPlatin123@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: imagineer99 <samleejackson0@gmail.com>
2026-04-13 12:02:12 -07:00
Daniel Han
9a261aec5f
Studio: Expose openai and anthropic compatible external API end points (#4956)
* Studio: add API key authentication for programmatic access

External users want to hit the Studio API (chat completions with tool
calling, training, export, etc.) without going through the browser
login flow. This adds sk-unsloth- prefixed API keys that work as a
drop-in replacement for JWTs in the Authorization: Bearer header.

Backend:
- New api_keys table in SQLite (storage.py)
- create/list/revoke/validate functions with SHA-256 hashed storage
- API key detection in _get_current_subject before the JWT path
- POST/GET/DELETE /api/auth/api-keys endpoints on the auth router

Frontend:
- /api-keys page with create form, one-time key reveal, keys table
- API Keys link in desktop and mobile navbar
- Route registered with requireAuth guard

Zero changes to any existing route handler -- every endpoint that uses
Depends(get_current_subject) automatically works with API keys.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Use actual origin in API key usage examples

The examples on /api-keys were hardcoded to localhost:8888 which is
wrong for remote users. Use window.location.origin so the examples
show the correct URL regardless of where the user is connecting from.

* Add `unsloth studio run` CLI command for one-liner model serving

Adds a `run` subcommand that starts Studio, loads a model, creates an
API key, and prints a ready-to-use curl command -- similar to
`ollama run` or `vllm serve`.

Usage: unsloth studio run -m unsloth/Qwen3-1.7B-GGUF --gguf-variant UD-Q4_K_XL

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add end-to-end tests for `unsloth studio run` and API key usage

Tests the 4 usage examples from the API Keys page:
1. curl basic (non-streaming) chat completions
2. curl streaming (SSE) chat completions
3. OpenAI Python SDK streaming completions
4. curl with tools (web_search + python)

Also tests --help output, invalid key rejection, and no-key rejection.
All 7 tests pass against Qwen3-1.7B-GGUF.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add /v1/completions, /v1/embeddings, /v1/responses endpoints and --parallel support

- llama_cpp.py: accept n_parallel param, pass to llama-server --parallel
- run.py: plumb llama_parallel_slots through to app.state
- inference.py: add /completions and /embeddings as transparent proxies to
  llama-server, add /responses as application-level endpoint that converts
  to ChatCompletionRequest; thread n_parallel through load_model
- studio.py: set llama_parallel_slots=4 for `unsloth studio run` path

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Make /v1/responses endpoint match OpenAI Responses API format

The existing /v1/responses shim returned Chat Completions format, which
broke OpenAI SDK clients using openai.responses.create(). This commit
replaces the endpoint with a proper implementation that:

- Returns `output` array with `output_text` content parts instead of
  `choices` with `message`
- Uses `input_tokens`/`output_tokens` instead of `prompt_tokens`/
  `completion_tokens` in usage
- Sets `object: "response"` and `id: "resp_..."`
- Emits named SSE events for streaming (response.created,
  response.output_text.delta, response.completed, etc.)
- Accepts all OpenAI Responses API fields (tools, store, metadata,
  previous_response_id) without erroring -- silently ignored
- Maps `developer` role to `system` and `input_text`/`input_image`
  content parts to the internal Chat format

Adds Pydantic schemas for request/response models and 23 unit tests
covering schema validation, input normalisation, and response format.
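
For reference, the non-streaming shape the endpoint now returns looks
roughly like this (field values are illustrative):

    response = {
        "id": "resp_abc123",
        "object": "response",
        "model": "unsloth/Qwen3-1.7B-GGUF",
        "output": [
            {
                "type": "message",
                "role": "assistant",
                "content": [{"type": "output_text", "text": "Hello!"}],
            }
        ],
        "usage": {"input_tokens": 12, "output_tokens": 4, "total_tokens": 16},
    }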

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: add Anthropic-compatible /v1/messages endpoint (#4981)

* Add Anthropic-compatible /v1/messages endpoint with tool support

Translate Anthropic Messages API format to/from internal OpenAI format
and reuse the existing server-side agentic tool loop. Supports streaming
SSE (message_start, content_block_delta, etc.) and non-streaming JSON.
Includes offline unit tests and e2e tests in test_studio_run.py.

* Add enable_tools, enabled_tools, session_id to /v1/messages endpoint

Support the same shorthand as /v1/chat/completions: enable_tools=true
with an optional enabled_tools list uses built-in server tools without
requiring full Anthropic tool definitions. session_id is passed through
for sandbox isolation. max_tokens is now optional.

* Strip leaked tool-call XML from Anthropic endpoint content

Apply _TOOL_XML_RE to content events in both streaming and
non-streaming tool paths, matching the OpenAI endpoint behavior.

* Emit custom tool_result SSE event in Anthropic stream

Adds a non-standard tool_result event between the tool_use block close
and the next text block, so clients can see server-side tool execution
results. Anthropic SDKs ignore unknown event types.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Split /v1/messages into server-side and client-side tool paths

enable_tools=true runs the existing server-side agentic loop with
built-in tools (web_search/python/terminal). A bare tools=[...] field
now triggers a client-side pass-through: client-provided tools are
forwarded to llama-server and any tool_use output is returned to the
caller with stop_reason=tool_use for client execution.

This fixes Claude Code (and any Anthropic SDK client), which sends
tools=[...] expecting client-side execution but was previously routed
through execute_tool() and failed with 'Unknown tool'.

Adds AnthropicPassthroughEmitter to convert llama-server OpenAI SSE
chunks into Anthropic SSE events, plus unit tests covering text
blocks, tool_use blocks, mixed, stop reasons, and usage.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix httpcore GeneratorExit in /v1/messages passthrough stream

Explicitly aclose aiter_lines() before the surrounding async with
blocks unwind, mirroring the prior fix in external_provider.py
(a41160d3) and cc757b78's RuntimeError suppression.

* Wire stop_sequences through /v1/messages; warn on tool_choice

Plumb payload.stop_sequences to all three code paths (server-side
tool loop, no-tool plain, client-side passthrough) so Anthropic SDK
clients setting stop_sequences get the behavior they expect. The
llama_cpp backend already accepted `stop` on both generate_chat_
completion and generate_chat_completion_with_tools; the Anthropic
handler simply wasn't passing it.

tool_choice remains declared on the request model for Anthropic SDK
compatibility (the SDK often sets it by default) but is not yet
honored. Log a structured warning on each request carrying a non-
null tool_choice so the silent drop is visible to operators.

* Wire min_p / repetition_penalty / presence_penalty through /v1/messages

Align the Anthropic endpoint's sampling surface with /v1/chat/completions.
Adds the three fields as x-unsloth extensions on AnthropicMessagesRequest
and threads them through all three code paths: server-side tool loop,
no-tool plain, and client-side passthrough.

The passthrough builder emits "repeat_penalty" (not "repetition_penalty")
because that is llama-server's field name; the backend methods already
apply the same rename internally.

* Fix block ordering and prev_text reset in non-streaming tool path

_anthropic_tool_non_streaming was building the response by appending
all tool_use blocks first, then a single concatenated text block at
the end — losing generation order and merging pre-tool and post-tool
text into one block. It also never reset prev_text between synthesis
turns, so the first N characters of each post-tool turn were dropped
(where N = length of the prior turn's final cumulative text).

Rewrite to build content_blocks incrementally in generation order,
matching the streaming emitter's behavior: deltas within a turn are
merged into the trailing text block, tool_use blocks interrupt the
text sequence, and prev_text is reset on tool_end so turn N+1 diffs
against an empty baseline.

Caught by gemini-code-assist[bot] review on #4981.

* Make test_studio_run.py e2e tests pytest-compatible

Add a hybrid session-scoped studio_server fixture in conftest.py that
feeds base_url / api_key into the existing e2e test functions. Three
invocation modes are now supported:

1. Script mode (unchanged) — python tests/test_studio_run.py
2. Pytest + external server — point at a running instance via
   UNSLOTH_E2E_BASE_URL / UNSLOTH_E2E_API_KEY env vars, no per-run
   GGUF load cost
3. Pytest + fixture-managed server — pytest drives _start_server /
   _kill_server itself via --unsloth-model / --unsloth-gguf-variant,
   CI-friendly

The existing _start_server / _kill_server helpers and main() stay
untouched so the script entry point keeps working exactly as before.
Test function signatures are unchanged — the (base_url, api_key)
parameters now resolve via the new fixtures when running under
pytest.

* Rename test_studio_run.py -> test_studio_api.py

The file is entirely about HTTP API endpoint testing (OpenAI-compatible
/v1/chat/completions, Anthropic-compatible /v1/messages, API key auth,
plus a CLI --help sanity check on the command that runs the API). None
of its tests cover training, export, chat-UI, or internal-Python-API
concerns.

The old name misleadingly suggested "tests for the unsloth studio run
CLI subcommand" — the new name reflects the actual scope.

Updates:
- git mv the file (rename tracked, history preserved)
- Rewrite opening docstring to state the API surface focus and call
  out what is explicitly out of scope
- Update all 4 Usage-block path references to the new filename
- LOG_FILE renamed to test_studio_api.log
- conftest.py fixture import rewritten from test_studio_run to
  test_studio_api, plus 7 docstring/comment references updated

No functional changes to test logic, signatures, or main().

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix httpcore asyncgen cleanup in /v1/messages and /v1/completions

The earlier fix in 985e92a9 was incomplete: it closed aiter_lines()
explicitly but still used `async with httpx.AsyncClient()` /
`async with client.stream()` inside the generator. When the generator
is orphaned (e.g. client disconnects mid-stream and Starlette drops
the StreamingResponse iterator without explicitly calling aclose()),
Python's asyncgen finalizer runs the cleanup in a DIFFERENT task than
the one that originally entered the httpx context managers. The
`async with` exits then trigger httpcore's HTTP11ConnectionByteStream
.aclose(), which enters anyio.CancelScope.__exit__ with a mismatched
task and raises RuntimeError("Attempted to exit cancel scope in a
different task"). That error escapes any user-owned try/except
because it happens during GC finalization.

Replace `async with` with manual client/response lifecycle in both
/v1/messages passthrough and /v1/completions proxy. Close the
response and client in a finally block wrapped in
`try: ... except Exception: pass`. This suppresses RuntimeError (and
other Exception subclasses) from the anyio cleanup noise while
letting GeneratorExit (a BaseException, not Exception) propagate
cleanly so the generator terminates as Python expects.

Traceback observed in user report:
  File ".../httpcore/_async/connection_pool.py", line 404, in __aiter__
      yield part
  RuntimeError: async generator ignored GeneratorExit
...
  File ".../anyio/_backends/_asyncio.py", line 455, in __exit__
      raise RuntimeError(
  RuntimeError: Attempted to exit cancel scope in a different task
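
A minimal sketch of the manual-lifecycle pattern described above (assuming
httpx; function and argument names are illustrative):

    import httpx

    async def proxy_stream(url, payload):
        # Manual lifecycle instead of `async with`: cleanup must tolerate
        # running in a different task during asyncgen finalization.
        client = httpx.AsyncClient(timeout=None)
        resp = None
        try:
            resp = await client.send(
                client.build_request("POST", url, json=payload), stream=True
            )
            async for line in resp.aiter_lines():
                yield line
        finally:
            try:
                if resp is not None:
                    await resp.aclose()
                await client.aclose()
            except Exception:
                # Swallow anyio cancel-scope noise; GeneratorExit (a
                # BaseException) still propagates so the generator closes.
                pass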

* Expand unsloth studio run banner with SDK base URL and more curl examples

Add an explicit "OpenAI / Anthropic SDK base URL" line inside the info
box so SDK users don't accidentally copy the bare server URL (without
/v1) into their OpenAI/Anthropic SDK constructors and hit 404s.

Replace the single /v1/chat/completions curl example with three
labeled blocks: chat/completions, Anthropic /messages, and OpenAI
Responses. The Anthropic example includes max_tokens (Anthropic SDKs
require it even though Studio accepts None).

All examples derived from a computed sdk_base_url so the /v1 prefix
stays in sync if the public path ever changes.
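Roughly, the banner code amounts to something like this (names are
illustrative; the point is only that every example reuses one computed prefix):

  host, port = "127.0.0.1", 8000             # whatever the server bound to
  server_url = f"http://{host}:{port}"
  sdk_base_url = f"{server_url}/v1"           # what OpenAI/Anthropic SDKs expect

  chat_example = (
      f"curl {sdk_base_url}/chat/completions "
      "-H 'Content-Type: application/json' "
      "-d '{\"model\": \"local\", \"messages\": [...]}'"
  )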

* Hash API keys with HMAC-SHA256 + persistent server secret

Stores the HMAC secret in a new app_secrets singleton table. Fixes
CodeQL py/weak-sensitive-data-hashing alert on storage.py:74-76,
394-395. Refresh tokens stay on plain SHA-256 (unchanged _hash_token)
so existing user sessions survive upgrade — API keys are new on this
branch so there is no migration.

* Use PBKDF2 for API key hashing per CodeQL recommendation

HMAC-SHA256 was still flagged by py/weak-sensitive-data-hashing.
Switch to hashlib.pbkdf2_hmac, which is in CodeQL's recommended
allowlist (Argon2/scrypt/bcrypt/PBKDF2). Persistent server-side
salt stays in app_secrets for defense-in-depth. 100k iterations to
match auth/hashing.py's password hasher.
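A minimal sketch of the scheme, assuming hypothetical helper names (the
persistent salt stands in for the value stored in the app_secrets table):

  import hashlib
  import hmac
  import secrets

  PBKDF2_ITERATIONS = 100_000   # matches auth/hashing.py per the commit

  def new_salt() -> bytes:
      # Generated once and persisted in the app_secrets singleton table.
      return secrets.token_bytes(32)

  def hash_api_key(api_key: str, salt: bytes) -> bytes:
      return hashlib.pbkdf2_hmac("sha256", api_key.encode("utf-8"), salt,
                                 PBKDF2_ITERATIONS)

  def verify_api_key(candidate: str, stored: bytes, salt: bytes) -> bool:
      # Constant-time comparison so verification does not leak timing info.
      return hmac.compare_digest(hash_api_key(candidate, salt), stored)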

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-13 21:08:11 +04:00
Roland Tannous
3bb72a557f
Pin kernels==0.12.1 to avoid huggingface_hub dataclass conflict (#5000) 2026-04-13 20:42:02 +04:00
Lee Jackson
21a7895959
Studio: Prompt manager, message deletion, and chat UI improvements (#4938)
* feat(chat): code block styling, delete with Dexie sync, settings sheet polish

* style: config save/delete padding fix

* fix(studio): centralize dark code-block surface and optimize message sync writes

* style: config padding/alignment polish

* fix(studio): upsert custom presets without implicit rename-delete

* fix settings sheet save state polish

* fix settings sheet button widths

* fix chat settings presets

* fix chat delete sync

* fix chat trust remote code flow

---------

Co-authored-by: shine1i <wasimysdev@gmail.com>
2026-04-13 16:42:33 +02:00
AdamPlatin123
3b092bcd46
fix(studio): prevent route transition DOM duplication via AnimatePresence (#4987)
Add mode="wait" and exit={{ opacity: 0 }} to the root AnimatePresence
wrapper so outgoing routes fully unmount before incoming routes render.
Without this, rapid navigation between Studio/Export/Recipes/Chat caused
pages to stack (2x–3x duplication).

Co-authored-by: AdamPlatin123 <AdamPlatin123@users.noreply.github.com>
Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>
2026-04-13 01:38:00 -07:00
Manan Shah
80c12ff1a6
Move gemma4 script (#4994)
* updating gemma4 script

* moving gemma4 script to scripts folder
2026-04-12 23:41:15 -07:00
Manan Shah
db3b3a4d9b
updating gemma4 script (#4992)
* updating gemma4 script

* show errors
2026-04-12 23:11:32 -07:00
Daniel Han
93a24f6698
Add ROCm test suite for PR #4720 (#4824)
95 Python tests and 23 shell tests covering ROCm detection,
torch index URL selection, hardware flags, prebuilt asset selection,
and install pathway logic. All tests use mocks -- no AMD hardware required.

Companion to #4720 (AMD ROCm/HIP support).
2026-04-11 04:44:13 -07:00
Daniel Han
53af4a1b3e
Fix Gemma-4 GRPO catastrophic KL divergence with TRL 1.0.0+ (#4934)
* Fix Gemma-4 GRPO catastrophic KL divergence with TRL 1.0.0+

Two compounding bugs caused Gemma-4 GRPO training to diverge with KL ~10^12
at step 1 against TRL 1.0.0+. Both fixes are runtime patches in the existing
TRL/model patch flow and are no-ops for models and TRL versions that are not
affected.

Fix 1 (rl.py): replace trl.models.utils.disable_gradient_checkpointing with
a no-op context manager. TRL 1.0.0+ wraps generation in
`with torch.no_grad(), disable_gradient_checkpointing(self.model, ...):`
purely to suppress a cosmetic PyTorch warning ("None of the inputs have
requires_grad=True"). Inside torch.no_grad() the gradient checkpointing
state has no functional effect on the forward pass. On context exit, TRL
calls model.gradient_checkpointing_enable() which dispatches to HF's
generic implementation and overwrites Unsloth's custom
`use_gradient_checkpointing="unsloth"` wrapper, corrupting Gemma-4 forward
numerics. Replacing the toggle with a no-op preserves Unsloth's custom GC
wrapper across generation passes. The patch walks sys.modules dynamically
to also rebind the symbol on every trl.* module that already imported it
(grpo_trainer, dpo_trainer, rloo_trainer, dppo_trainer, gfpo_trainer,
grpo_with_replay_buffer_trainer, and any future trainer module).
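In outline, the patch amounts to the following (a condensed sketch; the real
version also logs a warning_once and narrows its exception handling, as
described in the review-fix commits below):

  import sys
  from contextlib import contextmanager

  @contextmanager
  def _noop_disable_gradient_checkpointing(model, *args, **kwargs):
      # Intentionally do nothing: Unsloth's custom gradient-checkpointing
      # wrapper stays in place across TRL's generation pass.
      yield model

  def patch_trl_disable_gradient_checkpointing():
      import trl.models.utils as trl_utils
      if not hasattr(trl_utils, "disable_gradient_checkpointing"):
          return  # older TRL (e.g. 0.22.2): nothing to patch
      trl_utils.disable_gradient_checkpointing = _noop_disable_gradient_checkpointing
      # Rebind the symbol on every trl.* module that already imported it.
      for name, module in list(sys.modules.items()):
          if name.startswith("trl") and hasattr(module, "disable_gradient_checkpointing"):
              module.disable_gradient_checkpointing = _noop_disable_gradient_checkpointing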

Fix 2 (vision.py): inject `final_logit_softcapping` from `config.text_config`
into the top-level `model.config` for multimodal models. Unsloth's GRPO
trainer reads `getattr(model.config, "final_logit_softcapping", 0)` but
for Gemma-4 the attribute lives only on the nested `Gemma4TextConfig`,
so the lookup silently defaults to 0 instead of 30.

Backwards compatibility:
- trl 0.22.2: no `disable_gradient_checkpointing` symbol exists, the patch
  early-returns via `hasattr` guard.
- trl 0.27.1: same broken pattern as 1.0.0, the noop replacement is correct.
- trl 1.0.0+: end-to-end verified on `unsloth/gemma-4-E2B-it` GRPO with TRL
  1.0.0 and transformers 5.5.0. Step 1 loss=2.46e-08, kl=2.92e-05 (machine
  zero) vs broken baseline loss=1.37e+06, kl=1.76e+09.
- Llama / non-VLM text models: Fix 2 is a no-op (no `text_config`); Fix 1
  is functionally identical (Unsloth's GC wrapper is preserved).
- Qwen3-VL and other VLMs without final_logit_softcapping: Fix 2 is a no-op
  (text_config.final_logit_softcapping is None).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply loop 1 review fixes for PR #4934

- Move Fix 2 from vision.py to rl_replacements.py:858 and :1110 at the
  actual consumer sites. This avoids mutating model.config (which could
  leak into save_pretrained output) and covers text-only Gemma-4 paths
  that do not flow through FastBaseModel.from_pretrained.
- Revert the vision.py injection block entirely.
- Narrow the bare except blocks in patch_trl_disable_gradient_checkpointing
  from `except Exception:` to `(AttributeError, ImportError)` and
  `(AttributeError, TypeError)` to avoid masking unrelated bugs.
- Add logger.warning_once when the noop patch is installed, matching
  patch_trl_openenv and patch_trl_vllm_generation convention.
- Remove the dead per-module `_unsloth_noop_patched` sentinel check inside
  the sys.modules walk. The function-level early return already covers
  this case.
- Move `import sys` and `from contextlib import contextmanager` to the
  module-level imports instead of inside the function body.
- Rewrite the ordering comment in PatchFastRL to accurately describe
  why patch_trl_disable_gradient_checkpointing must run before
  patch_trl_rl_trainers.
- Fix keyword default spacing to match surrounding rl.py style.

End-to-end verified: Gemma-4-E2B GRPO on TRL 1.0.0 + transformers 5.5.0
step 1 loss=2.464e-08 kl=2.921e-05, all 5 steps succeed.

* Apply loop 2 review fix for PR #4934

Extract the final_logit_softcapping fallback logic into a shared helper
`_unsloth_get_final_logit_softcapping(config)` defined in rl_replacements.py
and injected into the compiled cache via RL_PRE_ITEMS["grpo_trainer"]. Both
call sites (`grpo_trainer__generate_and_score_completions` and
`grpo_trainer_compute_loss`) now use the helper instead of inlining the
same text_config fallback block twice.

Verified: compiled cache file lists the helper at module scope and both
consumer sites call it. Gemma-4-E2B GRPO step 1 loss=2.464e-08 kl=2.921e-05
(unchanged), all 5 steps pass.

* Apply loop 3 review fix for PR #4934

Extend _unsloth_get_final_logit_softcapping to also fall back to
config.get_text_config() for composite configs such as T5GemmaConfig
where the text sub-config is not exposed via the text_config attribute
but only via the get_text_config() method. Guard against (TypeError,
ValueError) raised by ambiguous composite configs, and skip the
self-referential case where get_text_config() returns self.

This addresses the 6/7 reviewer consensus from the third review loop.
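Putting the three loops together, the helper's fallback chain looks roughly
like this (condensed; the compiled-cache injection via RL_PRE_ITEMS is
omitted):

  def _unsloth_get_final_logit_softcapping(config):
      # 1) Top-level attribute (plain text models, Gemma 1/2).
      value = getattr(config, "final_logit_softcapping", None)
      if value is not None:
          return value
      # 2) Nested text_config (multimodal models such as Gemma-4).
      text_config = getattr(config, "text_config", None)
      if text_config is not None:
          value = getattr(text_config, "final_logit_softcapping", None)
          if value is not None:
              return value
      # 3) get_text_config() for composite configs (e.g. T5Gemma); skip the
      #    self-referential case and tolerate ambiguous composites.
      try:
          sub_config = config.get_text_config()
          if sub_config is not None and sub_config is not config:
              value = getattr(sub_config, "final_logit_softcapping", None)
              if value is not None:
                  return value
      except (AttributeError, TypeError, ValueError):
          pass
      return 0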

Verified:
- Helper returns 30.0 for Gemma-4, T5Gemma, and Gemma 1/2 configs.
- Helper returns 0 for Llama, Qwen, Mistral, Cohere, Granite, and
  ambiguous configs raising ValueError.
- Gemma-4-E2B GRPO step 1 loss=2.464e-08 kl=2.921e-05 (unchanged).
- Llama-3.2-1B GRPO all 5 steps loss=0 kl=0 (no regression).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-10 07:58:15 -07:00
Daniel Han
65b4028560
Pin bitsandbytes to continuous-release_main on ROCm (4-bit decode fix) (#4954)
* Pin bitsandbytes to continuous-release_main on ROCm for 4-bit decode fix

bitsandbytes 0.49.2 on PyPI ships with a broken 4-bit GEMV kernel on
every ROCm target:

  - CDNA (gfx90a / gfx942 / gfx950 = MI210 / MI300X / MI350) via a
    broken blocksize=32/64 warp64 GEMV kernel whose tests were
    explicitly skipped with ROCM_WARP_SIZE_64 guards because the
    code was known to be broken.
  - RDNA3 / RDNA3.5 (gfx1100-1103 / gfx1150-1152) via a compile-time
    BNB_WARP_SIZE macro in the host-side dispatch that resolves to
    64 when the multi-arch wheel is compiled with CDNA as the
    primary target, so num_blocks is wrong on RDNA and half the GEMV
    output is never written.

At decode shape (1, 1, hidden) both bugs produce NaN. Training is
unaffected because training shapes are (batch, seq_len > 1, hidden)
and never touch the GEMV path. The crash during autoregressive
inference surfaces as _assert_async_cuda_kernel in torch.multinomial
which on HIP becomes a hard HSA_STATUS_ERROR_EXCEPTION instead of
a clean Python error.

Both bugs are fixed by bitsandbytes commit 713a3b8 ("[ROCm] Enable
blocksize 32 4-bit quantization and GEMV kernels on AMD CDNA",
PR #1887, merged 2026-03-09) which replaces BNB_WARP_SIZE with a
runtime hipDeviceGetAttribute query and ships a working CDNA warp64
kernel. That commit has not shipped to PyPI yet, but
continuous-release_main wheels are published on every push to bnb
main via GitHub Releases.

Point the ROCm install path at the continuous-release_main x86_64 and
aarch64 wheels and fall back to PyPI >=0.49.1 when the pre-release is
unreachable (offline installs, firewalled hosts, or architectures not
covered by the pre-release wheels). Drop the pin once bnb cuts a
0.50+ tag on PyPI.

Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1): direct
bnb GEMV shape test now returns 0.0078 max abs error at seq_len=1
(no NaN) vs NaN on 0.49.2, and full Unsloth + for_inference + 4-bit
sampling generation works end-to-end.

NVIDIA / CPU / Mac / Windows paths are unaffected -- the helper is
gated on the ROCm torch index and on platform.machine().

* Drop Studio ROCm 16-bit fallback now that bnb 0.50+ fixes 4-bit decode

The 16-bit fallback in studio/backend/core/inference/inference.py was
added as a workaround for a bug that this PR already fixes at the
install layer: bitsandbytes <= 0.49.2 has a broken 4-bit GEMV kernel
on every ROCm target, which NaNs at decode shape (seq_len=1) and
crashes autoregressive inference. bnb PR #1887 (commit 713a3b8, in
0.50.0.dev0+, pinned by install.sh / install_python_stack.py in this
PR) restores correct 4-bit decode on MI300X and verified working
end-to-end with full Unsloth + for_inference + sampling.

Revert the dual code path so ROCm and NVIDIA both go through the
normal FastLanguageModel.from_pretrained + for_inference flow:

  - Remove the conditional `from unsloth import` that skipped the
    import on ROCm. The monkey-patches it was trying to avoid were
    never the cause of the crash; bnb 4-bit GEMV was.
  - Remove the `if _hw_module.IS_ROCM:` branch in load_model that
    loaded with plain transformers + PEFT + bfloat16, and the
    `_resolve_fp16_base` helper it relied on.
  - Remove the `get_chat_template is not None` fallback in
    _load_chat_template_info -- get_chat_template is now always
    imported.
  - Refactor the audio/vision ROCm guard to check _hw_module.IS_ROCM
    directly instead of the removed _IS_ROCM_ENV global. Audio and
    vision on ROCm still need separate validation (FastVisionModel
    and the CSM audio codecs were never tested on HIP) so the guard
    stays for now.

Add _bnb_rocm_4bit_ok() as a runtime safety net for users who
install from this PR before the install.sh bnb pin kicks in, or
whose installer fell back to the PyPI pin because the continuous-
release wheel was unreachable. When the installed bnb is < 0.50 on
ROCm, force load_in_4bit=False and strip any -unsloth-bnb-4bit /
-bnb-4bit suffix from the model path so a pre-quantized repo
resolves to its FP16 sibling instead of pulling bnb back in via
the repo's quantization_config. LoRA adapters whose base is a
pre-quantized repo on old bnb will still fail inside Unsloth's
loader -- the only real fix there is `unsloth studio update`.

Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1):

  - HAPPY path (bnb 0.50.0.dev0, load_in_4bit=True, pre-quantized
    repo): loads in 4-bit via the fixed GEMV, generation returns
    "Paris." for greedy and sampling.
  - SAFETY-NET path (simulated old bnb, suffix-stripped to the
    FP16 sibling, load_in_4bit=False): loads in bf16, generation
    returns "Paris." for greedy and sampling.

Net diff is ~45 lines smaller than the pre-revert state because
the entire plain-transformers 16-bit branch is gone.

* Cache _bnb_rocm_4bit_ok() with functools.cache

load_model() can be called many times in a single session but the bnb
version and hardware state cannot change at runtime, so memoise the
check. First call is ~1.9 ms (dominated by the lazy `import bitsandbytes`
inside the try block), subsequent calls drop to sub-microsecond dict
lookups. Zero behavioral change.
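The pattern, in miniature (the version threshold follows the commit text; the
body is abbreviated):

  import functools

  @functools.cache
  def _bnb_rocm_4bit_ok() -> bool:
      # bnb version and hardware cannot change within a process, so the first
      # (slow) check is reused by every later load_model() call.
      try:
          import bitsandbytes
          from packaging.version import Version
          return Version(bitsandbytes.__version__) >= Version("0.50.0.dev0")
      except Exception:
          return False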

* Shorten verbose bnb/ROCm comments

Comment-only cleanup across install.sh, studio/install_python_stack.py,
and studio/backend/core/inference/inference.py. No behavioral change.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove _bnb_rocm_4bit_ok safety net from inference.py

Studio's ROCm support is brand new (PR #4720, merged today) and every
fresh install pulls the bnb continuous-release_main wheel via
install.sh / install_python_stack.py in this same PR. There are no
existing ROCm Studio installs carrying bnb < 0.50, so the defensive
version-check fallback is guarding against a scenario that cannot
actually occur. Delete the helper, the functools import, and the
safety-net block -- inference.py now calls FastLanguageModel.from_pretrained
directly with no ROCm branching.

* Drop audio/vision ROCm guard in inference.py — verified unblocked by bnb fix

Vision inference was blocked by the same bnb 4-bit GEMV bug that affected
text inference (vision models use bnb 4-bit for the LM backbone). With
bnb 0.50+ pinned in install.sh / install_python_stack.py, vision works
end-to-end on MI300X: Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit
loaded in 4-bit via FastVisionModel + for_inference returns a correct
answer to a multimodal prompt.

Audio (CSM) was never actually blocked by HIP — on this hardware CSM
loads and runs its backbone forward pass fine with bnb 0.50, then fails
during generate() with a transformers-level kwarg validation mismatch
in generation_csm.py (`backbone_last_hidden_state` rejected). That's a
pre-existing transformers/CSM integration bug that reproduces identically
on NVIDIA, so the ROCm-gated guard was never actually protecting users
from anything HIP-specific.

Remove the combined audio/vision guard and the now-unused _hw_module
import. Also restore the one-word "Can be" in an inline comment that
drifted during the earlier comment-shortening pass, so the inference.py
delta vs pre-#4720 is exactly the max_seq_length<=0 crash fix and
nothing else.

* Shorten max_seq_length=0 guard comment to one line

---------

Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-10 06:25:39 -07:00
Daniel Han
cad8c6ad05
Add AMD ROCm/HIP support across installer and hardware detection (#4720)
* Add ROCm detection to install.sh and expand shell tests

Add AMD ROCm GPU detection to get_torch_index_url() in install.sh.
When nvidia-smi is not found, probe for ROCm via amd-smi, /opt/rocm
version file, hipconfig, dpkg-query, and rpm.

Includes validation guard for malformed _rocm_tag, Debian epoch prefix
stripping, ROCm 7.2+ cap to rocm7.1 index, bitsandbytes AMD install,
and status messaging. Shell tests expanded to 23 cases.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add ROCm torch reinstall support to install_python_stack.py

Add _detect_rocm_version() and _ensure_rocm_torch() to detect when a
Linux host has ROCm but the venv received CPU-only torch, and reinstall
with the correct ROCm wheels. Covers ROCm 6.0 through 7.1 with a
30-second timeout on the torch GPU probe subprocess.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add ROCm support to llama.cpp prebuilt installer

Add has_rocm field to HostInfo, extend detect_host() to probe for ROCm
via hipcc/amd-smi/rocm-smi/ROCM_PATH, and route ROCm hosts to upstream
prebuilts (Linux ROCm 7.2 prebuilt with source fallback, Windows HIP
prebuilt with CPU fallback). Add linux-rocm and windows-hip install
kinds to runtime_patterns_for_choice().

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add IS_ROCM hardware flag and fix AMD error message

Add IS_ROCM flag to hardware.py detect_hardware() (set when
torch.version.hip is present, DeviceType stays CUDA). Export IS_ROCM
from __init__.py. Add "rocm" key to get_package_versions().

Replace "We do not support AMD" error in tokenizer_utils.py with a
helpful message pointing to ROCm installation docs.

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

* Add comprehensive ROCm support test suite (68 tests)

Add tests/studio/install/test_rocm_support.py covering all ROCm code
paths across install_llama_prebuilt.py, install_python_stack.py,
hardware.py, tokenizer_utils.py, and install.sh. All tests use mocks
and run without AMD hardware.

Covers: asset selection (11), runtime patterns (5), HostInfo (4),
ROCm version detection (9), torch reinstall (9), index mapping (8),
hardware flag (8), tokenizer message (2), install.sh structure (10),
and live regression (1).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden ROCm support: probe error handling, version cap, validation

Address review findings from 8 independent reviewers:

- Wrap _ensure_rocm_torch() torch probe in try/except for
  TimeoutExpired and OSError so a hung or broken torch import does not
  crash the installer (8/8 reviewers flagged this)
- Add torch>=2.4,<2.11.0 version cap to the ROCm reinstall path to
  prevent installing unsupported torch 2.11.0 from the rocm7.1 index
- Use with-statement for file reads in _detect_rocm_version() to avoid
  resource leaks
- Handle ROCM_PATH="" correctly (use `or "/opt/rocm"` instead of
  default parameter to avoid relative path resolution)
- Strengthen shell validation guard from rocm[0-9] to rocm[1-9] to
  reject rocm0.x tags that would produce nonexistent PyTorch index URLs
- Switch shell version cap from blocklist to allowlist (rocm6.*|rocm7.0*
  |rocm7.1* pass through, everything else caps to rocm7.1) so future
  ROCm 10+ does not fall through to a nonexistent index
- Add sorted() to _ROCM_TORCH_INDEX lookup for defensive ordering
- Fix test_probe_timeout_handled: replace zero-assertion test with
  proper assertions verifying reinstall proceeds after timeout

* Clean up rocm_paths list construction in detect_host()

Filter None from the ROCM_PATH env var lookup at list construction time
instead of relying on the inline `if p` guard in the any() call.

* Require actual AMD GPU presence before selecting ROCm paths

All 8 reviewers across 2 cycles independently flagged that ROCm
detection used toolkit/filesystem hints (hipcc, /opt/rocm, rocm-core)
as a proxy for GPU presence, which would misroute CPU-only or NVIDIA
hosts that happen to have ROCm tools installed.

Now all 3 detection points (install.sh, install_python_stack.py,
install_llama_prebuilt.py) probe for an actual AMD GPU before
entering the ROCm path:

- install.sh: check rocminfo for gfx* GPU names, or amd-smi list
  for device rows, before version detection
- install_python_stack.py: new _has_rocm_gpu() function probes
  rocminfo and amd-smi list before _ensure_rocm_torch() proceeds
- install_llama_prebuilt.py: detect_host() probes rocminfo/amd-smi
  list instead of just checking tool existence or directory paths

Also:
- Shell test mock amd-smi now handles "list" subcommand
- Python tests updated to mock _has_rocm_gpu where needed
- Added test_no_gpu_with_rocm_tools_skips to verify the new guard
- Test index lookups now use sorted() to match production code
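A simplified sketch of the Python-side probe described above (command names
and output markers follow the commit text; the real parsing is stricter and
was tightened further in later commits):

  import re
  import subprocess

  def _has_rocm_gpu() -> bool:
      # rocminfo lists HSA agents; real GPUs report a specific gfx id while
      # gfx000 is the CPU agent, so require gfx[1-9].
      try:
          out = subprocess.run(["rocminfo"], capture_output=True, text=True,
                               timeout=10)
          if re.search(r"gfx[1-9]", out.stdout):
              return True
      except (OSError, subprocess.SubprocessError):
          pass
      # amd-smi list prints one "GPU: <n>" (or "GPU[<n>]") data row per device.
      try:
          out = subprocess.run(["amd-smi", "list"], capture_output=True,
                               text=True, timeout=10)
          if re.search(r"^GPU\s*[:\[]\s*\d", out.stdout, re.MULTILINE):
              return True
      except (OSError, subprocess.SubprocessError):
          pass
      return False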

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden hipconfig version parsing and torch probe compatibility

- Add parts[1].isdigit() check in hipconfig version parsing to handle
  versions like "6.3-HIP" where the minor component has non-numeric
  suffix (strip "-" prefix before int() conversion)
- Use getattr() in torch probe subprocess to safely handle old or
  custom torch builds that may lack torch.version.hip/cuda attributes

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Strengthen AMD GPU detection and add NVIDIA precedence guard

- Change amd-smi list detection from any-non-empty-output to requiring
  "gpu" marker in output, matching the shell-side NR>1 check. Prevents
  false positives from header-only amd-smi list output.
- Add nvidia-smi check at the top of _ensure_rocm_torch() so mixed
  AMD+NVIDIA hosts preserve NVIDIA precedence (matching install.sh and
  install_llama_prebuilt.py behavior).
- Apply the same amd-smi marker fix to install_llama_prebuilt.py
  detect_host() for consistency.

* Add Windows-specific ROCm/HIP detection in detect_host()

The previous detect_host() ROCm check used rocminfo and amd-smi list
which are Linux-only tools. On Windows, has_rocm would always be False,
making the Windows HIP prebuilt path at line 1794 unreachable.

Now detect_host() uses platform-specific detection:
- Linux: rocminfo (check for gfx GPU names) or amd-smi list
- Windows: hipinfo.exe, amd-smi, or amdhip64.dll on PATH

This allows Windows AMD users to get the HIP prebuilt binary instead
of silently falling through to the CPU prebuilt.

* Fill AMD ROCm gaps: Mamba/SSM source builds, GPU monitoring, Windows messaging, RDNA expansion

- worker.py: Add HIP detection to causal-conv1d/mamba-ssm probe, check
  for hipcc before ROCm source builds, improve status messages and error
  reporting, add timeout and uv support for the source build fallback
- amd.py: New AMD GPU monitoring module via amd-smi metric --json,
  mirroring nvidia.py structure (utilization, temperature, power, VRAM)
- hardware.py: Branch to amd.py when IS_ROCM is True for GPU utilization,
  visible GPU queries, and physical GPU count
- install_python_stack.py: Detect AMD GPUs on Windows and warn that
  ROCm-enabled PyTorch must be installed manually
- kernels/utils.py: Expand is_rdna() to cover RDNA2 (gfx1030-1032),
  RDNA3 (gfx1102-1103), RDNA3.5 (gfx1150-1152) alongside existing entries
- tests: Add 32 new tests covering all changes (95/95 pass)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden ROCm detection, fix VRAM heuristic, and expand RDNA2 coverage

- Windows ROCm detection: validate actual GPU presence via hipinfo/amd-smi
  output markers instead of just checking tool existence on PATH
- _ensure_rocm_torch: validate nvidia-smi actually reports a GPU before
  giving NVIDIA precedence (fixes AMD-only hosts with stale NVIDIA tools)
- amd.py _parse_numeric: handle dict-shaped metric objects from newer
  amd-smi versions ({"value": 10, "unit": "W"}) and strip MiB/GiB units
- amd.py VRAM heuristic: raise threshold from 100k to 10M to correctly
  handle MI300X (192 GB = 196608 MB) and other high-VRAM GPUs
- amd.py visible GPU: use AMD-reported GPU IDs instead of enumerate index
  so non-dense sets like CUDA_VISIBLE_DEVICES=1,3 report correctly
- install.sh: add ROCm <6.0 minimum version guard (no PyTorch wheels
  exist for older versions); fix rocm7.1* glob to not match rocm7.10+
- is_rdna: add gfx1033-1036 for RDNA2 mobile GPUs (RX 6600M etc.)
- worker.py: increase ROCm source build timeout from 600s to 1800s;
  fix success log message for ROCm source builds
- Tests: update mocks for _has_usable_nvidia_gpu, add RDNA2 target asserts

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add HIP_VISIBLE_DEVICES support, unit-aware VRAM parsing, Windows GPU validation

- hardware.py: check HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES on ROCm
  before falling back to CUDA_VISIBLE_DEVICES, so multi-GPU AMD setups with
  HIP-specific env vars report the correct visible device set
- amd.py: add _parse_memory_mb() that reads "unit" from dict-shaped amd-smi
  JSON (e.g. {"value": 192, "unit": "GiB"}) and converts to MB correctly;
  fixes MI300X VRAM misreported as 0.19 GB instead of 192 GB
- install_python_stack.py: Windows AMD warning now validates actual GPU
  presence via hipinfo/amd-smi output markers before printing
- install_llama_prebuilt.py: restore amdhip64.dll fallback for Windows HIP
  detection after tool-based checks, so Windows HIP installs without CLI
  tools on PATH are still detected
- hardware.py: fix IS_ROCM comment to accurately describe its role

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix HIP_VISIBLE_DEVICES empty-string handling in GPU visibility spec

Use explicit None checks instead of Python `or` operator when reading
HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES, so that an empty string
("") is correctly honored as "no visible GPUs" rather than silently
falling through to CUDA_VISIBLE_DEVICES on mixed ROCm+CUDA systems.
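The distinction in miniature (illustrative helper name):

  import os

  def _visible_gpu_spec():
      # `or` would treat HIP_VISIBLE_DEVICES="" like an unset variable and
      # fall through; an explicit None check keeps "" meaning "no GPUs".
      for var in ("HIP_VISIBLE_DEVICES", "ROCR_VISIBLE_DEVICES",
                  "CUDA_VISIBLE_DEVICES"):
          value = os.environ.get(var)
          if value is not None:
              return value
      return None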

* Fix IS_ROCM test assertion for multi-line formatting

* Cap torchvision/torchaudio versions, remove amdhip64.dll fallback, fix visible GPU count

- Cap torchvision<0.26.0 and torchaudio<2.11.0 alongside torch<2.11.0 in
  both install.sh and install_python_stack.py to prevent resolver from
  selecting incompatible companion packages from ROCm wheel index
- Remove amdhip64.dll fallback in Windows ROCm detection (DLL presence
  without hipinfo/amd-smi is not proof of GPU existence)
- Fix get_visible_gpu_count() to use _get_parent_visible_gpu_spec() which
  respects HIP_VISIBLE_DEVICES/ROCR_VISIBLE_DEVICES on ROCm hosts

* Attribute is_rdna() RDNA2/3/3.5/4 expansion to PR #4428

The is_rdna() expansion to cover RDNA2 (gfx1030-1036), RDNA3
(gfx1100-1103), RDNA3.5 (gfx1150-1152), and RDNA4 (gfx1200-1201)
architectures is based on the original work from PR #4428.

Co-authored-by: GoldenGrapeGentleman <yueyuan@amd.com>
Co-authored-by: billishyahao <bill.he@amd.com>

* Support AMD Radeon for studio (#4770)

Co-authored-by: Iswarya Alex <iswarya.alex@amd.com>

* Remove ROCm test files from main PR

Move test_rocm_support.py and shell test additions to a separate PR
to keep the main ROCm support PR focused on implementation changes.

* Fix installer and hardware detection issues for PR #4720

- Fix empty _tri_arg passed to uv pip install in Radeon path (causes
  "Empty field is not allowed for PEP508" error)
- Fix Radeon fallback: use ROCm index instead of CPU-only when
  repo.radeon.com is unreachable (TORCH_INDEX_URL already has ROCm)
- Use $TORCH_CONSTRAINT in fallback paths instead of hardcoded strings
- Fix _pick_radeon_wheel: relax suffix to match manylinux_2_28_x86_64
  wheels (AMD Radeon repo does not use bare linux_x86_64 platform tag)
- Fix IS_ROCM export: use __getattr__ so callers always see the live
  value after detect_hardware() runs
- Fix apply_gpu_ids: set HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES
  on ROCm so _get_parent_visible_gpu_spec picks up narrowed GPU set
- Fix _parse_memory_mb: distinguish GB (1000 MB) from GiB (1024 MiB)
- Add amd-smi version as a fallback in _detect_rocm_version
- Fix trailing whitespace and missing newline at EOF in install.sh

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix GPU detection false positives and add missing health groups

- Fix _has_rocm_gpu() false positive: require "GPU: <number>" data rows
  from amd-smi list, not just header containing "gpu"
- Apply same fix in detect_host() in install_llama_prebuilt.py
- Add runtime_payload_health_groups for linux-rocm and windows-hip so
  partial/corrupt ROCm/HIP prebuilt installs are properly detected
- Add bitsandbytes install to Radeon fallback paths (was only in the
  success path, skipped when repo.radeon.com was unreachable)
- Keep DEVICE/CHAT_ONLY as direct imports in __init__.py (matching main)
  and only use __getattr__ for IS_ROCM

* Fix _ensure_rocm_torch and Windows AMD warning false positives

- _ensure_rocm_torch: only skip when HIP is already present, not for
  CUDA builds (which are unusable on AMD-only hosts). Fixes the case
  where a venv has a stale CUDA wheel and the repair step is skipped.
- Windows AMD warning: use GPU data row check (same as Linux fix) to
  avoid false positives from amd-smi list header-only output.

* Fix amd-smi GPU detection for GPU[N] output format

Older amd-smi versions output "GPU[0] : Card series: ..." instead of
"GPU: 0". The regex now matches both "GPU: <digit>" and "GPU[<digit>"
formats to detect actual GPU data rows.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Harden AMD GPU detection against false positives

- install.sh: replace weak amd-smi list check (awk 'NR>1 && NF') with
  strict pattern matching GPU data rows (/^GPU[[:space:]]*[:\[]/)
- All files: reject rocminfo gfx000 (CPU HSA agent) by requiring
  gfx[1-9] instead of gfx[0-9] in the rocminfo GPU probe
- Fixes false positives on hosts with ROCm tools but no AMD GPU

* Remove duplicate comment from pre-commit merge

* Refactor: deduplicate AMD detection, consolidate bitsandbytes, clean up imports

- Extract _has_amd_rocm_gpu() shell function to avoid duplicating the
  rocminfo/amd-smi GPU detection logic in get_torch_index_url and
  the Radeon auto-detect block
- Consolidate bitsandbytes install into a single case block after torch
  install (was duplicated 4 times across Radeon success/fallback paths)
- Move math and re imports to top of amd.py (were inline in functions)
- Add _smi_query() helper in hardware.py to centralize IS_ROCM backend
  selection for get_gpu_utilization and get_visible_gpu_utilization

Addresses Gemini code review suggestions.

* Fix VRAM parsing for string values and GB/GiB consistency

- Extract unit from string-valued VRAM fields (e.g. "192 GiB") so
  _parse_memory_mb correctly applies the unit multiplier instead of
  treating the value as bare MB
- Treat GB and GiB identically (both as binary x1024) since GPU tools
  including amd-smi use binary units even when labeling them "GB"
- Fixes incorrect VRAM reporting on MI300-class cards (was showing
  ~0.19 GB instead of 192 GB for string-valued outputs)
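A compact sketch of the resulting parsing behaviour (helper name from the
commits above; the real code also feeds string-valued GPU ids and utilization
fields through a similar numeric parser):

  import re

  def _parse_memory_mb(value):
      # amd-smi may report VRAM as a bare number (MB), a string such as
      # "192 GiB", or a dict such as {"value": 192, "unit": "GiB"}.
      unit = "MB"
      if isinstance(value, dict):
          unit = str(value.get("unit") or "MB")
          value = value.get("value")
      if isinstance(value, str):
          match = re.match(r"\s*([0-9.]+)\s*([A-Za-z]*)", value)
          if not match:
              return None
          value, unit = match.group(1), (match.group(2) or unit)
      try:
          number = float(value)
      except (TypeError, ValueError):
          return None
      if unit.strip().upper() in ("GB", "GIB"):
          return number * 1024.0   # GPU tools use binary sizes under either label
      return number                # bare numbers and MB/MiB pass through as MB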

* Add --no-cache to uv for ROCm HIP source builds

Avoid stale cache artifacts from partial HIP source builds when
uv is used for causal-conv1d/mamba-ssm compilation on ROCm.
The pip path already uses --no-cache-dir; this adds the uv equivalent
(--no-cache) only when is_hip is True.

* Fix critical: initialize _amd_gpu_radeon before case block

_amd_gpu_radeon was only set inside the */rocm*) case arm, so on
NVIDIA/CPU/macOS paths where TORCH_INDEX_URL does not contain "rocm",
the variable was unbound. With set -u (nounset) enabled, this crashes
the installer for every non-AMD user.

Move initialization to before the case block so it is always defined.

* Fix Windows AMD: route has_rocm hosts to HIP prebuilt path

resolve_release_asset_choice was selecting windows-cpu for all Windows
x86_64 hosts including those with has_rocm=True. Windows AMD users
should fall through to resolve_upstream_asset_choice which tries the
HIP prebuilt first. Add "not host.has_rocm" guard to the published
windows-cpu selection.

* Harden ROCm detection, Radeon wheel fallback, and HIP visibility

Addresses review findings from parallel reviewers on PR #4720:

- install.sh: add _has_usable_nvidia_gpu() helper requiring nvidia-smi -L
  to actually list a GPU before treating the host as NVIDIA. Fixes the
  stale-nvidia-smi-on-PATH regression where AMD-only hosts fell into the
  CUDA branch.
- install.sh: fix hipconfig awk blocks to propagate a non-zero exit code
  when the output is not a recognisable version string, so the ||-chain
  continues to dpkg-query / rpm instead of terminating early.
- install.sh: fail-closed on Radeon wheel fallback. When torch,
  torchvision or torchaudio is missing from the Radeon repo for the
  active Python tag, fall back to the standard ROCm index instead of
  silently mixing Radeon wheels with PyPI defaults. Quote all wheel
  arguments individually so wheel filenames cannot be word-split or
  glob-expanded.
- install_llama_prebuilt.py: detect_host() now requires nvidia-smi -L to
  list a GPU before setting has_physical_nvidia. Routes AMD ROCm hosts
  with a broken leftover nvidia-smi to the ROCm path instead of
  misclassifying them as NVIDIA.
- install_llama_prebuilt.py: scan upstream assets for any rocm-<version>
  prebuilt instead of hard-coding rocm-7.2, so ROCm 6.x / 7.0 / 7.1 / 7.3+
  users pick up a matching upstream prebuilt when one exists.
- install_llama_prebuilt.py: validate_server() adds --n-gpu-layers 1 for
  linux-rocm and windows-hip hosts, so new HIP prebuilts are preflighted
  on the GPU path instead of passing validation on CPU only.
- install_llama_prebuilt.py: restore the published windows-cpu fallback
  for AMD Windows hosts without a HIP prebuilt so hash-approved bundles
  are still preferred over the raw upstream CPU asset.
- install_python_stack.py: drop the /opt/rocm / hipcc gate in
  _ensure_rocm_torch() and rely on _has_rocm_gpu(). Runtime-only ROCm
  installs (package-managed minimal installs, Radeon software) that ship
  amd-smi / rocminfo without hipcc can now repair a CPU-only venv via
  "unsloth studio update". Adds an explicit IS_WINDOWS / IS_MACOS guard.
- studio/backend/utils/hardware/amd.py: honour HIP_VISIBLE_DEVICES /
  ROCR_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES in
  get_primary_gpu_utilization(). A process restricted to GPU 2 now
  reports metrics for GPU 2 instead of physical GPU 0. Tighten the plain
  bytes unit detection to an explicit allowlist.
- studio/backend/utils/hardware/hardware.py: route
  get_backend_visible_gpu_info()'s backend_cuda_visible_devices field
  through a helper that reads HIP_VISIBLE_DEVICES on ROCm. Drop the
  unconditional "(rocm=False)" suffix in apply_gpu_ids() logs.

* Fix round 2 regressions: ROCm validate_server and Windows HIP routing

Follow-up to 810b833b addressing review findings on the first round of
hardening commits:

- install_llama_prebuilt.py validate_server: gate --n-gpu-layers on the
  resolved install_kind instead of host.has_rocm. AMD Windows hosts
  without a HIP prebuilt fall back to windows-cpu and must not be
  validated with GPU layers; thread install_kind through from the
  caller.
- install_llama_prebuilt.py resolve_release_asset_choice: reinstate the
  "not has_rocm" guard on the published windows-cpu bundle so AMD
  Windows hosts reach resolve_upstream_asset_choice() where the new
  HIP prebuilt path lives. Prefer a published windows-hip bundle first
  when one exists, fall through to upstream HIP + upstream CPU
  otherwise.
- install_llama_prebuilt.py detect_host: also set has_physical_nvidia
  when the secondary --query-gpu block confirms a working NVIDIA GPU,
  so older nvidia-smi versions without -L support do not silently skip
  the Linux diagnostics that key off has_physical_nvidia.
- install_llama_prebuilt.py: drop redundant "import re as _re" /
  "import re as _re_rocm" local aliases in favour of the existing
  top-level "import re".
- install_python_stack.py _ensure_rocm_torch: run the AMD
  bitsandbytes install unconditionally after the HIP-torch probe so
  "unsloth studio update" on venvs that already have ROCm torch still
  gains the AMD bitsandbytes build.
- install.sh: add a non-x86_64 early-exit to get_torch_index_url() so
  aarch64 / arm64 Linux hosts do not hit the ROCm wheel index
  (PyTorch only publishes ROCm wheels for linux_x86_64).
- install.sh: add bitsandbytes install to the migrated-environment
  branch so upgrades pick it up for ROCm hosts instead of only the
  fresh-install path.
- install.sh: in the Radeon wheel path, pass version constraints +
  --no-index --find-links to uv instead of explicit wheel URLs so a
  version-compatible torch / torchvision / torchaudio triple is
  resolved, rather than picking the highest-version wheel for each
  package independently.
- studio/backend/utils/hardware/amd.py _first_visible_amd_gpu_id: fall
  through to lower-priority visibility env vars when the first entry
  is malformed (leading comma, all-whitespace first token) instead of
  silently returning GPU 0.

* Fix round 3 findings: x86_64 guard, ROCm version clip, Radeon deps

Address issues surfaced by the round 3 reviewers on top of 8636fa63:

- install_python_stack.py _ensure_rocm_torch: add the same `x86_64`
  guard that install.sh already has. Linux aarch64 / arm64 ROCm hosts
  must skip the repair path entirely; PyTorch only publishes ROCm
  wheels for linux_x86_64, and without this guard
  `unsloth studio update` aborts with a missing-wheel error on non
  x86_64 hosts.
- install_llama_prebuilt.py resolve_upstream_asset_choice: add a
  best-effort _detect_host_rocm_version() helper (reading
  /opt/rocm/.info/version, amd-smi version, hipconfig --version) and
  filter rocm_candidates to entries whose major.minor is <= host
  version. Falls back to the newest candidate only when no compatible
  one exists, so a ROCm 6.4 host downloads rocm-6.4 instead of being
  handed the numerically newest rocm-7.2 bundle (which fails preflight
  and forces a source build).
- install.sh: remove the round 2 --no-index switch from the Radeon
  wheel branch. --no-index forced uv to ignore PyPI entirely, which
  broke transitive dependency resolution (filelock, sympy, networkx,
  jinja2, fsspec, setuptools, typing-extensions, ...) on a fresh venv.
  Restore the round 1 explicit wheel URL invocation but add a
  torch / torchvision / torchaudio version-pair sanity check so a
  mismatched trio (e.g. torch 2.9.1 + torchvision 0.23.0 + torchaudio
  2.9.0) falls back to the standard ROCm index instead of installing a
  broken combination.
- install_python_stack.py _ensure_rocm_torch: restructure the
  "tag is None" path so it no longer short-circuits the bitsandbytes
  install. On a ROCm runtime older than anything in
  _ROCM_TORCH_INDEX, print the "no wheel" warning but still run the
  AMD bitsandbytes install.
- studio/backend/core/training/worker.py: restore the pre-PR
  "no timeout" behaviour for non-HIP causal-conv1d / mamba-ssm source
  builds. The round 2 "timeout = 1800 if is_hip else 300" cap aborts
  slow non-HIP builds (Linux aarch64, unsupported torch/CUDA combos)
  after 5 minutes; omit timeout for the non-HIP branch so the cap
  only applies to ROCm source builds.

* Fix round 4 findings: apply_gpu_ids env inheritance, Radeon X.Y, bitsandbytes gate

Address remaining issues surfaced by the round 4 reviewers:

- studio/backend/utils/hardware/hardware.py apply_gpu_ids: mirror the
  selection into HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES whenever
  the caller already had a ROCm visibility env var set, not only when
  IS_ROCM has already been set by detect_hardware(). Training and
  inference workers call apply_gpu_ids() before detect_hardware()
  runs, so the old guard would leave a forked ROCm worker with a
  stale HIP_VISIBLE_DEVICES mask that no longer matched the
  narrowed CUDA_VISIBLE_DEVICES selection.
- install.sh get_radeon_wheel_url: accept X.Y ROCm versions in
  addition to X.Y.Z. The `/opt/rocm/.info/version` file and some
  hipconfig versions report only two components, and the Radeon
  repository publishes both rocm-rel-X.Y.Z/ and rocm-rel-X.Y/
  directories, so treating X.Y as invalid caused Radeon hosts to fall
  back to the generic ROCm index even when a matching AMD wheel set
  existed.
- install_python_stack.py _ensure_rocm_torch: only install the AMD
  bitsandbytes build when the venv actually has a ROCm-compatible
  torch (either already present or just installed by this function).
  Previously the bitsandbytes install ran unconditionally, which
  could leave an AMD bitsandbytes layered on top of a CPU/CUDA torch
  on hosts where the ROCm runtime is older than any entry in
  _ROCM_TORCH_INDEX. Also add --force-reinstall so an existing
  CPU/CUDA bitsandbytes is replaced by the AMD build during upgrades.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix gemini findings: amd-smi metric envelope validation and dict-wrapped GPU id

Two medium-severity defensive fixes from the gemini-code-assist review on
the AMD monitoring backend:

1. _extract_gpu_metrics may return a dict where every value is None when
   amd-smi succeeds (zero exit) but the JSON envelope contains no usable
   fields (error response, unsupported card). The new _has_real_metrics
   helper lets get_primary_gpu_utilization surface available:False and
   lets get_visible_gpu_utilization skip ghost device rows so the UI
   does not render placeholder cards with empty numbers.

2. Newer amd-smi versions wrap scalar fields as {"value": 0, "unit":
   "none"}, including the per-GPU id. The previous int(raw_id) call
   silently fell back to the enumeration index in that case, losing the
   real GPU id. Routing raw_id through the existing _parse_numeric
   helper handles bare ints, floats, strings, and the dict shape
   uniformly, with a debug log on parse failure.

* Fix gemini round 2 findings: explicit length guard on ROCm version file parser

Both _detect_rocm_version (install_python_stack.py) and
_detect_host_rocm_version (install_llama_prebuilt.py) read /opt/rocm/.info/version
or $ROCM_PATH/lib/rocm_version, split on "." and unconditionally accessed
parts[1]. The surrounding broad `except Exception: pass` already swallowed
the resulting IndexError, so a one-component file like "6\n" did fall
through to the next detection source -- but the control flow relied on
exception handling instead of an explicit check.

Add `if len(parts) >= 2:` guards in both helpers so the loop falls through
on its own without raising. Behaviour is unchanged for the common multi-
component case; the previously-silent IndexError path becomes an explicit
no-op.

* Fix gemini round 3: include has_rocm in validate_server fallback path

When validate_server is called without an explicit install_kind (older
call sites that have not been updated), the fallback was only enabling
--n-gpu-layers for NVIDIA and macOS arm64 hosts. AMD ROCm Linux hosts
fell through to the CPU validation path even though the prebuilt being
exercised was a HIP binary.

Add host.has_rocm to the fallback expression so the GPU offload flag is
applied consistently with the install_kind=='linux-rocm' / 'windows-hip'
branches above.

* Fix gemini round 4: remove risky bytes-vs-MB heuristic in _parse_memory_mb

The previous heuristic divided any bare number above 10_000_000 by
1024*1024 on the assumption that large unit-less values were bytes.
This misclassified small VRAM allocations: 5 MB of used VRAM reported
as 5_242_880 bytes without a unit would be taken at face value and
render as 5_242_880 MB (~5 TB) in the monitoring UI.

Modern amd-smi always provides explicit units (MiB/GiB dict form),
and legacy amd-smi returns bare numbers in MB -- the heuristic never
had a real workload to handle. Drop it and default to MB for bare
numeric input, keeping the existing unit-aware branches for dict /
string inputs unchanged.

The unrelated gemini suggestion to "default minor to 0" in the
amd-smi version awk parser was intentionally NOT applied: rocm7.0
and rocm7.1 ship different wheel sets, so silently substituting 0
for a missing minor could install the wrong wheels. The existing
reject-and-fall-through behaviour is safer.

* Fix gemini round 5: POSIX compliance and leading-comma visibility parsing

Three medium findings from gemini-code-assist addressed in this commit:

1. _pick_radeon_wheel used grep -o and sort -V, both GNU extensions
   that are not in POSIX and break on BSD/BusyBox coreutils. install.sh
   has a #!/bin/sh shebang so the whole pipeline was rewritten as a
   single awk script that extracts all href="..." hits on each line,
   filters to wheels matching the package prefix and python tag, and
   picks the newest version via zero-padded lexical comparison. No
   external sort or grep is needed.

2. _first_visible_amd_gpu_id in the AMD monitoring backend treated a
   leading comma (e.g. HIP_VISIBLE_DEVICES=",1") as "fall through to
   the next env var", which is surprising given the clear intent to
   narrow to device 1. Filter empty tokens after the split and return
   the first real one. An all-commas value ("," / ",,,") still falls
   through because no real tokens exist; the empty-string and "-1"
   explicit-zero cases are unchanged.

The unrelated amd-smi version awk parser suggestion was not applied
(see round 4 commit message for rationale: defaulting a missing minor
to 0 could silently install the wrong ROCm wheel set).

* Fix 20-reviewer.py findings: base drift, Radeon %2B, dpkg/rpm fallback, bnb, backend label

Consolidated fix batch from a 20-parallel reviewer.py run on the current
head. Each fix is drawn from a high-consensus finding and addresses a
real bug or feature gap, not a stylistic preference.

1. install.sh: bump `unsloth>=2026.4.2` -> `unsloth>=2026.4.4` at five
   call sites so this branch no longer regresses main's version floor
   (main bumped to 2026.4.4 in #4876). Without this, merging 4720 would
   silently downgrade the minimum version pin for fresh installs.

2. install.sh: URL-decode Radeon wheel names before extracting the
   torch / torchvision / torchaudio version strings. Real wheel URLs
   from repo.radeon.com are percent-encoded ("torch-2.10.0%2Brocm7.2.0...")
   so the previous `[+-]` terminator in the sed regex never matched,
   `_torch_ver` stayed empty, `_radeon_versions_match` stayed false,
   and every Radeon consumer install silently fell back to the generic
   ROCm index. Now decode %2B -> + first, then extract, then validate.

3. install.sh: the two AMD bitsandbytes install lines were running
   `uv pip install "bitsandbytes>=0.49.1"` without `--force-reinstall`,
   so upgrades where the venv already has a CPU/CUDA bitsandbytes
   satisfying the constraint would keep the stale non-AMD wheel. Add
   `--force-reinstall --no-cache-dir` to both call sites, matching the
   pattern already used in install_python_stack.py::_ensure_rocm_torch.

4. install_python_stack.py and install_llama_prebuilt.py: add
   `dpkg-query -W rocm-core` and `rpm -q rocm-core` fallbacks to the
   Python-side ROCm version detectors so they match the chain in
   install.sh::get_torch_index_url. Package-managed ROCm installs
   (Debian/Ubuntu/RHEL/Fedora distro packages) can expose GPUs via
   rocminfo/amd-smi but still lack /opt/rocm/.info/version, hipconfig,
   or amd-smi `version` output -- without these fallbacks, `unsloth
   studio update` on such hosts returned None and skipped the ROCm
   torch repair. Also strip the dpkg epoch prefix ("1:6.3.0-1") before
   parsing so epoch-annotated packages parse correctly.

5. hardware.py: add a `_backend_label(device)` helper that returns
   "rocm" when IS_ROCM is set and the device is DeviceType.CUDA, and
   use it for every `"backend": ...` emission in JSON responses served
   to the Studio frontend. Internally we still represent ROCm hosts as
   DeviceType.CUDA (ROCm torch reuses the whole torch.cuda.* API
   surface), but the user-facing API now correctly reports "rocm" on
   AMD boxes instead of labeling them as "cuda".
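Sketched roughly (DeviceType and IS_ROCM below are stand-ins for the Studio
backend's own symbols):

  from enum import Enum

  class DeviceType(Enum):       # stand-in for the backend's device enum
      CUDA = "cuda"
      CPU = "cpu"

  IS_ROCM = False               # set by detect_hardware() in the real code

  def _backend_label(device: DeviceType) -> str:
      # ROCm torch reuses the CUDA device type internally, but the frontend
      # should see "rocm" on AMD hosts.
      if IS_ROCM and device is DeviceType.CUDA:
          return "rocm"
      return device.value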

All 250 simulation scenarios pass (was 233 before this batch: added 17
new regression tests covering the version pin, %2B decoding, bnb
force-reinstall flags, dpkg/rpm fallback presence, and the
_backend_label helper's four-way truth table).

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix gemini round 6 + URL audit: amd.py defensive checks, rocm6.5+ clip to 6.4

Two rounds of fixes in one commit, plus a full URL audit of every PyPI /
download.pytorch.org / repo.radeon.com reference the PR introduces.

amd.py (4 medium gemini findings on commit b3627bc2):

1. _extract_gpu_metrics used `and vram_total_mb` as part of the vram_util
   gate. The follow-up `vram_total_mb > 0` already handles the division
   guard, but the truthiness check was redundant and slightly surprising
   for a 0.0 valid value. Replace with explicit `is not None and > 0`
   for both vram_util and power_util.

2. get_physical_gpu_count called `data.get("gpu", ...)` without guarding
   for non-dict envelopes. A scalar / string JSON response from amd-smi
   would raise AttributeError. Add an isinstance(data, dict) check and
   return None for unexpected shapes.

3. get_visible_gpu_utilization had the same .get() exposure on the outer
   envelope. Rewrite the gpu_list extraction as an explicit
   list/dict/else cascade so a malformed scalar envelope produces
   gpu_list=[data] and continues without raising.

4. The same function's per-entry loop also called gpu_data.get() on
   whatever was inside gpu_list. If a scalar ever leaks into the list
   (directly or via the previous fix's fallback), _extract_gpu_metrics
   would raise on the first .get() inside the helper. Skip non-dict
   entries in the loop before extracting metrics.

install.sh (URL audit finding, previously flagged by 20-reviewer as #13):

5. get_torch_index_url used `rocm6.*` in the rocm tag case statement,
   which matched rocm6.5 and rocm6.6 and emitted
   download.pytorch.org/whl/rocm6.5 -- which returns HTTP 403 because
   PyTorch only publishes rocm 5.7, 6.0-6.4, 7.0-7.2. Enumerate the
   supported 6.x minors explicitly and add a rocm6.* fallback branch
   that clips to rocm6.4 (the last supported 6.x wheel set).

URL audit results (all URLs PR 4720 references):
- 14/14 download.pytorch.org/whl/{cpu,cu118,cu124,cu126,cu128,cu130,
  rocm6.0..6.4,rocm7.0..7.2} return HTTP 200.
- 9/9 repo.radeon.com/rocm/manylinux/rocm-rel-{5.7,6.0,6.1,6.2,6.3,
  6.4,7.0,7.1,7.2}/ return HTTP 200.
- X.Y.Z patch directories exist for 7.0.2, 7.1.1, 7.2.1 but NOT for
  6.3.0, 6.4.0, 6.2.1 -- install.sh already handles this via the X.Y.Z
  -> X.Y fallback sed in the Radeon wheel install block.
- Docs links (rocm.docs.amd.com, docs.unsloth.ai AMD guide) and the
  llama.cpp GitHub releases API endpoint all return 200.

Test suite: 255 -> 258. New regression coverage:
- U17: get_physical_gpu_count tolerates scalar amd-smi envelope
- U18: get_visible_gpu_utilization tolerates scalar envelope
- U19a-c: vram_util / power_util return None on zero total, but
  vram_total_gb still echoes 0.0 (not None)
- A_rocm{6.5,6.6,6.9}_clips_to_rocm64: install.sh clips unsupported
  6.x minors to rocm6.4 instead of producing a 403 index URL

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix reviewer.py round 2: tokenizer AMD multi-GPU, --no-torch bnb, main.py backend label

Three high-confidence findings from a second 20-parallel reviewer.py run
on commit 7effb3ae. Triaged 15 total findings and applied the three that
were confirmed as real bugs; the rest were either false positives (e.g.
"migrated AMD venv not repaired" -- _ensure_rocm_torch runs downstream
via setup.sh regardless), design decisions (e.g. visibility mask env
vars not consulted in installer detection), or edge cases the existing
fallback logic already handles.

1. unsloth/tokenizer_utils.py [6/20]: the multi-GPU guard's shell probe
   runs `nvidia-smi --query-gpu=memory.used`, catches the failure, then
   only raises if `torch.cuda.is_available()` is False. On ROCm torch,
   torch.cuda.is_available() returns True (ROCm reuses the torch.cuda.*
   API), so the guard becomes dead code on AMD hosts and multi-GPU AMD
   setups slip through even though unsloth does not support them yet.
   Add a torch.cuda.device_count() > 1 fallback inside the except so
   AMD multi-visible-device setups are flagged consistently with the
   original CUDA memory check (see the sketch after this list).

2. install.sh [1/20]: the fresh-install bitsandbytes block for AMD ROCm
   ran unconditionally when TORCH_INDEX_URL matched `*/rocm*`, even when
   SKIP_TORCH=true (from --no-torch or Intel Mac auto-detect). A user
   running `install.sh --no-torch` on an AMD host would still pull in
   bitsandbytes despite explicitly asking for GGUF-only mode. Wrap the
   case block in an outer `[ "$SKIP_TORCH" = false ]` guard.

3. studio/backend/main.py [3/20]: the /api/system endpoint returned
   `"device_backend": get_device().value`, which is "cuda" on ROCm
   hosts (because ROCm torch piggybacks on torch.cuda). Other endpoints
   (hardware.py) already use the _backend_label helper which swaps
   "cuda" -> "rocm" when IS_ROCM. Route /api/system through the same
   helper so the Studio UI reports the backend consistently across all
   endpoints.

4. studio/backend/tests/test_utils.py: update test_backend_matches_device
   to call _backend_label(get_device()) instead of raw get_device().value
   so the test matches the new contract and still passes on CUDA hosts.
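A rough sketch of the fallback shape from item 1 (the real guard lives in
tokenizer_utils.py and also inspects per-GPU memory before deciding; the
function name and error text here are illustrative):

  import subprocess
  import torch

  def _check_single_visible_gpu():
      try:
          subprocess.check_output(
              ["nvidia-smi", "--query-gpu=memory.used", "--format=csv,noheader"]
          )
          # ... original CUDA memory-based multi-GPU check runs here ...
      except (OSError, subprocess.CalledProcessError):
          # nvidia-smi is absent on ROCm hosts, but torch.cuda.* still works,
          # so count devices directly to flag multi-GPU AMD setups as well.
          if torch.cuda.is_available() and torch.cuda.device_count() > 1:
              raise RuntimeError(
                  "Multiple visible GPUs detected; Unsloth expects one."
              )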

Tests: 258 -> 261. New regression coverage:
- X08 main.py /api/system uses _backend_label
- X09 tokenizer multi-GPU guard has device_count() fallback
- X10 fresh-install bnb case block gated on SKIP_TORCH=false

* fix: prevent bitsandbytes from overwriting ROCm torch with CUDA wheels

During install, bitsandbytes was installed without --no-deps, causing
uv to resolve torch from PyPI (CUDA build) and silently overwrite the
ROCm wheels that were just installed in the previous step.

This happened in three places:
- install.sh: bitsandbytes install in both migrated and fresh paths
- install_python_stack.py: bitsandbytes install inside _ensure_rocm_torch()

Additionally, multiple install steps in install_python_stack.py (extras,
overrides, studio deps) can pull in CUDA torch via transitive
dependencies. A final _ensure_rocm_torch() call at the end of the
install sequence ensures ROCm torch is always in place at runtime.

All changes are gated behind ROCm-specific conditions and do not affect
NVIDIA, CPU-only, macOS, or Windows install paths.

Tested on AMD Instinct MI300X VF with ROCm 7.2.0 -- confirms
torch==2.10.0+rocm7.1 with HIP 7.1.25424 after install.

* fix: ROCm inference fallback -- skip Unsloth patching and bnb 4-bit on HIP

On AMD ROCm (HIP), two issues prevent the normal Unsloth inference path:

1. Unsloth's global monkey-patching of transformers model classes
   (LlamaRotaryEmbedding, attention modules) triggers
   _assert_async_cuda_kernel crashes on HIP during generation.
   Training uses different code paths and works fine.

2. bitsandbytes 4-bit matmul kernels also trigger HIP assertion
   failures on MI300X (CDNA3 / gfx942), even without Unsloth patching.

This commit adds a ROCm-specific inference fallback that:
- Skips importing Unsloth at module level (prevents global patching)
- Loads models in 16-bit with plain transformers + PEFT instead
- Resolves pre-quantized model names (e.g. "xxx-bnb-4bit" -> "xxx")
  since pre-quantized HF repos still trigger bnb codepaths
- Guards get_chat_template calls (unavailable without Unsloth import)
- Fixes max_seq_length=0 being passed to from_pretrained (GGUF
  semantics don't apply to transformers path)

The NVIDIA path is completely unchanged -- Unsloth import and
for_inference() optimization remain active. GGUF inference (via
llama-server/HIP) is unaffected since it never imports Python model
classes. AMD GPUs typically have large VRAM (e.g. 192GB on MI300X)
so 16-bit loading is practical for inference.

Tested on AMD Instinct MI300X VF (ROCm 7.2, HIP 7.1.25424):
- Simple generation: PASS
- Compare mode (base vs finetuned): PASS
- GGUF inference + tool calling: PASS (unaffected by this change)
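
A minimal sketch of the fallback path described above, assuming plain
transformers + PEFT and a hypothetical resolve_model_name helper; this is the
shape of the fallback, not the actual inference code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

IS_ROCM = torch.version.hip is not None  # HIP build of torch

def resolve_model_name(name: str) -> str:
    # Pre-quantized repos such as "xxx-bnb-4bit" still trigger bitsandbytes
    # codepaths, so fall back to the unquantized repo name on ROCm.
    return name[: -len("-bnb-4bit")] if name.endswith("-bnb-4bit") else name

def load_for_inference(model_name: str, adapter_path: str | None = None):
    assert IS_ROCM, "NVIDIA hosts keep the normal Unsloth path"
    model_name = resolve_model_name(model_name)
    # 16-bit load with plain transformers: no Unsloth import, no global patching.
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.bfloat16, device_map="auto"
    )
    if adapter_path is not None:
        model = PeftModel.from_pretrained(model, adapter_path)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return model, tokenizer
```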

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: guard audio/vision inference on ROCm, remove unused import

- Add clear RuntimeError for audio/vision model inference on ROCm
  (these paths use Unsloth's FastModel/FastVisionModel which would
  crash on HIP; GGUF inference is the supported path on AMD)
- Remove unused `import os as _os` from the ROCm changes
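
A minimal sketch of the guard, with an illustrative flag and message:

```python
IS_ROCM = True  # stand-in for the backend's ROCm detection

def check_rocm_inference_support(model_kind: str) -> None:
    # Fail fast with a clear message instead of crashing inside HIP kernels.
    if IS_ROCM and model_kind in ("audio", "vision"):
        raise RuntimeError(
            "Audio/vision model inference is not supported on ROCm; these "
            "paths rely on Unsloth's FastModel/FastVisionModel, which crashes "
            "on HIP. Use GGUF inference on AMD GPUs instead."
        )
```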

* fix: amd-smi parsing for newer output format (gpu_data wrapper, mem_usage, temperature)

amd-smi on recent ROCm versions (7.x) wraps metric output in a
{"gpu_data": [...]} envelope instead of returning a raw list. This
caused get_primary_gpu_utilization() and get_visible_gpu_utilization()
to fail silently (returning available=False) because the GPU data
dict was never unwrapped.

Additionally:
- VRAM data moved from "vram" to "mem_usage" with "total_vram" /
  "used_vram" keys. Added fallback key lookup.
- Temperature "edge" sensor returns "N/A" on MI300X VF; the previous
  dict.get() chain returned the "N/A" string instead of falling
  through to "hotspot". Changed to a loop that checks each key until
  a parseable value is found.

Tested on AMD Instinct MI300X VF (ROCm 7.2, amd-smi 24.x):
- GPU utilization: 0% (idle), up to 100% during training
- Temperature: 40-44C (from hotspot sensor)
- VRAM: 0.28/191.69 GB (idle)
- Power: 158-211W draw
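
A minimal sketch of the parsing changes; the key names follow this message,
while the helper names and the handling around them are assumptions:

```python
from typing import Any

def unwrap_gpu_data(payload: Any) -> list[dict]:
    # ROCm 7.x amd-smi wraps metrics in {"gpu_data": [...]}; older versions
    # return the list directly.
    if isinstance(payload, dict) and "gpu_data" in payload:
        return payload["gpu_data"]
    return payload if isinstance(payload, list) else []

def read_vram(gpu: dict) -> tuple[Any, Any]:
    # Newer amd-smi moved VRAM from "vram" to "mem_usage" with
    # "total_vram"/"used_vram" keys; fall back across both layouts.
    block = gpu.get("mem_usage") or gpu.get("vram") or {}
    total = block.get("total_vram", block.get("total"))
    used = block.get("used_vram", block.get("used"))
    return total, used

def read_temperature(gpu: dict) -> float | None:
    # "edge" reports "N/A" on MI300X VF, so keep trying sensors until one
    # yields a parseable number instead of returning the first value found.
    temps = gpu.get("temperature", {})
    for key in ("edge", "hotspot", "junction"):
        try:
            return float(temps.get(key))
        except (TypeError, ValueError):
            continue
    return None
```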

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Bug fix detecting radeon (#4940)

* Bug fix detecting radeon

* Expanding GPU target for gfx1100*

* Generalize gfx family-prefix filter to cover gfx10/gfx12 as well

rocminfo on ROCm 6.1+ emits LLVM generic-family ISA lines alongside the
specific GPU (e.g. gfx11-generic next to gfx1100). The outer grep captures
the bare family prefix from the generic line, and passing that to
-DGPU_TARGETS breaks the HIP build because clang only accepts specific
gfxNNN ids.

The previous filter only special-cased gfx11. Generalize it so any bare
2-digit family prefix (gfx10, gfx11, gfx12, ...) is dropped whenever a
specific sibling target is present in the same list. No real AMD GPU has
a 2-digit gfx id, so the filter can only ever drop family prefixes and
never a real target.

Covers the existing gfx11 cases unchanged, and extends the same fix to
gfx10-1-generic / gfx10-3-generic (RDNA1/2) and gfx12-generic (RDNA4),
which would otherwise hit the same build failure on newer rocminfo.
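
The same filter sketched in Python rather than the shell pipeline it actually
lands in; the regexes and list handling here are illustrative:

```python
import re

def filter_gpu_targets(targets: list[str]) -> list[str]:
    # Real gfx ids have 3-4 characters after "gfx" (gfx942, gfx90a, gfx1100).
    specific = {t for t in targets if re.fullmatch(r"gfx[0-9a-z]{3,4}", t)}
    kept = []
    for t in targets:
        # Bare 2-digit family prefixes (gfx10, gfx11, gfx12, ...) only come
        # from generic ISA lines; drop them when a specific sibling exists.
        if re.fullmatch(r"gfx[0-9]{2}", t) and any(s.startswith(t) for s in specific):
            continue
        kept.append(t)
    return kept

print(filter_gpu_targets(["gfx11", "gfx1100"]))  # -> ['gfx1100']
print(filter_gpu_targets(["gfx942"]))            # -> ['gfx942']
```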

---------

Co-authored-by: Iswarya Alex <iswarya.alex@amd.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>

---------

Co-authored-by: Eda Z <eda.zhou@amd.com>
Co-authored-by: GoldenGrapeGentleman <yueyuan@amd.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: billishyahao <bill.he@amd.com>
Co-authored-by: Iswarya Alex <47045679+iswaryaalex@users.noreply.github.com>
Co-authored-by: Iswarya Alex <iswarya.alex@amd.com>
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
2026-04-10 01:56:12 -07:00
Roland Tannous
33503ea248
Revert "updated models template mappers. added lfm2.5vl450m to transformers 5…" (#4945)
This reverts commit bcf4fd6bd3.
2026-04-09 23:14:57 -07:00
Roland Tannous
bcf4fd6bd3
updated models template mappers. added lfm2.5vl450m to transformers 5… (#4939)
* updated models template mappers. added lfm2.5vl450m to transformers 5.3.0 whitelist

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-09 23:36:42 +04:00
Ricardo-M-L
d5525e8bbb
fix: check find() return value before adding offset in try_fix_tokenizer (#4923)
* fix: check find() return value before adding offset in try_fix_tokenizer

The `str.find()` result was checked for -1 only after adding
`len(find_text)`, turning the guard into dead code. When the substring
is absent, `start` becomes `len(find_text) - 1` (a positive number),
so the `if start == -1: continue` never triggers and the subsequent
slice extracts garbage from the tokenizer string.

Split the find and offset into two steps so the -1 check works correctly.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add defensive guards for token_id None and end find() returning -1

- Skip loop iteration early when token_id is None to avoid constructing
  a find_text that can never match valid JSON
- Guard end = tokenizer_string.find('",', start) against -1 to prevent
  silent garbage extraction from malformed tokenizer strings
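
A minimal sketch of the corrected find()/offset ordering; the helper name and
the surrounding extraction are illustrative:

```python
def extract_token_field(tokenizer_string: str, find_text: str) -> str | None:
    # Buggy version: start = tokenizer_string.find(find_text) + len(find_text)
    # turns a miss (-1) into len(find_text) - 1, so "if start == -1" is dead
    # code and the slice below extracts garbage.
    start = tokenizer_string.find(find_text)
    if start == -1:          # check the raw find() result first ...
        return None
    start += len(find_text)  # ... then apply the offset
    end = tokenizer_string.find('",', start)
    if end == -1:            # guard the second find() as well
        return None
    return tokenizer_string[start:end]
```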

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-09 06:15:46 -07:00
Lee Jackson
dc16e0c65b
Studio: keep chat input visible and fix compare pane clipping (#4924)
* fix(chat): sticky composer bar in thread

* fix(chat): fix compare pane clipping

* fix(chat): tighten scroll-to-bottom placement and compare footer spacing

* Fix TypeScript build break and clean up ViewportFooter classes

- Remove unused `compact` prop from ThreadScrollToBottom call site
  (component is FC with no props, passing it caused TS2322)
- Extract shared classes (sticky, bottom-0, z-20, bg-transparent) from
  ternary branches into the unconditional className string
- Restore `relative` on normal-mode footer so the inner absolute
  bg-background strip has a positioning context
- Remove redundant md:pb-3 / md:pb-4 (same value as base pb-3 / pb-4)
- Remove no-op `sticky bottom-0` from SharedComposer wrapper in both
  LoraCompareContent and GeneralCompareContent (flex layout with
  shrink-0 already pins it at the bottom; parent has no scrollable
  overflow for sticky to bind to)
- Fix truncated comment on pointer-events rationale

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-09 06:00:56 -07:00
kiankyars
ad5972492d
Fix raw text paragraph break normalization (#4884)
* Fix raw text paragraph break normalization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Normalize horizontal whitespace before stripping non-ASCII and collapse leftover doubles

Run the [^\S\n]+ horizontal-whitespace collapse before the non-ASCII strip
so that Unicode whitespace (\u00A0, \u202F, \u2009, \u3000, \v, \f, etc.)
becomes a single ASCII space instead of being deleted outright. The prior
ordering silently merged adjacent words on HTML/PDF/OCR-sourced text:
with the earlier commits in this PR, "hello\u00a0world" came out as
"helloworld"; it now produces "hello world".

Also drop \t from the allow-list since the horizontal-whitespace collapse
already normalizes tabs to a single space, and add a targeted [ ]{2,} pass
right after the non-ASCII strip so that a non-whitespace non-ASCII character
sitting between two spaces ("word1 (c) word2") does not leave an interior
double space. Without this extra pass, clean_text was not idempotent on
such inputs: the first call produced "word1  word2" and only the second
call collapsed it to "word1 word2". Fuzz testing over 10000 random inputs
now shows the idempotence invariant holding in every case.
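
A minimal sketch of the three passes in the order described above; this is not
the actual clean_text, just the ordering:

```python
import re

def normalize(text: str) -> str:
    # 1. Collapse horizontal whitespace (Unicode spaces, \t, \v, \f, ...) to a
    #    single ASCII space BEFORE the non-ASCII strip, so NBSP & friends
    #    become separators instead of vanishing and merging words.
    text = re.sub(r"[^\S\n]+", " ", text)
    # 2. Strip remaining non-ASCII characters.
    text = re.sub(r"[^\x00-\x7F]", "", text)
    # 3. A non-ASCII character that sat between two spaces leaves a double
    #    space behind; collapse it so a single call is already idempotent.
    text = re.sub(r" {2,}", " ", text)
    return text

assert normalize("hello\u00a0world") == "hello world"
out = normalize("word1 \u00a9 word2")
assert out == "word1 word2"
assert normalize(out) == out  # idempotent on this input
```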

* Add regression tests for Unicode/control whitespace and non-ASCII edge cases

Cover:
- Unicode horizontal whitespace separators (NBSP, narrow NBSP, thin space,
  en/em space, ideographic space, vertical tab, form feed) normalizing to
  a single ASCII space instead of being deleted.
- Mixed paragraph + Unicode whitespace realistic input ("Section\u00a01\r\n\r\nBody\ftext\u202Fhere").
- Tab collapsing and space trimming around newlines.
- Non-whitespace non-ASCII characters (copyright, accented letters, emoji)
  sitting between spaces: must not leave an interior double space, and
  clean_text must be idempotent on these inputs.
- Non-ASCII characters adjacent to a newline: stripping must not leave
  stray leading or trailing spaces on the neighbouring line, and must not
  swallow an adjacent paragraph break.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-09 04:45:43 -07:00
cheehook
7aa442289b
Fix Mistral DPO/preference training crash on non-xformers platforms (e.g. Intel XPU) (#4889)
* Fix Mistral training crash when xformers is unavailable

* Fix/adjust Mistral DPO training crash fix for PR #4889

- Clarify comment in MistralForCausalLM_fast_forward: the DPO embed-masking
  block runs BEFORE attention_mask is nulled out, and it is the consumer that
  requires a 2D mask.
- Add defensive attention_mask.ndim == 2 guard to the LlamaModel_fast_forward
  DPO embed-masking block so it self-protects if a 4D mask ever reaches it.
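
A hedged sketch of what the guarded embed-masking block might look like; the
actual masking in LlamaModel_fast_forward may differ, the ndim == 2 check is
the point:

```python
import torch

def mask_inputs_embeds(inputs_embeds: torch.Tensor,
                       attention_mask: torch.Tensor | None) -> torch.Tensor:
    # Only act on the 2D (batch, seq_len) mask the DPO path provides; step
    # aside if a 4D attention mask ever reaches this block.
    if attention_mask is not None and attention_mask.ndim == 2:
        mask = attention_mask.unsqueeze(-1).to(inputs_embeds.dtype)
        inputs_embeds = inputs_embeds * mask
    return inputs_embeds
```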

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-09 04:38:44 -07:00
Daniel Han
da2ef6dce6
Only run ldconfig CUDA-linking recovery when we have permission (#4930)
* Only run ldconfig CUDA-linking recovery when we have permission

When `import unsloth` runs on a non-root environment (shared HPC,
locked-down container, CI runner, etc.) the CUDA-linking recovery path
shells out to `os.system("ldconfig /usr/lib64-nvidia")`, which fails
loudly with "Permission denied". It's especially noisy for users who
don't even have bitsandbytes installed - they're doing 16bit or full
finetuning and the line immediately above told them "16bit and full
finetuning works!". The reason the recovery runs at all in that case
is that `bnb.functional.lib.cdequantize_blockwise_fp32` raises
AttributeError when `bnb` is None, the bare `except:` swallows it, and
the code drops into the recovery unconditionally.

Fix: gate the recovery body on `os.geteuid() == 0`. When we don't
have permission to run ldconfig, silently skip the recovery. When we
do, the recovery runs UNCHANGED - same `os.system()` calls, same
reload + retry, same warnings. `libcuda_dirs()` is used by both triton
and bitsandbytes, so we still want to run the recovery whenever we
have permission, regardless of whether bnb is installed.

For non-root users who DO have bitsandbytes installed and broken,
emit a single remediation warning telling them how to fix it manually
(`sudo ldconfig /usr/lib64-nvidia`). This preserves the diagnostic
guidance from the original code without the Permission denied noise.

Scope:
- Only the `DEVICE_TYPE == "cuda"` branch is touched.
- The `hip` (AMD ROCm) and `xpu` (Intel) branches are unchanged.
- On a real CUDA box running as root, behavior is byte-identical to
  main: same os.system() calls, same reload, same retry, same warnings.
  AST-verified by /tmp/verify_minimal/verify.py.
- `hasattr(os, "geteuid")` guards against Windows where `os.geteuid`
  doesn't exist.
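
A minimal sketch of the permission gate; the recovery body and warning text
are abbreviated:

```python
import os
import warnings

def run_cuda_linking_recovery(bnb_installed: bool) -> None:
    # geteuid does not exist on Windows, hence the hasattr guard.
    is_root = hasattr(os, "geteuid") and os.geteuid() == 0
    if is_root:
        # With permission the recovery runs as before: same ldconfig call,
        # then reload + retry (elided here).
        os.system("ldconfig /usr/lib64-nvidia")
    elif bnb_installed:
        # Non-root with a broken bitsandbytes: give manual remediation
        # guidance instead of spamming "Permission denied".
        warnings.warn(
            "Running `sudo ldconfig /usr/lib64-nvidia` may fix CUDA linking "
            "for bitsandbytes."
        )
    # Non-root without bitsandbytes: 16bit/full finetuning works, stay silent.
```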

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Daniel Han <info@unsloth.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-09 00:07:25 -07:00
dependabot[bot]
5fa8683b27
build(deps): bump the bun-frontend group across 1 directory with 16 updates (#4586)
* build(deps): bump the bun-frontend group across 1 directory with 16 updates

Bumps the bun-frontend group with 16 updates in the /studio/frontend directory:

| Package | From | To |
| --- | --- | --- |
| [@dagrejs/dagre](https://github.com/dagrejs/dagre) | `2.0.4` | `3.0.0` |
| [@dagrejs/graphlib](https://github.com/dagrejs/graphlib) | `3.0.4` | `4.0.1` |
| @hugeicons/core-free-icons | `3.3.0` | `4.0.0` |
| [@streamdown/cjk](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown-cjk) | `1.0.2` | `1.0.3` |
| [@streamdown/code](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown-code) | `1.0.2` | `1.1.1` |
| [lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react) | `0.577.0` | `1.6.0` |
| [recharts](https://github.com/recharts/recharts) | `3.7.0` | `3.8.0` |
| [shadcn](https://github.com/shadcn-ui/ui/tree/HEAD/packages/shadcn) | `3.8.5` | `4.1.0` |
| [streamdown](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown) | `2.3.0` | `2.5.0` |
| [@biomejs/biome](https://github.com/biomejs/biome/tree/HEAD/packages/@biomejs/biome) | `1.9.4` | `2.4.8` |
| [@eslint/js](https://github.com/eslint/eslint/tree/HEAD/packages/js) | `9.39.4` | `10.0.1` |
| [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node) | `24.12.0` | `25.5.0` |
| [eslint](https://github.com/eslint/eslint) | `9.39.4` | `10.1.0` |
| [eslint-plugin-react-refresh](https://github.com/ArnaudBarre/eslint-plugin-react-refresh) | `0.4.26` | `0.5.2` |
| [globals](https://github.com/sindresorhus/globals) | `16.5.0` | `17.4.0` |
| [typescript](https://github.com/microsoft/TypeScript) | `5.9.3` | `6.0.2` |



Updates `@dagrejs/dagre` from 2.0.4 to 3.0.0
- [Release notes](https://github.com/dagrejs/dagre/releases)
- [Changelog](https://github.com/dagrejs/dagre/blob/master/changelog.md)
- [Commits](https://github.com/dagrejs/dagre/compare/v2.0.4...v3.0.0)

Updates `@dagrejs/graphlib` from 3.0.4 to 4.0.1
- [Release notes](https://github.com/dagrejs/graphlib/releases)
- [Changelog](https://github.com/dagrejs/graphlib/blob/master/changelog.md)
- [Commits](https://github.com/dagrejs/graphlib/compare/v3.0.4...v4.0.1)

Updates `@hugeicons/core-free-icons` from 3.3.0 to 4.0.0

Updates `@streamdown/cjk` from 1.0.2 to 1.0.3
- [Release notes](https://github.com/vercel/streamdown/releases)
- [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown-cjk/CHANGELOG.md)
- [Commits](https://github.com/vercel/streamdown/commits/@streamdown/cjk@1.0.3/packages/streamdown-cjk)

Updates `@streamdown/code` from 1.0.2 to 1.1.1
- [Release notes](https://github.com/vercel/streamdown/releases)
- [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown-code/CHANGELOG.md)
- [Commits](https://github.com/vercel/streamdown/commits/@streamdown/code@1.1.1/packages/streamdown-code)

Updates `lucide-react` from 0.577.0 to 1.6.0
- [Release notes](https://github.com/lucide-icons/lucide/releases)
- [Commits](https://github.com/lucide-icons/lucide/commits/1.6.0/packages/lucide-react)

Updates `recharts` from 3.7.0 to 3.8.0
- [Release notes](https://github.com/recharts/recharts/releases)
- [Changelog](https://github.com/recharts/recharts/blob/main/CHANGELOG.md)
- [Commits](https://github.com/recharts/recharts/compare/v3.7.0...v3.8.0)

Updates `shadcn` from 3.8.5 to 4.1.0
- [Release notes](https://github.com/shadcn-ui/ui/releases)
- [Changelog](https://github.com/shadcn-ui/ui/blob/main/packages/shadcn/CHANGELOG.md)
- [Commits](https://github.com/shadcn-ui/ui/commits/shadcn@4.1.0/packages/shadcn)

Updates `streamdown` from 2.3.0 to 2.5.0
- [Release notes](https://github.com/vercel/streamdown/releases)
- [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown/CHANGELOG.md)
- [Commits](https://github.com/vercel/streamdown/commits/streamdown@2.5.0/packages/streamdown)

Updates `@biomejs/biome` from 1.9.4 to 2.4.8
- [Release notes](https://github.com/biomejs/biome/releases)
- [Changelog](https://github.com/biomejs/biome/blob/main/packages/@biomejs/biome/CHANGELOG.md)
- [Commits](https://github.com/biomejs/biome/commits/@biomejs/biome@2.4.8/packages/@biomejs/biome)

Updates `@eslint/js` from 9.39.4 to 10.0.1
- [Release notes](https://github.com/eslint/eslint/releases)
- [Commits](https://github.com/eslint/eslint/commits/v10.0.1/packages/js)

Updates `@types/node` from 24.12.0 to 25.5.0
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node)

Updates `eslint` from 9.39.4 to 10.1.0
- [Release notes](https://github.com/eslint/eslint/releases)
- [Commits](https://github.com/eslint/eslint/compare/v9.39.4...v10.1.0)

Updates `eslint-plugin-react-refresh` from 0.4.26 to 0.5.2
- [Release notes](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/releases)
- [Changelog](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/blob/main/CHANGELOG.md)
- [Commits](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/compare/v0.4.26...v0.5.2)

Updates `globals` from 16.5.0 to 17.4.0
- [Release notes](https://github.com/sindresorhus/globals/releases)
- [Commits](https://github.com/sindresorhus/globals/compare/v16.5.0...v17.4.0)

Updates `typescript` from 5.9.3 to 6.0.2
- [Release notes](https://github.com/microsoft/TypeScript/releases)
- [Commits](https://github.com/microsoft/TypeScript/compare/v5.9.3...v6.0.2)

---
updated-dependencies:
- dependency-name: "@dagrejs/dagre"
  dependency-version: 3.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@dagrejs/graphlib"
  dependency-version: 4.0.1
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@hugeicons/core-free-icons"
  dependency-version: 4.0.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@streamdown/cjk"
  dependency-version: 1.0.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: bun-frontend
- dependency-name: "@streamdown/code"
  dependency-version: 1.1.1
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: bun-frontend
- dependency-name: lucide-react
  dependency-version: 1.6.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: recharts
  dependency-version: 3.8.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: bun-frontend
- dependency-name: shadcn
  dependency-version: 4.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: streamdown
  dependency-version: 2.5.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: bun-frontend
- dependency-name: "@biomejs/biome"
  dependency-version: 2.4.8
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@eslint/js"
  dependency-version: 10.0.1
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: "@types/node"
  dependency-version: 25.5.0
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: eslint
  dependency-version: 10.1.0
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: eslint-plugin-react-refresh
  dependency-version: 0.5.2
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: bun-frontend
- dependency-name: globals
  dependency-version: 17.4.0
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
- dependency-name: typescript
  dependency-version: 6.0.2
  dependency-type: direct:development
  update-type: version-update:semver-major
  dependency-group: bun-frontend
...

Signed-off-by: dependabot[bot] <support@github.com>

* Revert dagrejs upgrades

Keep @dagrejs/dagre at ^2.0.4 and @dagrejs/graphlib at ^3.0.4.

* Revert biome, eslint, typescript, and recharts upgrades

These upgrades break studio/frontend locally:

- @biomejs/biome 2.4.10 fails to parse the existing biome.json
  (files.ignore and organizeImports keys removed in v2; schema
  version mismatch).
- typescript 6.0.2 emits TS5101 on tsconfig.app.json baseUrl
  ("Option 'baseUrl' is deprecated and will stop functioning in
  TypeScript 7.0"), so tsc -b exits 2.
- eslint 10.2.0 conflicts with eslint-plugin-react-hooks@7.0.1,
  which peers on eslint ^9; npm install fails with ERESOLVE.
- recharts 3.8.1 widened LegendPayload.dataKey to include a
  function type, which breaks the React key={item.dataKey} usage
  in src/components/ui/chart.tsx (TS2322).

Hold these at their current pinned versions until the upstream
peer deps and config migrations are ready.

---------

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-08 04:34:33 -07:00
Wasim Yousef Said
8e977445d4
Let recipes use the model loaded in Chat (#4840)
* feat: inject local model provider into recipe jobs via JWT

* feat: auto-generate JWT for local model providers in recipes

* feat: add is_local flag to model provider config types and utils

* fix(studio): skip endpoint validation for local providers

* feat(studio): add local/external model source toggle to provider dialog

* feat(studio): thread localProviderNames through model config dialog chain

* feat(studio): show 'Local model (Chat)' label for local model_provider configs

* fix: hardcode loopback for local endpoint, clear stale creds on toggle

* fix: document TOCTOU/JWT rotation, add deferred import comments, fix is_local serialization

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix(studio): clear stale local model state on provider toggle and validation

* fix(studio): override empty local endpoint in validation and skip model gate for unused providers

* fix(studio): resolve loopback port from app.state, clear stale local provider fields, sync model id on toggle

Address review feedback on the local-model-provider flow:

- Backend (jobs.py): _resolve_local_v1_endpoint now reads the actual bound
  port from app.state.server_port (set in run.py after binding) instead of
  parsing it out of request.base_url, which is wrong behind any reverse
  proxy or non-default port. The two duplicated urlparse blocks are gone.
- Backend (jobs.py): defensively pop api_key_env, extra_headers, extra_body
  from local providers so a previously external provider that flipped to
  local cannot leak invalid JSON or rogue auth headers into the local /v1
  call. Also dedupe the post-loop assignment and tighten the local-name
  intersection so empty names cannot match.
- Backend (jobs.py): hoist datetime and urllib.parse imports to the top
  import block for consistency with the rest of the file.
- Backend (run.py): expose the bound port on app.state.server_port after
  the uvicorn server is constructed.
- Frontend (model-provider-dialog.tsx): clear extra_headers and extra_body
  when toggling to local mode. Hidden inputs would otherwise keep stale
  JSON blocking validate/run.
- Frontend (model-config-dialog.tsx): factor the local-aware provider
  selection logic into applyProviderChange and call it from both
  onValueChange and onBlur, so manually typing a provider name and tabbing
  away keeps the model field consistent.
- Frontend (recipe-studio.ts store): handle both directions of the
  is_local toggle in the cascade. external -> local now backfills
  model: "local" on already-linked model_configs so they pass validation
  immediately, mirroring the existing local -> external clear path.
- Frontend (validate.ts + build-payload.ts): thread localProviderNames
  into validateModelConfigProviders and skip the "model is required"
  check for local-linked configs. Local providers do not need a real
  model id since the inference endpoint uses the loaded Chat model.

* fix(studio): narrow store cascade types, sync model placeholder on graph relink and node removal, harden ephemeral port path

Loop 2 review fixes:

- recipe-studio.ts: type-narrow next.is_local by also checking
  next.kind === "model_provider". TS otherwise raised TS2339 because
  next was typed as the union NodeConfig after the spread. The behavior
  is unchanged but the code now compiles cleanly.
- model-config-dialog.tsx: convert the lastProviderRef / providerInputRef
  ref-during-render pattern (pre-existing react-hooks/refs lint error)
  to a useEffect that syncs providerInputRef from config.provider. The
  combobox blur path still uses applyProviderChange and remains stable.
- recipe-graph-connection.ts: when a graph drag links a model_provider
  to a model_config, mirror the dialog applyProviderChange behavior:
  fill model: "local" if the new provider is local and the model field
  is blank, clear model when relinking from a local placeholder to an
  external provider, otherwise leave the model alone.
- reference-sync.ts: when a referenced provider node is removed, clear
  the synthetic model: "local" placeholder along with the provider
  field, so a future relink to an external provider does not pass
  validation with a stale value that fails at runtime.
- run.py: only publish app.state.server_port when the bound port is a
  real positive integer; for ephemeral binds (port==0) leave it unset
  and let request handlers fall back to request.base_url.
- jobs.py: _resolve_local_v1_endpoint also falls back when
  app.state.server_port is non-positive, and uses `is None` instead of
  the truthy fallback so a literal 0 is handled correctly.

* fix(studio): strict is_local check, narrow loaded-model gate to LLM-reachable configs, add scope-server port fallback

Loop 3 review fixes:

- jobs.py, validate.py: require `is_local is True` instead of truthy
  check. Malformed payloads such as is_local: "false" or is_local: 1
  would otherwise be treated as local and silently rewritten to the
  loopback endpoint.
- jobs.py: _resolve_local_v1_endpoint now tries request.scope["server"]
  (the actual uvicorn-assigned (host, port) tuple) as a second
  resolution step before falling back to parsing request.base_url.
  This covers direct-uvicorn startup paths and ephemeral binds that
  never publish app.state.server_port.
- jobs.py: new _used_llm_model_aliases helper collects the set of
  model_aliases that an LLM column actually references, and the
  "Chat model loaded" gate is now only triggered when a local
  provider is reachable from that set. Orphan model_config nodes on
  the canvas no longer block unrelated recipe runs.
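
A sketch of the endpoint resolution order and the strict is_local check
described in loops 2-3; the function names and Request handling below are
illustrative, not the actual jobs.py code:

```python
from urllib.parse import urlparse

def resolve_local_v1_endpoint(request) -> str:
    # 1. Port published by run.py after binding (only real, positive ports).
    port = getattr(request.app.state, "server_port", None)
    if isinstance(port, int) and port > 0:
        return f"http://127.0.0.1:{port}/v1"
    # 2. The (host, port) tuple uvicorn puts in the ASGI scope; covers
    #    direct-uvicorn startups and ephemeral binds.
    server = request.scope.get("server")
    if server and isinstance(server[1], int) and server[1] > 0:
        return f"http://127.0.0.1:{server[1]}/v1"
    # 3. Last resort: parse request.base_url (wrong behind reverse proxies,
    #    hence only a fallback).
    parsed = urlparse(str(request.base_url))
    return f"http://127.0.0.1:{parsed.port or 80}/v1"

def is_local_provider(provider: dict) -> bool:
    # Require a literal True; is_local: "false" or 1 must not flip a provider
    # onto the loopback endpoint.
    return provider.get("is_local") is True
```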

* fix(studio): force skip_health_check on local-linked configs, skip JSON parsing for local providers, local-aware inline editor

Loop 4 review fixes:

- jobs.py: after rewriting local providers, also force
  skip_health_check: true on any model_config linked to a local
  provider. The /v1/models endpoint only advertises the real loaded
  model id, so data_designer's default model-availability health check
  would otherwise fail against the placeholder "local" id before the
  first chat completion call. The inference route already ignores the
  model id in chat completions, so skipping the check is safe.
- builders-model.ts: buildModelProvider now short-circuits for local
  providers and emits only { name, endpoint: "", provider_type, is_local }
  without running parseJsonObject on the hidden extra_headers/extra_body
  inputs. Imported or hydrated recipes with stale invalid JSON in those
  fields no longer block client-side validate/run.
- inline-model.tsx: the model_config branch now accepts an optional
  localProviderNames prop and mirrors the dialog applyProviderChange
  behavior. Changing provider to/from a local one auto-fills or clears
  the "local" placeholder consistently with the other edit paths.
- recipe-graph-node.tsx: derive localProviderNames from the store via
  useMemo (stable identity) and pass it through renderNodeBody to
  <InlineModel>. Hooks order is preserved by declaring them above the
  early return for markdown_note nodes.
- run.py: minor comment tweak - loop 3 already added the scope-server
  fallback path, note that in the comment.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: danielhanchen <info@unsloth.ai>
2026-04-08 03:48:22 -07:00
Daniel Han
c3d2d58046
Update dependabot.yml (#4915) 2026-04-08 03:39:50 -07:00
dependabot[bot]
0087515d5c
build(deps): bump oxc-parser (#4776)
Bumps the npm-oxc-validator group in /studio/backend/core/data_recipe/oxc-validator with 1 update: [oxc-parser](https://github.com/oxc-project/oxc/tree/HEAD/napi/parser).


Updates `oxc-parser` from 0.121.0 to 0.123.0
- [Release notes](https://github.com/oxc-project/oxc/releases)
- [Changelog](https://github.com/oxc-project/oxc/blob/main/napi/parser/CHANGELOG.md)
- [Commits](https://github.com/oxc-project/oxc/commits/crates_v0.123.0/napi/parser)

---
updated-dependencies:
- dependency-name: oxc-parser
  dependency-version: 0.123.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: npm-oxc-validator
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 03:35:40 -07:00
dependabot[bot]
67e9db4921
build(deps): bump oxc-parser (#4776)
Bumps the npm-oxc-validator group in /studio/backend/core/data_recipe/oxc-validator with 1 update: [oxc-parser](https://github.com/oxc-project/oxc/tree/HEAD/napi/parser).


Updates `oxc-parser` from 0.121.0 to 0.123.0
- [Release notes](https://github.com/oxc-project/oxc/releases)
- [Changelog](https://github.com/oxc-project/oxc/blob/main/napi/parser/CHANGELOG.md)
- [Commits](https://github.com/oxc-project/oxc/commits/crates_v0.123.0/napi/parser)

---
updated-dependencies:
- dependency-name: oxc-parser
  dependency-version: 0.123.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
  dependency-group: npm-oxc-validator
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2026-04-08 03:35:33 -07:00
pre-commit-ci[bot]
c2184af079
[pre-commit.ci] pre-commit autoupdate (#4879)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.8 → v0.15.9](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.8...v0.15.9)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-07 22:50:48 -07:00
Roland Tannous
f801e59c29
split venv_t5 into tiered 5.3.0/5.5.0 and fix trust_remote_code (#4878)
* split venv_t5 into venv_t5_530 and venv_t5_550 for tiered transformers 5.x support

* fix bfloat16 crash on T4 for FORCE_FLOAT32 models and disable trust_remote_code auto-enable for native t5 models

* revert FORCE_FLOAT32 dtype change

* restrict trust_remote_code auto-enable to Nemotron models only

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* use config.json model_type for tier detection, add unsloth/nvidia namespace guard

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Revert "[pre-commit.ci] auto fixes from pre-commit.com hooks"

This reverts commit fb43d468e2.

* Revert "use config.json model_type for tier detection, add unsloth/nvidia namespace guard"

This reverts commit fc49ae2453.

* add unsloth/nvidia namespace guard to Nemotron trust_remote_code auto-enable

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* reorder tier checks: all substring matches before config.json fetches

* extract shared activate_transformers_for_subprocess into transformers_version.py

* narrow Nemotron trust_remote_code to nemotron_h/nemotron-3-nano, add to export worker

* clean venv_t5 dirs before re-install in setup.sh, clarify version alias comment

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* run venv_t5 migration outside deps fast-path gate in both setup scripts

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-07 20:05:01 +04:00
Daniel Han
1d8160376e
Bump minimum unsloth version to 2026.4.4 in install scripts (#4876) 2026-04-06 09:46:35 -07:00