Commit graph

5061 commits

Author SHA1 Message Date
Wasim Yousef Said
bc9ddb3af6
Fix onboarding followups (#5064)
* Fix onboarding followups

* Rename sidebar studio to train
2026-04-16 10:11:35 -07:00
Wasim Yousef Said
7ef65bd2e5
Chat first onboarding (#5063)
* auth: default to chat

* settings: relaunch onboarding

* onboarding: return to launch page

* studio: stop auto guided tour

* ui: soften global radius

* cleanup: rename onboarding exit prop

* fix onboarding redirect safety

* Show real Unsloth version in settings

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 09:58:10 -07:00
हिमांशु
f4422b0a62
change torchcodec version to 0.10.0 in extra-no-deps (#5043)
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-16 19:50:57 +04:00
Wasim Yousef Said
b01e9af124
feat(studio): replace navbar with collapsible sidebar (#4936)
* feat(studio): replace navbar navigation with collapsible sidebar

Add an app-wide sidebar with hover-expand and pin-to-dock behavior.
Navigation items (Studio, Recipes, Export, Chat) move from the center
pill navbar to the sidebar. Chat threads and recipes render as
collapsible sub-lists. Navbar simplified to logo + update + close.

- Extend SidebarProvider with pinned/hovered state model
- New AppSidebar with animated active indicator, sloth profile menu,
  theme toggle, guided tour, back/forward navigation
- Chat page refactored to URL-driven view state via search params
- Extract reusable hooks for chat thread and recipe sidebar data
- Guard startViewTransition for browser compatibility
- Wrap chat deletions in Dexie transaction for data integrity

* feat(studio): move logo to sidebar and make navbar overlay

- Sidebar is now full-height with logo in SidebarHeader
- Collapsed sidebar shows sticker.png, expanded shows full logo
- Navbar is absolute-positioned overlay (no layout space)
- Main content extends to top, aligning with navbar controls

* feat(studio): full-height sidebar with recents, edge-to-edge nav buttons

- Sidebar outside max-w-7xl, pinned to left edge
- Remove sidebar rounding, menu buttons rounded-md
- Nav buttons flush to sidebar edges with no left rounding
- Replace collapsible recipes/chat with flat nav items
- Add Recents section with chat history (1 item when not on chat, full on chat)
- New Chat as first nav item with PencilEdit02Icon
- Cursor pointer on all sidebar buttons
- Navbar temporarily hidden for screenshots

* fix(studio): fix chat scroll, action bar hover, collapsible recents

- Fix sticky composer by removing `relative` override on viewport footer
- Action bar buttons only show on hover (autohide=always)
- Remove floating border/shadow from action bar
- Add scroll space above composer for last message actions
- Back/forward buttons use router history (stay in-app)
- Recents section collapsible with chevron on chat route
- Set html/body/#root height for proper h-full chain

* fix(studio): address review feedback, clean up unused code

- Unhide navbar (was left hidden from screenshot)
- Remove unused imports: SidebarMenuSub*, BubbleChatIcon, ColumnInsertIcon
- Remove unused vars: recipeItems, activeRecipeId, canCompare, recipesOpen
- Include compare query id in active sidebar selection
- Use store type for contextUsage instead of inline type
- Simplify noop in sidebar.tsx
- Remove empty className prop

* feat(studio): add mobile sidebar, recent runs section, and misc UX fixes

* feat(studio): scaffold settings feature module with dialog store

* feat(studio): add tri-state theme store for settings

* feat(chat): add clear-all-chats and export-chat-history utils

* feat(studio): add settings dialog shell with tab rail

* feat(studio): add appearance tab with theme and sidebar pin

* feat(studio): add settings general tab with hf token, auto-title, reset prefs

* feat(studio): add settings chat tab with export and clear

* feat(studio): add api keys tab with list and revoke flow

* feat(studio): add create-key form and reveal dialog

* feat(studio): add usage examples panel to api keys tab

* feat(studio): add settings about tab with update and shutdown

* feat(studio): add settings dropdown item and cmd-comma shortcut

* feat(studio): remove legacy api-keys route and chat-sheet preference rows

* fix(studio): settings dialog a11y + polish pass

* feat(studio): inline api key reveal card replacing nested dialog

* fix(studio): hide revoked keys from settings list

* refactor(studio): strip navbar and hoist training unload guard

* feat(studio): explicit sidebar toggle, remove hover-open and pin icons

* fix(studio): use SidebarRight01Icon for collapsed sidebar open toggle

* fix(studio): address code review findings for settings dialog

* feat(studio): collapsible navigate group with standalone new-chat and compare

* fix(studio): chat-only standalone actions, use ColumnInsertIcon for compare

* fix(studio): sidebar new-chat/compare state reset and icon-mode collapsible

* feat(studio): add compact logo assets for sidebar header

* Fixed sidebar design

* fix(studio): sidebar delete icon hover contrast and sizing

* feat(studio): route-gate sidebar recents (chats off /studio, runs on /studio)

* feat(studio): add chat search store

* feat(studio): add chat search index hook with snapshot-on-open

* feat(studio): add chat search command dialog with global shortcut

* feat(studio): wire chat search into sidebar

* fix(studio): trim hf token on save, add show/hide toggle, commit on close

* revert(studio): restore original sidebar/border colors, brighten sidebar

* feat(studio): forward overlayClassName through CommandDialog

* fix(studio): wrap search dialog in Command context, redesign as flat 635px card

* fix(studio): reserve right padding on recent items so delete icon stops overlapping title

* fix(studio): skip hf token unmount-commit during reset-prefs reload

* chore(studio): drop unused icon import and unreachable runs navigate branch

* fix(studio): chat search index filters archived before limit, batches message query, picks up reasoning text

* fix(studio): keep CommandEmpty in tree so empty state renders correctly

* fix(studio): cap system prompt and chat template textareas so they scroll instead of growing

* fix(studio): attach chat-compare tour anchor to sidebar compare button

* fix(studio): persist system theme explicitly so next-themes does not clobber on reload

* fix(studio): auto-switch to history tab when selecting a recent run from sidebar

* UI overhaul: chatbox, scrollbar, sidebar, and compare view

UI Changes:
- Redesigned the Compare UI with general cleanup
- Redesigned the Chatbox UI
- Reduced the width of the user chat bubble for improved readability
- Narrowed the user chat box across the content page
- Adjusted thinking-box text color to be slightly darker
- Removed faded text effect from chat messages
- Removed faded text effect from the thinking box
- Added a small LLM chat safety note at the bottom of the chatbox
- Restyled the scrollbar

Layout & Behavior:
- Reworked the scrollbar to span the full height of the page (no top/bottom padding) and remain persistently visible when content is scrollable, rather than only on hover
- Reworked the Configuration sidebar to span full height — removed rounded corners and borders, with the scrollbar adjusted to match the full top-to-bottom layout
- Adjusted the top menu and bottom chatbox content areas to work correctly with the new full-page scroll behavior
- Made chat content match the chatbox width, with content sliding slightly behind the chatbox when scrolling
- Aligned chat text width with the chatbox for visual consistency, including how far the text extends behind the chatbox

Fixes:
- Fixed the chatbox not auto-expanding when typing multi-line input while bottom-positioned during an active chat (previously only worked before a chat had started)
- Fixed positioning and design of the user chat hover menu buttons to match the assistant chat box — now displayed below the chat bubble instead of on the left side

* Fix user message layout in thread component

* swap code icon

* fix compare layout

* fix compare pane flex

* Sidebar improvements and fixes

- Added scrolling support to the sidebar so menus and recent chats no longer get hidden
- Recent chats are now always visible in the sidebar, not hidden when in Studio, Recipes, or Export
- Recent chat is now deselected when selecting other navigations
- Fixed sidebar glitch where browser resize could make the sidebar and expand button disappear completely
- Fixed glitch where the open-sidebar hover tooltip appeared above the logo when clicking expand sidebar
- Reduced sidebar width on mobile to around 2/3 of the screen (was too wide)
- Made the close-sidebar hover tooltip consistent with the rest of the design
- Removed sidebar collapse/expand animation
- Small adjustment to chat width

* Fix route scrolling, polling, and theme sync issues

* Fix Studio page scrolling

---------

Co-authored-by: sneakr <hauzin@hotmail.com>
2026-04-16 08:46:16 -07:00
Daniel Han
05ec0f110b
Studio: Ollama support, recommended folders, Custom Folders UX polish (#5050)
* Studio: Ollama support, recommended folders, Custom Folders UX polish

Backend:
- Add _scan_ollama_dir that reads manifests/registry.ollama.ai/library/*
  and creates .gguf symlinks under <ollama_dir>/.studio_links/ pointing
  at the content-addressable blobs, so detect_gguf_model and llama-server
  -m work unchanged for Ollama models
- Filter entries under .studio_links from the generic models/hf/lmstudio
  scanners to avoid duplicate rows and leaked internal paths in the UI
- New GET /api/models/recommended-folders endpoint returning LM Studio
  and Ollama model directories that currently exist on the machine
  (OLLAMA_MODELS env var + standard paths, ~/.lmstudio/models, legacy
  LM Studio cache), used by the Custom Folders quick-add chips
- detect_gguf_model now uses os.path.abspath instead of Path.resolve so
  the readable symlink name is preserved as display_name (e.g.
  qwen2.5-0.5b-Q4_K_M.gguf instead of sha256-abc...)
- llama-server failure with a path under .studio_links or .cache/ollama
  surfaces a friendlier message ("Some Ollama models do not work with
  llama.cpp. Try a different model, or use this model directly through
  Ollama instead.") instead of the generic validation error
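
The manifest-to-symlink step described above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `link_ollama_models` is a hypothetical name, and both the weights-layer media type and the `sha256-<hex>` blob file naming are assumptions about Ollama's on-disk layout. It only walks the `library` namespace, as this first commit did.

```python
import json
from pathlib import Path

# Assumption: media type Ollama uses for the GGUF weights layer.
MODEL_LAYER = "application/vnd.ollama.image.model"


def link_ollama_models(ollama_dir: Path) -> list:
    """Create <ollama_dir>/.studio_links/<model>-<tag>.gguf symlinks to weight blobs."""
    links_dir = ollama_dir / ".studio_links"
    links_dir.mkdir(exist_ok=True)
    library = ollama_dir / "manifests" / "registry.ollama.ai" / "library"
    created = []
    for manifest in sorted(library.glob("*/*")):  # <model>/<tag>
        try:
            layers = json.loads(manifest.read_text()).get("layers", [])
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed manifest: skip it
        for layer in layers:
            if layer.get("mediaType") != MODEL_LAYER:
                continue
            # Assumption: digest "sha256:abc" is stored as blobs/sha256-abc.
            blob = ollama_dir / "blobs" / layer["digest"].replace(":", "-")
            link = links_dir / f"{manifest.parent.name}-{manifest.name}.gguf"
            if not link.is_symlink():
                link.symlink_to(blob)
            created.append(link)
    return created
```

Because the link keeps a `.gguf` suffix, a suffix-based GGUF detector can treat it like any other model file.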

Frontend:
- ListLabel supports an optional leading icon and collapse toggle; used
  for Downloaded (download icon), Custom Folders (folder icon), and
  Recommended (star icon)
- Custom Folders header gets folder icon on the left, and +, search,
  and chevron buttons on the right; chevron uses ml-auto so it aligns
  with the Downloaded and Recommended chevrons
- New recommended folder chips render below the registered scan folders
  when there are unregistered well-known paths; one click adds them as
  a scan folder
- Custom folder rows that are direct .gguf files (Ollama symlinks) load
  immediately via onSelect instead of opening the GGUF variant expander
  (which is for repos containing multiple quants, not single files)
- When loading a direct .gguf file path, send max_seq_length = 0 so the
  backend uses the model's native context instead of the 4096 chat
  default (qwen2.5:0.5b now loads at 32768 instead of 4096)
- New listRecommendedFolders() helper on the chat API

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: log silent exceptions and support read-only Ollama dirs

Replace silent except blocks in _scan_ollama_dir and the
recommended-folders endpoint with narrower exception types plus debug
or warning logs, so failures are diagnosable without hiding signal.

Add _ollama_links_dir helper that falls back to a per-ollama-dir hashed
namespace under Studio's own cache (~/.unsloth/studio/cache/ollama_links)
when the Ollama models directory is read-only. Common for system installs
at /usr/share/ollama/.ollama/models and /var/lib/ollama/.ollama/models
where the Studio process has read but not write access. Previously the
scanner returned an empty list in that case and Ollama models would
silently not appear.

The fallback preserves the .gguf suffix on symlink names so
detect_gguf_model keeps recognising them. The prior "raw sha256 blob
path" fallback would have missed the suffix check and failed to load.

* Address review: detect mmproj next to symlink target for vision GGUFs

Codex P1 on model_config.py:1012: when detect_gguf_model returns the
symlink path (to preserve readable display names), detect_mmproj_file
searched the symlink's parent directory instead of the target's. For
vision GGUFs surfaced via Ollama's .studio_links/ -- where the weight
file is symlinked but any mmproj sidecar lives next to the real blob
-- mmproj was no longer detected, so the model was misclassified as
text-only and llama-server would start without --mmproj.

detect_mmproj_file now adds the resolved target's parent to the scan
order when path is a symlink. Direct (non-symlink) .gguf paths are
unchanged, so LM Studio and HF cache layouts keep working exactly as
before. Verified with a fake layout reproducing the bug plus a
regression check on a non-symlink LM Studio model.
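
A rough sketch of the scan-order change, assuming a hypothetical `find_mmproj` helper and assuming a sidecar is recognised by "mmproj" appearing in its file name (the real matching rule is not stated in this log):

```python
from pathlib import Path
from typing import Optional


def find_mmproj(model_path: Path) -> Optional[Path]:
    """Look for an mmproj sidecar next to the path and, for symlinks, next to the target."""
    search_dirs = [model_path.parent]
    if model_path.is_symlink():
        target_parent = model_path.resolve().parent
        if target_parent not in search_dirs:
            search_dirs.append(target_parent)
    for directory in search_dirs:
        for candidate in sorted(directory.glob("*.gguf")):
            # Assumption: sidecars carry "mmproj" in their file name.
            if "mmproj" in candidate.name.lower() and candidate != model_path:
                return candidate
    return None
```

Non-symlink paths only ever search their own parent directory, so existing layouts behave as before.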

* Address review: support all Ollama namespaces and vision projector layers

- Iterate over all directories under registry.ollama.ai/ instead of
  hardcoding the "library" namespace. Custom namespaces like
  "mradermacher/llama3" now get scanned and include the namespace
  prefix in display names, model IDs, and symlink names to avoid
  collisions.

- Create companion -mmproj.gguf symlinks for Ollama vision models
  that have an "application/vnd.ollama.image.projector" layer, so
  detect_mmproj_file can find the projector alongside the model.

- Extract symlink creation into _make_symlink helper to reduce
  duplication between model and projector paths.

* Address review: move imports to top level and add scan limit

- Move hashlib and json imports to the top of the file (PEP 8).
- Remove inline `import json as _json` and `import hashlib` from
  function bodies, use the top-level imports directly.
- Add `limit` parameter to `_scan_ollama_dir()` with early exit
  when the threshold is reached.
- Pass `_MAX_MODELS_PER_FOLDER` into the scanner so it stops
  traversing once enough models are found.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: Windows fallback, all registry hosts, collision safety

_make_link (formerly _make_symlink):
- Falls back to os.link() hardlink when symlink_to() fails (Windows
  without Developer Mode), then to shutil.copy2 as last resort
- Uses atomic os.replace via tmp file to avoid race window where the
  .gguf path is missing during rescan

Scanner now handles all Ollama registry layouts:
- Uses rglob over manifests/ instead of hardcoding registry.ollama.ai
- Discovers hf.co/org/repo:tag and any other host, not just library/
- Filenames include a stable sha1 hash of the manifest path to prevent
  collisions between models that normalize to the same stem
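
To illustrate why the hash suffix matters, here is a small sketch. The `link_name` helper, the normalisation regex, and the 8-character digest length are all hypothetical; only the use of a sha1 of the manifest path comes from the text above (a later commit in this PR switches to sha256).

```python
import hashlib
import re


def link_name(manifest_rel_path: str) -> str:
    """Build a readable, collision-resistant .gguf link name from a manifest path."""
    # Normalise anything outside a conservative character set to "-".
    stem = re.sub(r"[^A-Za-z0-9._-]+", "-", manifest_rel_path).strip("-")
    # Stable digest of the original path keeps distinct manifests distinct.
    digest = hashlib.sha1(manifest_rel_path.encode("utf-8")).hexdigest()[:8]
    return f"{stem}-{digest}.gguf"
```

Two manifests such as `hf.co/org/repo/tag` and `hf.co/org-repo/tag` normalise to the same stem but still receive different file names.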

Per-model subdirectories under .studio_links/:
- Each model's links live in their own hash-keyed subdirectory
- detect_mmproj_file only sees the projector for that specific model,
  not siblings from other Ollama models

Friendly Ollama error detection:
- Now also matches ollama_links/ (the read-only fallback cache path)
  and model_identifier starting with "ollama/"

Recommended folders:
- Added os.access(R_OK | X_OK) check so unreadable system directories
  like /var/lib/ollama/.ollama/models are not advertised as chips

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: filter ollama_links from generic scanners

The generic scanners (models_dir, hf_cache, lmstudio) already filter
out .studio_links to avoid duplicate Ollama entries, but missed the
ollama_links fallback cache directory used for read-only Ollama
installs. Add it to the filter.

* Address review: idempotent link creation and path-component filter

_make_link:
- Skip recreation when a valid link/copy already exists (samefile or
  matching size check). Prevents blocking the model-list API with
  multi-GB copies on repeated scans.
- Use uuid4 instead of os.getpid() for tmp file names to avoid race
  conditions from concurrent scans.
- Log cleanup errors instead of silently swallowing them.

Path filter:
- Use os.sep-bounded checks instead of bare substring match to avoid
  false positives on paths like "my.studio_links.backup/model.gguf".

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: drop copy fallback, targeted glob, robust path filter

_make_link:
- Drop shutil.copy2 fallback -- copying multi-GB GGUFs inside a sync
  API request would block the backend. Log a warning and skip the
  model when both symlink and hardlink fail.

Scanner:
- Replace rglob("*") with targeted glob patterns (*/*/* and */*/*/*)
  to avoid traversing unrelated subdirectories in large custom folders.

Path filter:
- Use Path.parts membership check instead of os.sep substring matching
  for robustness across platforms.
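
A minimal sketch of the component-based filter, assuming a hypothetical `is_internal_link_path` helper and the two directory names mentioned in this PR:

```python
from pathlib import Path

# Directory names taken from this PR's description.
INTERNAL_DIRS = {".studio_links", "ollama_links"}


def is_internal_link_path(path: str) -> bool:
    """True only when a whole path component is one of the internal link dirs."""
    return any(part in INTERNAL_DIRS for part in Path(path).parts)
```

A directory such as `my.studio_links.backup` contains the substring but is a different component, so it is not filtered.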

Scan limit:
- Skip _scan_ollama_dir when _generic already fills the per-folder cap.

* Address review: sha256, top-level uuid import, Path.absolute()

- Switch hashlib.sha1 to hashlib.sha256 for path hashing consistency.
- Move uuid import to the top of the file instead of inside _make_link.
- Replace os.path.abspath with Path.absolute() in detect_gguf_model
  to match the pathlib style used throughout the codebase.

* Address review: fix stale comments (sha1, rglob, copy fallback)

Update three docstrings/comments that still referenced the old
implementation after recent changes:
- sha1 comment now says "not a security boundary" (no hash name)
- "rglob" -> "targeted glob patterns"
- "file copies as a last resort" -> removed (copy fallback was dropped)

* Address review: fix stale links, support all manifest depths, scope error

_make_link:
- Drop size-based idempotency shortcut that kept stale links after
  ollama pull updates a tag to a same-sized blob. Only samefile()
  is used now -- if the link doesn't point at the exact same inode,
  it gets replaced.
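
Putting the `_make_link` revisions in this PR together, the end state can be sketched as below. This is an illustration under assumptions, not the actual helper: `make_link` and its boolean return value are hypothetical, and logging is omitted.

```python
import os
import uuid
from pathlib import Path


def make_link(target: Path, link: Path) -> bool:
    """Point `link` at `target` via symlink, else hardlink; return True on success."""
    try:
        if link.exists() and os.path.samefile(link, target):
            return True  # already resolves to the same file, nothing to do
    except OSError:
        pass  # broken or unreadable link: fall through and recreate it
    # Unique tmp name so concurrent scans do not collide.
    tmp = link.with_name(f".{link.name}.{uuid.uuid4().hex}.tmp")
    try:
        try:
            tmp.symlink_to(target)
        except OSError:
            os.link(target, tmp)  # hardlink fallback, e.g. symlinks not permitted
        os.replace(tmp, link)  # atomic swap: `link` is never missing mid-update
        return True
    except OSError:
        try:
            tmp.unlink()
        except OSError:
            pass
        return False  # caller is expected to skip this model; no copy fallback
```

Because only `samefile` decides whether the link is current, a tag that moves to a different blob of the same size still gets relinked.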

Scanner:
- Revert targeted glob back to rglob so deeper OCI-style repo names
  (5+ path segments) are not silently skipped.

Ollama error:
- Only show "Some Ollama models do not work with llama.cpp" when the
  server output contains GGUF compatibility hints (key not found,
  unknown architecture, failed to load). Unrelated failures like
  OOM or missing binaries now show the generic error instead of
  being misdiagnosed.

---------

Co-authored-by: Daniel Han <info@unsloth.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: danielhanchen <michaelhan2050@gmail.com>
2026-04-16 08:24:08 -07:00
Daniel Han
ff23ce40b4
Fix review findings for chat-template repair (#5049) (#5056)
* Fix review findings for PR #49

1. Sandbox fallback Jinja env in _VariantTokenizerProxy.apply_chat_template
   (use SandboxedEnvironment, matching _derive_assistant_prefix_by_render)
2. Unwrap benign outer-If guards in _template_ends_with_toplevel_for so
   templates like {% if messages %}{% for ... %}{% endfor %}{% endif %}
   are still repairable (preserves Qwen3-Guard rejection via else-branch
   and add_generation_prompt-name checks)
3. Preserve raw name_or_path in _VariantTokenizerProxy._source_path so
   local-path detection works for dict/list variant tokenizers
4. Context-aware strict-mode messages: omit "will still load" and
   "Set UNSLOTH_STRICT_CHAT_TEMPLATE=1" when already raising

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 08:02:05 -07:00
Daniel Han
b42e3a120d
Remove legacy venv Scripts entry from User PATH on upgrade (#5060)
Older installers persisted the venv Scripts directory directly in the
User PATH registry. The shim approach from #4961 no longer writes that
entry, but on upgrade the old one survived and python.exe / pip.exe
from the unsloth venv continued winning resolution in every new shell.

Before creating the shim, read the current User PATH, filter out any
entry matching $VenvDir\Scripts (using the same symmetric raw+expanded
comparison as Add-ToUserPath), and write back if changed. No-op on
fresh installs where the legacy entry was never written.

Confirmed on a real Windows machine: `where.exe python` was returning
the venv interpreter first even after the shim PR merged.
2026-04-16 07:36:59 -07:00
Daniel Han
5b8643969e
Revert "Remove legacy venv Scripts entry from User PATH on upgrade"
This reverts commit cae4a74297.
2026-04-16 14:20:43 +00:00
Daniel Han
cae4a74297
Remove legacy venv Scripts entry from User PATH on upgrade
Older installers persisted the venv Scripts directory directly in the
User PATH registry. The shim approach (added in this PR) no longer writes
that entry, but it also did not remove the old one. On upgrade, the
legacy entry survived and python.exe / pip.exe from the unsloth venv
continued winning resolution in every new shell, which is exactly the
hijack the shim was designed to prevent.

Before creating the shim, read the current User PATH, filter out any
entry matching $VenvDir\Scripts (using the same symmetric raw+expanded
comparison as Add-ToUserPath), and write back if changed. This runs
once per install and is a no-op on fresh installs where the legacy
entry was never written.
2026-04-16 14:19:04 +00:00
Datta Nimmaturi
6764cb9b90
Restrict flash attn to <=256 head dim. Consolidate attn impl checks (#5051)
* Restrict flash attn to <=256 head dim. Consolidate attn impl checks

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Consolidate the changes into single function

* safeguard for dict instead of object

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 09:00:17 -05:00
Daniel Han
c5be8b1cd2
Chat-template repair: warn-by-default, AST classification, dict support (#5049)
* Chat-template repair: warn-by-default, AST classification, dict support

Follow-up hardening on top of PR #4426 (which fixed the #4150
RuntimeError for ChatML LoRA reloads).

Behavior changes:

- Warn-by-default instead of RuntimeError. When fix_chat_template cannot
  repair a broken template, emit a warning and return the original.
  Set UNSLOTH_STRICT_CHAT_TEMPLATE=1 to restore the pre-warn hard fail.
  Fixes the UX where a missing `{% if add_generation_prompt %}` block on
  a saved LoRA (typical after LlamaFactory / Axolotl re-serialize) would
  block model loading entirely.

- Local path vs HF hub distinguished in the warning message. For local
  paths the message points at the likely downstream tool; for HF IDs it
  points at the upstream model maintainers. Previously both said "file a
  bug report to the maintainers of <path>" even when <path> was the
  user's own saves/ directory.

- Dict / list chat_template now handled. Hermes-3 ships with
  {default, tool_use} and the previous code crashed with
  AttributeError: 'dict' object has no attribute 'find' when entering
  _fix_chat_template with a dict. Each variant is now fixed
  independently; structure is preserved.

Internals:

- _find_end_position now matches all four Jinja whitespace-control
  variants ({% %}, {%- %}, {% -%}, {%- -%}) and returns the rightmost
  endfor/endif so multi-for templates aren't locked onto the first loop.
  Previously {%- endfor -%} (both-side dash, used by Qwen3-Guard) was
  silently bypassed.
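
  The matching rule can be sketched with a single regular expression. The names `END_TAG` and `find_last_end_tag` are hypothetical, and returning a match object is an assumption about the interface rather than what `_find_end_position` actually returns.

```python
import re
from typing import Optional

# Optional "-" after "{%" and before "%}" covers all four whitespace-control forms.
END_TAG = re.compile(r"\{%-?\s*(endfor|endif)\s*-?%\}")


def find_last_end_tag(template: str) -> Optional[re.Match]:
    """Return the rightmost endfor/endif tag in the template, or None."""
    last = None
    for match in END_TAG.finditer(template):
        last = match
    return last
```

  Taking the last match rather than the first is what keeps multi-loop templates from anchoring on an inner loop.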

- _has_add_generation_prompt_block uses Jinja AST via
  jinja2.nodes.If/Name walks instead of substring matching, so
  templates that hide the block behind comments or dash-style variants
  are classified correctly.

- _template_ends_with_toplevel_for gates the GH#4150 ChatML repair on
  the AST: only fires when the last structural top-level node is a For
  (standard ChatML shape), ignoring trailing pure-whitespace output
  nodes. Templates wrapped in an outer If (Qwen3-Guard) are now
  explicitly skipped at the _fix_chat_template level as well, not just
  at load_correct_tokenizer's name-based exemption.

- _validate_patched_template renders the patched template with and
  without add_generation_prompt and confirms the patched output
  responds to the flag by appending (not replacing) content. If
  validation fails, the patch is discarded and we fall through to the
  warn path.
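
  The "appending, not replacing" condition reduces to a prefix check on the two renders. A minimal sketch, with a hypothetical helper name and the two rendered strings supplied by the caller:

```python
def responds_by_appending(rendered_without: str, rendered_with: str) -> bool:
    """True when enabling the flag only adds text after the unflagged render."""
    return (
        rendered_with.startswith(rendered_without)
        and len(rendered_with) > len(rendered_without)
    )
```

  A patched template that ignores the flag (identical renders) or rewrites earlier output fails this check.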

Verified with an expanded regression suite in tests/:
- test_fix_chat_template_pr4426.py: 42/42 template-matrix cells
- test_load_correct_tokenizer_pr4426.py: 5/5 tokenizer loads
- test_chat_template_followups.py: 10/10 new follow-up tests
- test_mistral_pr4426.py: 5 Mistral variants byte-identical
- test_qwen_pr4426.py: 14 Qwen variants byte-identical
  (Qwen1.5, Qwen2, Qwen2.5-Instruct/Coder/Math/VL, Qwen3,
  Qwen3-Coder, QwQ, Qwen3-Guard-Gen)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Guard _validate_patched_template against read-only chat_template

If tokenizer.chat_template is a property or otherwise read-only, the
validation helper would crash with AttributeError when trying to
temporarily set the patched template. Catch the assignment failure and
return False (skip validation), and best-effort restore in the finally
block.

* Replace regex separator inference with render-diff; broaden repair to non-ChatML templates

The previous `_infer_assistant_separator` was a four-tier regex heuristic that
only worked on ChatML-shaped templates and forced a hard `<|im_start|>` /
`<|im_end|>` presence gate on Case 2 repair. This meant a Llama-3, Gemma, or
Phi-3 template stripped of its generation-prompt block by a downstream tool
(LlamaFactory, Axolotl, etc.) would still warn-and-return even though the
structural shape is identical to the ChatML case the PR already handles.

This replaces the regex with `_derive_assistant_prefix_by_render`: render the
template with two dialogs that differ only in assistant content, then
`os.path.commonprefix` on the tails captures the exact assistant-turn prefix
the template emits. The template itself is ground truth, so non-ChatML shapes
work as long as the assistant block is a literal the template emits once per
message.

Three guards keep the derivation safe:
  A. both assistant renders extend the base render (no reordering);
  B. the divergence point is exactly the content-insertion site (sentinel
     follows the common prefix);
  C. a user-role cross-check: if a render with a user sentinel also emits
     the same prefix, role has no effect on output and we reject. A render
     failure on [user, user] (e.g. Gemma's `raise_exception` alternation
     check) is evidence that role matters; we accept.

Sentinels differ at character 0 so `commonprefix` cannot absorb them, and
trailing whitespace/comments after the last `{% endfor %}` are stripped
before probing (they would appear in base but not after the appended
assistant turn and break Guard A).
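
The probe and its three guards can be sketched as follows. This is a simplified illustration: `derive_assistant_prefix` and `chatml_render` are hypothetical names, and a plain Python function stands in for rendering a real Jinja chat template.

```python
import os
from typing import Callable, List, Optional


def derive_assistant_prefix(render: Callable[[List[dict]], str]) -> Optional[str]:
    """Infer the text a template emits right before assistant content."""
    base_dialog = [{"role": "user", "content": "hi"}]
    # Sentinels differ at character 0 so commonprefix cannot absorb them.
    sentinel_a, sentinel_b = "A_SENTINEL", "B_SENTINEL"
    base = render(base_dialog)
    out_a = render(base_dialog + [{"role": "assistant", "content": sentinel_a}])
    out_b = render(base_dialog + [{"role": "assistant", "content": sentinel_b}])
    # Guard A: both assistant renders must extend the base render.
    if not (out_a.startswith(base) and out_b.startswith(base)):
        return None
    tail_a, tail_b = out_a[len(base):], out_b[len(base):]
    prefix = os.path.commonprefix([tail_a, tail_b])
    # Guard B: the sentinel must directly follow the common prefix.
    if not tail_a[len(prefix):].startswith(sentinel_a):
        return None
    # Guard C: if a user turn produces the same prefix, role has no effect.
    try:
        out_user = render(base_dialog + [{"role": "user", "content": sentinel_a}])
    except Exception:
        return prefix  # a render failure suggests the template cares about roles
    if out_user.startswith(base + prefix + sentinel_a):
        return None
    return prefix


def chatml_render(messages: List[dict]) -> str:
    """Toy stand-in for a ChatML-style template."""
    return "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
```

With the toy renderer, `derive_assistant_prefix(chatml_render)` yields the ChatML assistant header, while a renderer that ignores roles is rejected by Guard C.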

`_fix_chat_template` and `_repair_string_template` now thread an
`is_sharegpt` kwarg; `_fix_chat_template` retries once with
`is_sharegpt=True` if the first probe returns None (dual-probe fallback
for dict/list callers).

The ChatML `<|im_start|>` / `<|im_end|>` hard gate in Case 2 is dropped.
`_infer_assistant_separator` is deleted.

Verified via:
  - tests/test_fix_chat_template_pr4426.py: 51/51 cells (new Llama-3,
    Gemma, Phi-3 broken-template rows all repair FIX-OK)
  - tests/test_load_correct_tokenizer_pr4426.py: 5/5
  - tests/test_chat_template_followups.py: 18/18 (T11-T18 cover
    non-ChatML repair + probe failure modes)
  - tests/test_mistral_pr4426.py: 5/5 byte-identical
  - tests/test_qwen_pr4426.py: 14/14 byte-identical (Qwen3-Guard AST
    gate still rejects)
  - tests/hermes3_lora_pr4426.py reload: patched template ends with
    `<|im_start|>assistant\n`, inference returns sensible output.
  - temp/sim/battery.py: 79/79 followup; vs baseline: 0 regressions,
    9 improvements.
  - Spot-check probe on real stripped tokenizers (Hermes-3, Phi-4,
    Llama-3.2-1B, Gemma-3-1B): all derive the expected prefix.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address reviewer findings: variant routing, positive-gate detection, comment-safe end scan

Resolves three reviewer findings on PR #5049 (`fix/chat-template-followups`):

Finding #1 [10/10]: dict/list variants now route through
`_fix_chat_template_for_tokenizer` via a new `_VariantTokenizerProxy`
adapter. Previously the dict/list branches called `_fix_chat_template`
directly, silently bypassing the warn/strict (`UNSLOTH_STRICT_CHAT_TEMPLATE`)
contract, the `no == yes` diagnostic, broken-existing-block detection,
and `_validate_patched_template` guard. The proxy swaps
`base.chat_template` to the variant string before each
`apply_chat_template` call so tokenizer globals (`bos_token`, custom
filters, `raise_exception`) remain available; if the base is read-only
it falls back to isolated Jinja rendering.

Finding #2 [1/10]: `_has_add_generation_prompt_block` now requires the
`If` body to contain at least one `Output` node (a new
`_if_body_emits_content` helper walks descendants). This distinguishes a
real generation-prompt block from a header guard like
`{% if not add_generation_prompt is defined %}{% set ... %}{% endif %}`
(body contains only `Assign`) which references the name but emits
nothing. Also dropped a now-redundant `"add_generation_prompt" not in
scrubbed` guard in `_fix_chat_template` Case 2 so header-guarded
templates still get repaired.

Finding #4 [1/10]: `_find_end_position` now replaces Jinja comments with
equal-length whitespace before scanning for `{% endfor %}` / `{% endif %}`
tokens. This prevents a trailing comment containing those tokens from
being picked as the real end tag. Positions in the padded string map 1:1
to positions in the original template.
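
The comment-masking step can be sketched in a few lines. `blank_comments` is a hypothetical name, and the non-greedy `{# ... #}` pattern is an assumption about how comments are delimited for this purpose.

```python
import re

# Jinja comments may span lines, hence DOTALL.
JINJA_COMMENT = re.compile(r"\{#.*?#\}", re.DOTALL)


def blank_comments(template: str) -> str:
    """Replace each Jinja comment with spaces of the same length."""
    return JINJA_COMMENT.sub(lambda m: " " * len(m.group(0)), template)
```

Since the masked string has the same length as the original, an index found by scanning it can be used directly on the unmodified template.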

Tests:
  - tests/test_chat_template_followups.py: 21/21 (T19 strict-mode
    dict variant, T20 header-guard repair, T21 comment-endfor trap
    added; T4/T5 stubs updated with a working apply_chat_template
    that routes through Jinja).
  - tests/test_fix_chat_template_pr4426.py: 51/51 cells unchanged.
  - tests/test_load_correct_tokenizer_pr4426.py: 5/5.
  - tests/test_mistral_pr4426.py: 5/5 byte-identical.
  - tests/test_qwen_pr4426.py: 14/14 byte-identical.
  - temp/sim/battery.py: 79/79 followup; 0 regressions vs baseline.
  - Phase 3 Hermes-3 broken-LoRA reload: inference still returns
    `'The answer to the equation 2+2 is 4.'`.
  - Spot-checks on Hermes-3 / Phi-4 / Llama-3.2-1B / Gemma-3-1B real
    stripped templates: probe still derives the expected prefix.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Tighten comments in chat-template helpers

Pure comment minimization across `_find_end_position`,
`_has_add_generation_prompt_block`, `_if_body_emits_content`,
`_derive_assistant_prefix_by_render`, `_fix_chat_template` Case 2,
and `_VariantTokenizerProxy`. No behavior change; same intent,
fewer lines. All 21 follow-up tests and the 51-cell Phase 1 matrix
still pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sandbox probe, fix is_sharegpt validator mismatch, reject negated gates

Three real bugs from the 10-agent Opus review:

1. Probe now uses `jinja2.sandbox.SandboxedEnvironment` instead of bare
   `jinja2.Environment`. The probe renders at model-load time (before
   the user calls `apply_chat_template`), so it was a new eager
   code-execution surface that the base HF tokenizer loading does not
   have. SandboxedEnvironment blocks attribute-chain exploits at
   negligible cost.

2. `_repair_string_template` now tries validation with both
   `is_sharegpt=False` and `is_sharegpt=True`. Previously, when
   `_fix_chat_template` internally fell back to the other schema via
   its dual-probe, the outer validation still used the caller's
   original `is_sharegpt` -- rendering with the wrong message keys and
   spuriously dropping a valid repair.

3. `_has_add_generation_prompt_block` now skips `If` nodes whose test
   is a `Not` expression. A negated gate like
   `{% if not add_generation_prompt %}{{ x }}{% endif %}` fires when
   agp=False, so its emitting body is not a generation block -- but the
   old code counted any Name reference regardless of polarity.

Cleanup: removed unused `self._label`, added `\r` escape in
generation-block literal, switched variant labels to `!r` formatting,
removed redundant `import os as _os`.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jinja2.sandbox import and sandbox proxy fallback

Two critical findings from the 20-reviewer pass:

1. [20/20] The proxy read-only fallback used bare `jinja2.Environment`,
   not sandboxed. All 20 reviewers independently reproduced marker-file
   creation via `cycler.__init__.__globals__['os'].system(...)` during
   `fix_chat_template()`. Fixed: fallback now uses
   `from jinja2.sandbox import SandboxedEnvironment`.

2. [14/20] The render-diff probe did `import jinja2` then referenced
   `jinja2.sandbox.SandboxedEnvironment`. `jinja2.sandbox` is a
   submodule that is NOT auto-imported by `import jinja2` on Jinja 3.1.6.
   This caused `AttributeError` (swallowed by `except Exception`),
   making the entire Case 2 repair path silently return None in a clean
   process. The 6 reviewers who saw it work had `jinja2.sandbox`
   pre-imported by an earlier module in their process. Fixed: both the
   probe and the proxy fallback now use
   `from jinja2.sandbox import SandboxedEnvironment`.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 05:52:33 -07:00
Daniel Han
6e87bade25
Trim verbose comments in PATH helpers
Reduce inline comments from ~160 lines to ~25 across both files.
Keep one-line summaries of the "why"; drop multi-paragraph rationale
blocks that repeated information already captured in commit messages
and PR discussion.
2026-04-16 12:01:01 +00:00
Etherll
ec32ce2e82
fix: use direct registry API for PATH writes instead of SetEnvironmentVariable (#4961)
* fix: replacing SetEnvironmentVariable with direct registry API

* apply reviews

* Use CreateSubKey for HKCU\Environment

* Store PATH backup under HKCU\Software\Unsloth

* Fix $backupKey registry handle leak in PATH backup block

Wrap $backupKey operations in try/finally so the handle is closed even
if GetValue or SetValue throws. The Add-ToUserPath helper already uses
this pattern for its registry key -- the backup block was the only
place missing it.

* Isolate WM_SETTINGCHANGE broadcast from PATH write error handling

Wrap the broadcast dummy-variable calls in their own try/catch so a
broadcast failure does not mask a successful registry PATH write.
Previously, if SetEnvironmentVariable threw after SetValue already
committed the new PATH, Add-ToUserPath would return $false and the
caller would skip Refresh-SessionPath.

* PATH helper polish: venv precedence, quoted entries, raw/expanded dedup

Three small follow-ups surfaced by a 10-reviewer pass against the rebased
PR head. None fix a regression vs main; each strictly improves the new
helpers.

Refresh-SessionPath / Refresh-Environment:
- Move $env:Path to the front of the merge so an activated venv keeps
  precedence over machine/user PATH after a refresh. Pre-PR dropped
  process-only entries entirely; post-PR kept them but at the back.
- Dedup on both raw and expanded forms so %USERPROFILE%\foo and the
  already-expanded C:\Users\me\foo do not both survive.

Add-ToUserPath:
- Trim whitespace and surrounding double-quotes from each compared entry
  so quoted PATH entries like "C:\Program Files\CMake\bin" deduplicate
  against an unquoted directory of the same path.

* Back up User PATH inside Add-ToUserPath, before first mutation

Previously only studio/setup.ps1 took a one-time PATH backup, at script
top (line ~547). install.ps1 (the irm | iex entry point) had no backup,
so users who installed via that path had no recovery surface if anything
clobbered their PATH. The PR description's "one-time backup before any
modifications" promise only held for the studio installer flow.

Move the backup into Add-ToUserPath itself: just before the first actual
SetValue mutation, write the pristine raw PATH to
HKCU\Software\Unsloth\PathBackup if no backup already exists. This:

- Covers both entry points (install.ps1 and studio/setup.ps1).
- Captures the TRUE pristine PATH even when install.ps1 runs first and
  studio/setup.ps1 runs afterwards (the script-top backup in setup.ps1
  would otherwise see an already-modified PATH).
- Is idempotent: once a backup exists, subsequent calls preserve it.
- Skips when nothing would mutate (dedup match) or PATH is empty.

The script-top backup in studio/setup.ps1 is kept for defense in depth.

* Refresh PATH: venv-aware merge order

Reconcile two competing concerns about Refresh-SessionPath /
Refresh-Environment surfaced by separate review rounds:

  - venv at the back -> activated venv loses precedence to system Python
  - process at the front -> stale shims (old node, old python, etc.)
    still on $env:Path can beat a freshly installed tool

New merge order:
  1. Activated venv Scripts dir, only if $env:VIRTUAL_ENV is set
  2. Machine PATH freshly read from registry
  3. User PATH freshly read from registry
  4. Current $env:Path as fallback

This way an explicitly-activated venv keeps priority while a tool the
script just installed wins over any stale entry that was already on
the inherited shell PATH. When no venv is active, fresh registry
entries take precedence as expected.
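
The merge order above can be sketched in Python for illustration. The real helpers are PowerShell (`Refresh-SessionPath` / `Refresh-Environment`); the function name, the case-insensitive comparison, and the trailing-separator handling below are assumptions of this sketch.

```python
def merge_path(venv_scripts, machine_path, user_path, process_path, sep=";"):
    """Merge PATH sources in priority order and drop later duplicates.

    Order: activated venv Scripts dir (if any), machine PATH, user PATH,
    then the inherited process PATH as a fallback.
    """
    ordered = []
    if venv_scripts:  # only present when VIRTUAL_ENV is set
        ordered.append(venv_scripts)
    for source in (machine_path, user_path, process_path):
        ordered.extend(source.split(sep))

    seen = set()
    merged = []
    for entry in ordered:
        cleaned = entry.strip().strip('"')
        if not cleaned:
            continue
        # Assumption: compare case-insensitively, ignoring a trailing slash.
        key = cleaned.rstrip("\\/").lower()
        if key in seen:
            continue
        seen.add(key)
        merged.append(cleaned)
    return sep.join(merged)
```

Because the first occurrence wins, a stale copy of a directory on the inherited process PATH cannot outrank the same directory read fresh from the registry.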

* Append to User PATH by default, close $envKey in finally

Add-ToUserPath gains a -Position Append|Prepend parameter defaulting to
Append so installing unsloth no longer prepends the bundled venv Scripts
directory ahead of the user's existing python / pip on new shells. The
four current call sites (install.ps1 launcher, studio/setup.ps1 CMake,
nvcc, Python user Scripts) all take the Append default because each one
that needs in-session precedence already does an inline $env:Path prepend
independently. This matches rustup / cargo / nvm / pyenv / uv behavior.

Also wrap the script-top $envKey.GetValue in a try/finally so the
registry handle is released even if the read throws. Matches the pattern
already used for $backupKey five lines below.

* Prepend cmake, nvcc, Python Scripts; keep venv Scripts appended

The previous commit switched Add-ToUserPath to append by default so that
installing unsloth would not silently hijack the user's system python /
pip. That was correct for the venv Scripts dir (which contains python.exe
and pip.exe alongside unsloth.exe), but wrong for the three studio/setup
call sites. Those persist cmake, the driver-compatible nvcc, and the
Python user Scripts dir for future shells, and in all three cases an
older tool already earlier in the user PATH would keep winning after the
install finished. The nvcc case is especially load-bearing: setup selects
a driver-compatible CUDA toolkit, then llama.cpp builds against whatever
wins PATH resolution, so a stale older nvcc produces broken builds.

Pass -Position 'Prepend' explicitly at the three setup.ps1 call sites
(cmake at line 754, nvcc bin at line 1025, Python user Scripts at line
1191). None of those directories holds python.exe, so prepending them
does not re-introduce the original hijack problem. Leave the install.ps1
venv Scripts call on the default Append with a comment explaining why.

* Symmetric dedup, Prepend reorders duplicates, unsloth shim dir

Address three separate findings surfaced by review:

1. Dedup asymmetry (Gemini high-priority): the existing dedup expanded
   registry entries via ExpandEnvironmentVariables but did NOT expand the
   new directory. Passing "%USERPROFILE%\foo" when "C:\Users\me\foo" was
   already in PATH produced a duplicate. Expand both sides so the check
   is symmetric.

2. -Position Prepend no-op on existing duplicates: the dedup loop
   returned $false as soon as it saw a match, regardless of position.
   That left a late-position duplicate in place instead of moving it to
   the front, so "prepend the newly selected cmake/nvcc" did not always
   beat an older copy earlier in PATH. Partition entries into kept and
   dropped lists, then reinsert a single copy at the requested position.
   Append still returns $false on any match so user-curated orderings
   are not reshuffled. Prepend also returns $false when the only copy
   is already at position 0 so we preserve the user's casing.

3. Stop adding the venv Scripts dir to User PATH entirely. That dir
   holds python.exe and pip.exe alongside unsloth.exe, so neither
   Prepend nor Append worked: prepend hijacked the user's system python
   and pip, append made the freshly-installed unsloth.exe lose to any
   older unsloth.exe earlier on PATH. Replace the Scripts-dir PATH add
   with a dedicated shim directory that contains only unsloth.cmd, and
   prepend that dir. The shim calls the venv's unsloth.exe by absolute
   path so future pip upgrades inside the venv propagate automatically.
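
Findings 1 and 2 can be illustrated with a small Python sketch. `Add-ToUserPath` itself is PowerShell; the names `expand`, `norm`, and `add_to_path`, the `env` dict, and the return shape are all hypothetical and exist only for this example.

```python
import re


def expand(entry, env):
    """Expand %VAR% references using `env` (keys upper-cased)."""
    return re.sub(
        r"%([^%]+)%",
        lambda m: env.get(m.group(1).upper(), m.group(0)),
        entry,
    )


def norm(entry, env):
    """Comparison key: trimmed, unquoted, expanded, case-folded."""
    return expand(entry.strip().strip('"'), env).rstrip("\\").lower()


def add_to_path(entries, new_dir, env, position="append"):
    """Return (entries, changed) after adding new_dir at `position`."""
    cleaned = new_dir.strip().strip('"')
    target = norm(cleaned, env)  # the new directory is expanded too
    kept = [e for e in entries if norm(e, env) != target]
    dropped = len(entries) - len(kept)

    if position == "append":
        if dropped:
            return entries, False  # never reshuffle a user-curated order
        return entries + [cleaned], True

    # Prepend: a single copy already at position 0 is left untouched.
    if dropped == 1 and norm(entries[0], env) == target:
        return entries, False
    return [cleaned] + kept, True
```

Expanding both sides means `%USERPROFILE%\foo` and `C:\Users\me\foo` compare equal, and partitioning into kept and dropped entries lets Prepend move a late duplicate to the front instead of returning early.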

* Shim via hardlink, Append user Scripts, drop venv sysconfig fallback

Three follow-ups to the c0ab1ab shim commit, targeting concerns raised in
the second 20-reviewer pass:

1. Shim uses unsloth.exe (hardlink, copy fallback) instead of unsloth.cmd.
   The batch-file approach had three distinct regressions:
   - cmd.exe expanded %...% sequences inside user arguments, so prompts
     like "What does 50% mean?" got mangled before reaching the CLI
   - Git Bash / MSYS2 / POSIX-style shells on Windows do not resolve
     bare-name lookups to .cmd files, so `unsloth` stopped working there
   - Set-Content -Encoding ASCII replaced non-ASCII profile characters
     with '?', so installs under C:\Users\Jörg\... wrote a broken shim
   A hardlink (fallback: copy) of unsloth.exe is a native Windows
   executable with no shell indirection. PATHEXT picks .exe before .cmd
   in cmd.exe and PowerShell, Git Bash honors .exe natively, subprocess
   callers hit it directly, and a hardlink stays in sync with the venv
   on pip upgrades because both names point at the same inode.

2. studio/setup.ps1 Python user Scripts dir is added with default Append
   instead of -Position Prepend. That directory holds every pip-installed
   user console script (pip, pytest, huggingface-cli, and so on), not
   just unsloth, so reordering it silently changed resolution order for
   unrelated tools. The new install.ps1 shim at PATH position 0 already
   guarantees `unsloth` resolves to the freshly installed copy, so the
   Python user Scripts entry only needs to be present, not at the front.

3. The sysconfig lookup in studio/setup.ps1 no longer falls back to
   sysconfig.get_path('scripts') when the nt_user scheme dir does not
   exist. When setup.ps1 is invoked from an activated venv (a flow the
   linked issue actually hits) that fallback returns the venv's Scripts
   directory, which would then be added to the persisted User PATH and
   re-introduce the python / pip hijack the shim dir is meant to avoid.
   Stick strictly to the nt_user scheme; skip the block if it does not
   exist on disk.
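
The hardlink-with-copy-fallback step from item 1 looks roughly like this in Python. The installer does this in PowerShell; `install_shim` and its return values are hypothetical names for this sketch.

```python
import os
import shutil


def install_shim(source_exe, shim_exe):
    """Create shim_exe as a hardlink to source_exe, copying if that fails.

    Returns "hardlink" or "copy" so the caller can log which path was taken.
    """
    os.makedirs(os.path.dirname(os.path.abspath(shim_exe)), exist_ok=True)
    if os.path.exists(shim_exe):
        os.remove(shim_exe)  # raises if the old shim is locked
    try:
        # Both names then refer to the same file on disk.
        os.link(source_exe, shim_exe)
        return "hardlink"
    except OSError:
        # For example a cross-volume link or a filesystem without hardlinks.
        shutil.copy2(source_exe, shim_exe)
        return "copy"
```

A copy does not follow later upgrades of the source file, which is why the hardlink is tried first and the copy is only a fallback.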

* Do not crash installer when unsloth.exe shim is locked

The shim update sequence at install.ps1:1095 did a bare Remove-Item /
New-Item HardLink / Copy-Item. Under the script's $ErrorActionPreference
a locked target (most commonly 'unsloth studio' still running while the
user re-invokes the installer) turns the Remove-Item failure into a
terminating error that aborts the install with no actionable message.

The existing shim is perfectly usable in that state, so there is no
reason to abort. Wrap the whole remove/link/copy sequence in a try/catch
that logs the probable cause (Studio still running), points at the fix
(close Studio and re-run), and lets the installer finish with the old
launcher still serving the command.

Also only emit the "added unsloth launcher to PATH" step line when the
launcher was actually (re)created AND the PATH entry was newly added --
previously the message fired even when the shim refresh silently failed,
which was confusing.

* Guard shim PATH entry on existence, use NullString for broadcast delete

Two follow-ups surfaced by the latest review pass:

1. Do not add the shim directory to User PATH when the launcher was not
   actually created. Antivirus blocking unsloth.exe, a disk-full volume,
   or restrictive filesystem permissions can make both the hardlink and
   the copy fallback fail on a fresh install. In that case the existing
   sequence would report "added unsloth launcher to PATH" warnings but
   still prepend the empty $ShimDir to User PATH -- the user sees an
   install that claims success but then cannot resolve `unsloth` in a
   new shell. Gate Add-ToUserPath on Test-Path $ShimExe so the PATH
   entry is only persisted when the launcher is really there.

2. Pass [NullString]::Value instead of $null to the broadcast-delete
   call in Add-ToUserPath. On PowerShell 7.5 and later (running on .NET
   9), a bare $null going into [Environment]::SetEnvironmentVariable
   can be coerced to an empty string rather than a true .NET null,
   which sets the dummy UnslothPathRefresh_XXXXXXXX variable to "" in
   HKCU\Environment instead of deleting it. The leaked variable is
   visible in System Properties and accumulates one entry per install
   run. [NullString]::Value is a PowerShell-specific sentinel that
   crosses the interop boundary as a real null and works on both PS 5.1
   and PS 7.x. See PowerShell/PowerShell#24637 for the underlying issue.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
2026-04-16 04:49:51 -07:00
Imgyu Kim
14ab6fbfae
BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4426)
Fixes #4150.

Pre-PR, `_fix_chat_template` only patched templates where a trailing `{{ ... }}` expression followed the last `{% endfor %}`. ChatML templates (Hermes, Magnum, Phi-4, etc.) that end cleanly at `{% endfor %}` with no generation-prompt block were left unchanged, so the outer `fix_chat_template` raised:

```
RuntimeError: Unsloth: The tokenizer `...` does not have a
{% if add_generation_prompt %} for generation purposes.
```

This commonly shows up when a downstream tool (LlamaFactory, Axolotl) re-serializes the tokenizer during LoRA save and strips the generation-prompt block.

This PR adds a second branch to `_fix_chat_template` that fires when:

- the content after the last `{% endfor %}` is empty modulo Jinja `{# ... #}` comments,
- the scrubbed template contains `<|im_start|>` and `<|im_end|>`,
- and the scrubbed template does not already mention `add_generation_prompt`.

The assistant-turn separator is inferred from the template itself (preferring an explicit `'<|im_start|>assistant<sep>'` literal, then the unique `message['role'] + '<sep>'` from role concatenations, then `<|im_sep|>` for Phi-4-mini mixed-separator templates, then `\n`), so Phi-4-style templates are not silently corrupted with the wrong separator.
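
The three trigger conditions for the new branch can be sketched as below. This is an illustration under assumptions, not the code in `_fix_chat_template`: the function name and regexes are invented here, and treating trailing whitespace as empty is inferred from the "ChatML with trailing whitespace" case in the verification list.

```python
import re

_JINJA_COMMENT = re.compile(r"\{#.*?#\}", re.DOTALL)
_ENDFOR = re.compile(r"\{%-?\s*endfor\s*-?%\}")


def needs_chatml_generation_prompt(template: str) -> bool:
    """True when a ChatML template ends at its last endfor with no agp block."""
    # Scrub comments first so tokens inside them cannot satisfy any check.
    scrubbed = _JINJA_COMMENT.sub("", template)
    matches = list(_ENDFOR.finditer(scrubbed))
    if not matches:
        return False
    tail = scrubbed[matches[-1].end():]
    return (
        tail.strip() == ""
        and "<|im_start|>" in scrubbed
        and "<|im_end|>" in scrubbed
        and "add_generation_prompt" not in scrubbed
    )
```

Checking the scrubbed text is what keeps a template with `<|im_start|>` only inside a Jinja comment from being rewritten.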

Verified against the existing chat-template corpus:

- Hermes-3, Magnum-v2, Phi-4-mini, Phi-4 multi-sep, ChatML with trailing whitespace, ChatML with trailing Jinja comment, dot-access `message.role`, split-literal `'<|im_start|>assistant'`: all repaired with the correct assistant prefix.
- Already-fixed ChatML templates: idempotent NOP.
- Trap templates with `<|im_start|>` only inside a Jinja comment: correctly not rewritten.
- Llama-3, Gemma-3, Qwen2.5 (non-ChatML): byte-identical.
- Mistral family (5 models including Mistral-Nemo, Mistral-Small-24B, Mixtral): byte-identical, protected both by the structural guard (no ChatML tokens) and the existing name-based exemption in `load_correct_tokenizer`.
- Qwen family (14 models including Qwen2.5, Qwen3, Qwen3-Coder, QwQ, VL, Math, Qwen3-Guard): byte-identical.

End-to-end reproduction: Hermes-3 LoRA SFT, save with stripped chat_template, reload. Pre-PR code path raises the RuntimeError above. Post-PR reload loads cleanly, patches the template at load time, and `apply_chat_template(add_generation_prompt=True)` produces the correct `<|im_start|>assistant\n` prefix.
2026-04-16 00:21:29 -07:00
DoubleMathew
a4d4dfe4ac
fix Gemma4 flash attn disable (#5045)
* fix pass attn implementation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 17:50:48 -05:00
Daniel Han
3869fbe1cc
Bump installer minimum to 2026.4.5 (#5041)
2026-04-15 08:23:41 -07:00
Daniel Han
cdb3e752ec
Update _utils.py
2026-04-15 08:06:43 -07:00
Daniel Han
ba387e2c8f
Update pyproject.toml
2026-04-15 08:06:30 -07:00
Daniel Han
f0d03655e8
Studio: add folder browser modal for Custom Folders (#5035)
* Studio: add folder browser modal for Custom Folders

The Custom Folders row in the model picker currently only accepts a
typed path. On a remote-served Studio (Colab, shared workstation) that
means the user has to guess or paste the exact server-side absolute
path. A native browser folder picker can't solve this: HTML
`<input type="file" webkitdirectory>` hides the absolute path for
security, and the File System Access API (Chrome/Edge only) returns
handles rather than strings, neither of which the server can act on.

This PR adds a small in-app directory browser that lists paths on the
server and hands the chosen string back to the existing
`POST /api/models/scan-folders` flow.

## Backend

* New endpoint `GET /api/models/browse-folders`:
  * `path` query param (expands `~`, accepts relative or absolute; empty
    defaults to the user's home directory).
  * `show_hidden` boolean to include dotfiles/dotdirs.
  * Returns `{current, parent, entries[], suggestions[]}`. `parent` is
    null at the filesystem root.
  * Immediate subdirectories only (no recursion); files are never
    returned.
  * `entries[].has_models` is a cheap hint: the directory looks like it
    holds models if it is named `models--*` (HF hub cache layout) or
    one of the first 64 children is a .gguf/.safetensors/config.json/
    adapter_config.json or another `models--*` subfolder.
  * Sort order: model-bearing dirs, then plain, then hidden; case-
    insensitive alphabetical within each bucket.
  * Suggestions auto-populate from HOME, the HF cache root, and any
    already-registered scan folders, deduplicated.
  * Error surface: 404 for missing path, 400 for non-directory, 403 on
    permission errors. Auth-required like the other models routes.

* New Pydantic schemas `BrowseEntry` and `BrowseFoldersResponse` in
  `studio/backend/models/models.py`.
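
The sort order described above (model-bearing, then plain, then hidden, alphabetical within each bucket) can be expressed as a single sort key. The `Entry` dataclass is a stand-in, not the real `BrowseEntry` schema, and placing a hidden directory last even when it holds models is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    name: str
    has_models: bool = False


def sort_entries(entries):
    """Order: model-bearing dirs, plain dirs, hidden dirs; A-Z inside each."""

    def bucket(entry):
        if entry.name.startswith("."):
            return 2  # hidden dirs always sort last (assumption)
        return 0 if entry.has_models else 1

    return sorted(entries, key=lambda e: (bucket(e), e.name.lower()))
```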

## Frontend

* New `FolderBrowser` component
  (`studio/frontend/src/components/assistant-ui/model-selector/folder-browser.tsx`)
  using the existing `Dialog` primitive. Features:
  * Clickable breadcrumb with a `..` row for parent navigation.
  * Quick-pick chips for the server-provided suggestions.
  * `Show hidden` checkbox.
  * In-flight fetch cancellation via AbortController so rapid
    navigation doesn't flash stale results.
  * Badges model-bearing directories inline.

* `chat-api.ts` gains `browseFolders(path?, showHidden?)` and matching
  types.

* `pickers.tsx` adds a folder-magnifier icon next to the existing `Add`
  button. Opening the browser seeds it with whatever the user has
  already typed; confirming fills the text input, leaving the existing
  validation and save flow unchanged.

## What it does NOT change

* The existing text-input flow still works; the browser is additive.
* No new permissions or escalation; the endpoint reads only directories
  the server process is already allowed to read.
* No model scanning or filesystem mutation happens from the browser
  itself -- it just returns basenames for render.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: cap folder-browser entries and expose truncated flag

Pointing the folder browser at a huge directory (``/usr/lib``,
``/proc``, or a synthetic tree with thousands of subfolders) previously
walked the whole listing and stat-probed every child via
``_looks_like_model_dir``. That is both a DoS shape for the server
process and a large-payload surprise for the client.

Introduce a hard cap of 2000 subdirectory entries and a
``truncated: bool`` field on the response. The frontend renders a small
hint below the list when it fires, prompting the user to narrow the
path. Below-cap directories are unchanged.

Verified end-to-end against the live backend with a synthetic tree of
2050 directories: response lands at 2000 entries, ``truncated=true``,
listing finishes in sub-second time (versus tens of seconds if we were
stat-storming).
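
A capped listing with a truncation flag can be sketched as follows. The function name and signature are hypothetical; only the cap of 2000 and the `truncated` flag come from the description above.

```python
import os


def list_subdirs(path, cap=2000, show_hidden=False):
    """Return (sorted subdirectory names, truncated) for `path`.

    Stops as soon as one more qualifying directory than `cap` is seen,
    so a huge directory is never walked to the end.
    """
    names = []
    truncated = False
    with os.scandir(path) as it:
        for entry in it:
            if not entry.is_dir():
                continue  # files are never returned
            if not show_hidden and entry.name.startswith("."):
                continue
            if len(names) >= cap:
                truncated = True
                break
            names.append(entry.name)
    return sorted(names, key=str.lower), truncated
```

A directory holding exactly `cap` subdirectories is reported as not truncated, because the flag is only set when an extra qualifying entry is actually seen.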

* Studio: suggest LM Studio / Ollama dirs + 2-level model probe

Three improvements to the folder-browser, driven by actually dropping
an LM Studio-style install (publisher/model/weights.gguf) into the
sandbox and walking the UX:

## 1. Quick-pick chips for other local-LLM tools

`well_known_model_dirs()` (new) returns paths commonly used by
adjacent tools. Only paths that exist are returned so the UI never
shows dead chips.

* LM Studio current + legacy roots + user-configured
  `downloadsFolder` from its `settings.json` (reuses the existing
  `lmstudio_model_dirs()` helper).
* Ollama: `$OLLAMA_MODELS` env override, then `~/.ollama/models`,
  `/usr/share/ollama/.ollama/models`, and `/var/lib/ollama/.ollama/models`
  (the systemd-service install path surfaced in the upstream "where is
  everything?" issue).
* Generic user-choice locations: `~/models`, `~/Models`.

Dedup is stable across all sources.

## 2. Two-level model-bearing probe

LM Studio and Ollama both use `root/publisher/model/weights.gguf`.
The previous `has_models` heuristic only probed one level, so the
publisher dir (whose immediate children are model dirs, not weight
files) was always marked as non-model-bearing. Pulled the direct-
signal logic into `_has_direct_model_signal` and added a grandchild
probe so the classic layout is now recognised.

Still O(PROBE^2) worst-case, still returns immediately for
`models--*` names (HF cache layout) and for any direct weight file.
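
A two-level probe along these lines might look like the sketch below. The file patterns and the 64-child limit come from the earlier `has_models` description; the function names and structure are illustrative and are not the code in `studio/backend/routes/models.py`.

```python
import os

MODEL_FILENAMES = {"config.json", "adapter_config.json"}
MODEL_SUFFIXES = (".gguf", ".safetensors")
PROBE_LIMIT = 64  # children inspected per directory


def _children(path, limit=PROBE_LIMIT):
    """Return at most `limit` directory entries, or [] if unreadable."""
    try:
        with os.scandir(path) as it:
            return [entry for _, entry in zip(range(limit), it)]
    except OSError:
        return []


def has_direct_model_signal(path):
    """True for an HF-cache style name or a weight/config file inside."""
    if os.path.basename(os.path.normpath(path)).startswith("models--"):
        return True
    for entry in _children(path):
        if entry.is_dir():
            if entry.name.startswith("models--"):
                return True
        elif entry.name in MODEL_FILENAMES or entry.name.endswith(MODEL_SUFFIXES):
            return True
    return False


def looks_like_model_dir(path):
    """Direct signal, or any child directory that has a direct signal."""
    if has_direct_model_signal(path):
        return True
    return any(
        entry.is_dir() and has_direct_model_signal(entry.path)
        for entry in _children(path)
    )
```

For a `root/publisher/model/weights.gguf` layout, the model directory matches directly and the publisher matches through the child probe, while the root stays unflagged because the probe stops at two levels.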

## 3. model_files_here hint on response body

A leaf model dir (just GGUFs, no subdirs) previously rendered as
`(empty directory)` in the modal, confusing users into thinking the
folder wasn't scannable. Added a `model_files_here` count on the
response (capped at 200) and a small hint row in the modal: `N model
files in this folder. Click "Use this folder" to scan it.`

## Verification

Simulated an LM Studio install by downloading the real 84 MB
`unsloth/SmolLM2-135M-Instruct-Q2_K.gguf` into
`~/.lmstudio/models/unsloth/SmolLM2-135M-Instruct-GGUF/`. Confirmed
end-to-end:

* Home listing suggests `~/.lmstudio/models` as a chip.
* Browsing `~/.lmstudio/models` flags `unsloth` (publisher) as
  `has_models=true` via the 2-level probe.
* Browsing the publisher flags `SmolLM2-135M-Instruct-GGUF` (model
  dir) as `has_models=true`.
* Browsing the model dir returns empty entries but
  `model_files_here=1`, and the frontend renders a hint telling the
  user it is a valid target.

* Studio: one-click scan-folder add + prominent remove + plain search icon

Three small Custom Folders UX fixes after real-use walkthrough:

* **One-click add from the folder browser**. Confirming `Use this
  folder` now submits the path directly to
  `POST /api/models/scan-folders` instead of just populating the text
  input. `handleAddFolder` takes an optional explicit path so the
  submit lands in the same tick as `setFolderInput`, avoiding a
  state-flush race. The typed-path + `Add` button flow is unchanged.

* **Prominent remove X on scan folders**. The per-folder delete
  button was `text-muted-foreground/40` and hidden entirely on
  desktop until hovered (`md:opacity-0 md:group-hover:opacity-100`).
  Dropped the hover-only cloak, bumped color to `text-foreground/70`,
  added a red hover/focus background, and sized the icon up from
  `size-2.5` to `size-3`. Always visible on every viewport.

* **Plain search icon for the Browse button**. `FolderSearchIcon`
  replaced with `Search01Icon` so it reads as a simple "find a
  folder" action alongside the existing `Add01Icon`.

* Studio: align Custom Folders + and X buttons on the same right edge

The Custom Folders header used `px-2.5` with a `p-0.5` icon button,
while each folder row used `px-3` with a `p-1` button. That put the
X icon 4px further from the right edge than the +. Normalised both
rows to `px-2.5` with `p-1` so the two icons share a column.

* Studio: empty-state button opens the folder browser directly

The first-run empty state for Custom Folders was a text link reading
"+ Add a folder to scan for local models" whose click toggled the
text input. That's the wrong default: a user hitting the empty state
usually doesn't know what absolute path to type, which is exactly
what the folder browser is for.

* Reword to "Browse for a models folder" with a search-icon
  affordance so the label matches what the click does.
* Click opens the folder browser modal directly. The typed-path +
  Add button flow is still available via the + icon in the
  section header, so users who know their path keep that option.
* Slightly bump the muted foreground opacity (70 -> hover:foreground)
  so the button reads as a primary empty-state action rather than a
  throwaway hint.

* Studio: Custom Folders header gets a dedicated search + add button pair

The Custom Folders section header had a single toggle button that
flipped between + and X. That put the folder-browser entry point
behind the separate empty-state link. Cleaner layout: two buttons in
the header, search first, then add.

* Search icon (left) opens the folder browser modal directly.
* Plus icon (right) toggles the text-path input (unchanged).
* The first-run empty-state link is removed -- the two header icons
  cover both flows on every state.

Both buttons share the same padding / icon size so they line up with
each other and with the per-folder remove X.

* Studio: sandbox folder browser + bound caps + UX recoveries

PR review fixes for the Custom Folders folder browser. Closes the
high-severity CodeQL path-traversal alert and addresses the codex /
gemini P2 findings.

Backend (studio/backend/routes/models.py):

* New _build_browse_allowlist + _is_path_inside_allowlist sandbox.
  browse_folders now refuses any target that doesn't resolve under
  HOME, HF cache, Studio dirs, registered scan folders, or the
  well-known third-party model dirs. realpath() is used so symlink
  traversal cannot escape the sandbox. Also gates the parent crumb
  so the up-row hides instead of 403'ing.
* _BROWSE_ENTRY_CAP now bounds *visited* iterdir entries, not
  *appended* entries. Dirs full of files (or hidden subdirs when
  show_hidden is False) used to defeat the cap.
* _count_model_files gets the same visited-count fix.
* PermissionError no longer swallowed silently inside the
  enumeration / counter loops -- now logged at debug.

Frontend (folder-browser.tsx, pickers.tsx, chat-api.ts):

* splitBreadcrumb stops mangling literal backslashes inside POSIX
  filenames; only Windows-style absolute paths trigger separator
  normalization. The Windows drive crumb value is now C:/ (drive
  root) instead of C: (drive-relative CWD-on-C).
* browseFolders accepts and forwards an AbortSignal so cancelled
  navigations actually cancel the in-flight backend enumeration.
* On initial-path fetch error, FolderBrowser now falls back to HOME
  instead of leaving the modal as an empty dead end.
* When the auto-add path (one-click "Use this folder") fails, the
  failure now surfaces via toast in addition to the inline
  paragraph (which is hidden when the typed-input panel is closed).

* Studio: rebuild browse target from trusted root for CodeQL clean dataflow

CodeQL's py/path-injection rule kept flagging the post-validation
filesystem operations because the sandbox check lived inside a
helper function (_is_path_inside_allowlist), and CodeQL does not
treat a check made inside a separate helper as sanitizing the
value at the call site. The user-derived ``target`` was still
flowing into ``target.exists`` / ``target.is_dir`` /
``target.iterdir``.

The fix: after resolving the user-supplied ``candidate_path``,
locate the matching trusted root from the allowlist and rebuild
``target`` by appending each individually-validated segment to
that trusted root. Each segment is rejected if it isn't a single
safe path component (no separators, no ``..``, no empty/dot).
The downstream filesystem ops now operate on a Path constructed
entirely from ``allowed_roots`` (trusted) plus those validated
segments, so CodeQL's dataflow no longer sees a tainted source.
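
The rebuild-from-trusted-root pattern can be sketched as follows. This is a simplified illustration rather than the code in `browse_folders`: the function name is hypothetical, and rejecting backslashes inside a segment on every platform is a conservative assumption.

```python
import os
from pathlib import Path


def rebuild_under_trusted_root(candidate, allowed_roots):
    """Return a Path built from a trusted root plus validated segments.

    Returns None when `candidate` does not resolve under any allowed root.
    """
    resolved = Path(os.path.realpath(candidate))  # follows symlinks
    for root in allowed_roots:
        trusted = Path(os.path.realpath(root))
        try:
            relative = resolved.relative_to(trusted)
        except ValueError:
            continue  # not under this root, try the next one
        target = trusted
        for segment in relative.parts:
            # Each segment must be a single, plain path component.
            if segment in ("", ".", "..") or "/" in segment or "\\" in segment:
                return None
            target = target / segment
        return target
    return None
```

The returned path is equal to the resolved input for every valid case, but it is constructed from the allowlist entry rather than from the user-supplied string.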

Behavior is unchanged for all valid inputs -- only the
construction of ``target`` is restructured. Live + unit tests
all pass (58 selected, 7 deselected for Playwright env).

* Studio: walk browse paths from trusted roots for CodeQL

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@h100-8-cheapest.us-east5-a.c.unsloth.internal>
2026-04-15 08:04:33 -07:00
Roland Tannous
800ddc95f8
Re-apply #4939: updated models template mappers (#4950)
* Reapply "updated models template mappers. added lfm2.5vl450m to transformers 5…" (#4945)

This reverts commit 33503ea248.

* Add missing gemma-4-31B-it bnb-4bit mapper entry and LFM2.5 upstream namespace for PR #4950

- Add unsloth/gemma-4-31B-it-unsloth-bnb-4bit to __INT_TO_FLOAT_MAPPER so
  the int-to-float resolution works for this model (already listed in
  TEMPLATE_TO_MODEL_MAPPER but had no mapper entry).
- Add LiquidAI/LFM2.5-1.2B-Instruct to lfm-2.5 TEMPLATE_TO_MODEL_MAPPER
  entry so the canonical upstream namespace is mapped consistently with lfm-2.

* Add missing gemma-4-31B-it bnb-4bit Ollama mapping and lfm-2.5 chat template alias

- Add unsloth/gemma-4-31B-it-unsloth-bnb-4bit to OLLAMA_TEMPLATE_TO_MODEL_MAPPER
  so Ollama export works for this model (E2B-it and E4B-it bnb-4bit variants were
  already present, 31B-it was inconsistently omitted)
- Register CHAT_TEMPLATES["lfm-2.5"] as alias of the lfm-2 template to prevent
  KeyError when Studio resolves LFM2.5 models through MODEL_TO_TEMPLATE_MAPPER

* Add missing LFM2 bnb-4bit INT_TO_FLOAT_MAPPER entry

unsloth/LFM2-1.2B-unsloth-bnb-4bit is referenced in model_mappings.py
but had no mapper.py entry, so model resolution would fail when users
load that variant with load_in_4bit=False or when the float name is
used with load_in_4bit=True.

* Fix review findings for PR #16

1. ollama_template_mappers.py: Restore dropped Gemma-4 base model IDs
   (E2B, E4B, 31B, 26B-A4B) and add missing google/ upstream IDs to
   the gemma4 Ollama mapper for consistency with other gemma entries.

2. mapper.py: Remove self-mapping non-bnb-4bit entries from
   __INT_TO_FLOAT_MAPPER that were polluting FLOAT_TO_INT_MAPPER with
   lowercase 16-bit names, causing load_in_4bit=True to return bad
   model names. Add direct MAP_TO_UNSLOTH_16bit entries to preserve
   the google->unsloth 16-bit redirects.

3. mapper.py: Add LFM2.5 MAP_TO_UNSLOTH_16bit redirect so
   LiquidAI/LFM2.5-1.2B-Instruct resolves to its unsloth mirror.

* Add review tests for PR #4950

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove top-level test files

These test_*.py files were added at the repo root rather than under tests/.
Removing them from this PR; the production mapper changes remain.

* Add gemma-4-26B-A4B-it mapping

Adds unsloth/gemma-4-26B-A4B-it to __INT_TO_FLOAT_MAPPER as a 2-tuple so
google/gemma-4-26B-A4B-it routes to unsloth/gemma-4-26B-A4B-it across
INT_TO_FLOAT_MAPPER, FLOAT_TO_INT_MAPPER, and MAP_TO_UNSLOTH_16bit.

The 26B-A4B (MoE) model has no bnb-4bit variant, so the key uses the
plain unsloth name rather than the -unsloth-bnb-4bit suffix.

Removes the now-redundant standalone _add_with_lower call for the -it
variant; the 16bit mapping is registered via the dict loop.

* Add unsloth-bnb-4bit mappings for gemma-4 base (non-it) models

Adds E2B, E4B, 31B base unsloth-bnb-4bit entries to __INT_TO_FLOAT_MAPPER.
The 26B-A4B (MoE) base has no bnb-4bit variant on HF, so it stays on the
standalone _add_with_lower line for the 16bit-only routing.

Removes the redundant _add_with_lower lines for E2B, E4B, 31B base since
the dict loop now registers the same google->unsloth route through the
2-tuple entries, plus full FLOAT_TO_INT and INT_TO_FLOAT coverage.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 07:52:12 -07:00
Avaya Aggarwal
7c5464ad71
feat: Add cactus QAT scheme support (#4679)
* feat: Add cactus QAT scheme support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test(qat): add tests for cactus QAT scheme and fix missing import

* Fix cactus QAT scheme: correct MappingType import, tighten PerGroup filter

- Drop the broken `from torchao.dtypes import MappingType` import. `MappingType`
  lives in `torchao.quantization` (and `torchao.quantization.quant_primitives`);
  it is not exported from `torchao.dtypes` in any supported torchao release
  (verified on 0.14, 0.16, 0.17). The previous code raised `ImportError` on
  every cactus call and was masked as a misleading 'torchao not found' error.
- Since `IntxWeightOnlyConfig` already defaults `mapping_type` to
  `MappingType.SYMMETRIC`, drop the explicit kwarg entirely and remove the
  import. Behavior is unchanged.
- Introduce a named `group_size = 32` constant (matches the int4 / fp8-int4
  pattern in the surrounding branches) and add a `% group_size == 0`
  divisibility guard to the filter. `PerGroup(32)` requires
  `in_features % 32 == 0` at `quantize_()` time, otherwise torchao raises
  `ValueError: in_features (N) % group_size (32) must be == 0`. The old
  `in_features >= 32` filter would admit non-aligned widths (e.g. 33, 48, 65,
  127) and crash `_prepare_model_for_qat` for those shapes.
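
A minimal sketch of the tightened filter described above, assuming a hypothetical `FakeLinear` stand-in so the example runs without torch or torchao. The names `cactus_filter` and `old_filter` are illustrative and are not taken from the Unsloth source.

```python
from dataclasses import dataclass

GROUP_SIZE = 32  # matches the PerGroup(32) granularity named in the commit message


@dataclass
class FakeLinear:
    """Hypothetical stand-in for torch.nn.Linear; only in_features matters here."""
    in_features: int


def old_filter(module: FakeLinear, group_size: int = GROUP_SIZE) -> bool:
    # Previous behavior: any layer at least one group wide was admitted.
    return module.in_features >= group_size


def cactus_filter(module: FakeLinear, group_size: int = GROUP_SIZE) -> bool:
    # New behavior: the width must also be an exact multiple of the group size.
    return module.in_features >= group_size and module.in_features % group_size == 0


widths = [16, 32, 33, 48, 64, 65, 127, 128]
admitted_old = [w for w in widths if old_filter(FakeLinear(w))]
admitted_new = [w for w in widths if cactus_filter(FakeLinear(w))]
```

Under these assumptions the old filter admits 33, 48, 65 and 127, which are the widths the message says would fail at `quantize_()` time, while the new filter keeps only 32, 64 and 128.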

* Warn when cactus QAT skips non-divisible Linear layers

Multiple reviewers flagged that the divisibility guard added in the
previous commit can silently leave Linear layers in full precision when
their in_features is not a multiple of 32. For currently supported
Unsloth models (Qwen, Llama, Gemma, Mistral, Phi) every Linear width is
already a multiple of 32/64/128 so this never triggers, but surfacing
the coverage gap is cheap and avoids users assuming 100% QAT coverage
when they bring a custom model with unusual shapes.

Emit a UserWarning listing up to the first 8 skipped layers whenever
the cactus filter excludes any Linear due to the modulo guard. This
keeps the lenient skip behavior (consistent with int4 /
fp8-int4) but surfaces it instead of skipping silently.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-15 07:40:03 -07:00
Avaya Aggarwal
f18e9dddf0
feat: Add support for OLMo-3 model (#4678)
* feat: Add support for OLMo-3 model in mapping and tests

* Update unsloth/models/mapper.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update tests/test_get_model_name.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Fix casing, add Think variants, and align version gate for OLMo-3 PR 4678

Mapper: switch slugs from OLMo-3 to canonical Olmo-3 mixed case, drop the
non-existent unsloth/Olmo-3-7B-Instruct-bnb-4bit dead alias, and add the
already-published Olmo-3-7B-Think and Olmo-3-32B-Think Unsloth mirrors.

Loader: change the olmo3 transformers version gate from Version("4.57.0")
to Version("4.57.0.dev0") so nightly/source builds that already contain
olmo3 are not blocked, matching the OLMo-2, Gemma 3 and Cohere patterns.

* Use canonical Olmo-3 casing and cover Think variants in OLMo-3 tests

Mirrors the mapper.py fixes on pr-4678-code: HuggingFace canonical slugs
for the OLMo-3 family use mixed-case Olmo-3 (not OLMo-3 like OLMo-2), and
Unsloth already hosts Olmo-3-7B-Think and Olmo-3-32B-Think mirrors, so
the resolution matrix now covers all three published Olmo-3 families.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-15 07:39:11 -07:00
Daniel Han
c3cd890357
Studio: refresh Downloaded GGUF list and recurse into variant subdirs (#5032)
* Studio: refresh Downloaded GGUF list and recurse into variant subdirs

Two fixes for the model picker's "Downloaded" section.

Frontend (`pickers.tsx`):
* `HubModelPicker`'s mount effect short-circuited the cached-gguf and
  cached-models refetch whenever the module-level cache already had
  entries (`if (alreadyCached) return;`). After downloading a new repo
  in the same session, reopening the picker rendered the stale cache
  and the new repo never appeared in "Downloaded" until a full page
  reload. The early return is removed so the lists are always refreshed
  on mount; the module cache still drives the initial render so there
  is no spinner flash when we already had data.

Backend (`utils/models/model_config.py`):
* `list_local_gguf_variants` and `_find_local_gguf_by_variant` used a
  non-recursive `Path.glob("*.gguf")`. Some HF GGUF repos (e.g.
  `unsloth/gemma-4-26B-A4B-it-GGUF`) place the largest quants under a
  variant-named subdirectory such as `BF16/...gguf`, which the
  top-level glob missed. Both helpers now use `rglob` and the variant
  filename is stored as a path relative to the scan root so the
  locator can still find the file.

The flat-layout case (variants directly in the snapshot root) is
unchanged: verified against `unsloth/gemma-4-E2B-it-GGUF` which still
returns its UD-Q4_K_XL variant correctly.

* Studio: emit posix-style relative filenames for local GGUF subdirs

`list_local_gguf_variants` was doing `str(f.relative_to(p))`, which on
Windows produces backslash-separated paths like `BF16\foo.gguf`. The
remote `list_gguf_variants` (HF API path) always returns forward-slash
filenames such as `BF16/foo.gguf`, so the two would diverge on Windows.

Switch to `.as_posix()` so the local and remote variant filenames stay
identical across Linux, macOS, and Windows. Verified by simulating with
`PureWindowsPath` in the test suite.
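
The separator difference can be reproduced on any OS with `pathlib`'s pure path classes. This is an illustrative sketch; `variant_filename` is a hypothetical helper name, not the function in `model_config.py`.

```python
from pathlib import PureWindowsPath


def variant_filename(file_path, scan_root):
    """Return file_path relative to scan_root, always with forward slashes."""
    return file_path.relative_to(scan_root).as_posix()


# Simulate a Windows cache layout without needing a Windows host.
win_root = PureWindowsPath(r"C:\cache\snapshot")
win_file = win_root / "BF16" / "foo.gguf"

naive = str(win_file.relative_to(win_root))      # backslash-separated
portable = variant_filename(win_file, win_root)  # forward-slash-separated
```

Here `naive` carries a backslash while `portable` matches the forward-slash form the message attributes to the remote listing.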

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: detect mmproj at snapshot root for nested-variant layouts

When _find_local_gguf_by_variant returns a weight file inside a
quant-named subdir (e.g. snapshot/BF16/foo.gguf), detect_mmproj_file
was scanning only the immediate parent and missing the mmproj file
sitting at the snapshot root. The model was then loaded without
--mmproj, silently breaking vision support for repos that ship
nested variants.

detect_mmproj_file now takes an optional search_root and walks up
from the weight file to that root, in order, so the mmproj at the
snapshot root is picked up. Sibling quant subdirs are not scanned,
so an unrelated variant's mmproj does not leak in.
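
A rough sketch of the walk-up search described above. `find_mmproj` is a hypothetical name, and the rule "any `.gguf` whose filename contains `mmproj`" is an assumption made for illustration; the real `detect_mmproj_file` may match differently.

```python
import tempfile
from pathlib import Path
from typing import Optional


def find_mmproj(weight_file: Path, search_root: Optional[Path] = None) -> Optional[Path]:
    """Check the weight file's directory, then each parent up to search_root."""
    start = weight_file.parent
    root = search_root if search_root is not None else start
    # Ignore a search_root that is not an ancestor of the weight file.
    if root != start and root not in start.parents:
        root = start
    directory = start
    while True:
        for candidate in sorted(directory.glob("*.gguf")):
            if "mmproj" in candidate.name.lower():
                return candidate
        if directory == root:
            return None
        directory = directory.parent  # step one level up; siblings are never scanned


with tempfile.TemporaryDirectory() as tmp:
    snapshot = Path(tmp) / "snapshot"
    (snapshot / "BF16").mkdir(parents=True)
    (snapshot / "Q8_0").mkdir()
    weight = snapshot / "BF16" / "foo.gguf"
    weight.touch()
    (snapshot / "mmproj-F16.gguf").touch()               # adapter at the snapshot root
    (snapshot / "Q8_0" / "mmproj-other.gguf").touch()    # sibling variant, must be ignored
    found_with_root = find_mmproj(weight, search_root=snapshot)
    found_without_root = find_mmproj(weight)
    found_name = found_with_root.name if found_with_root else None
```

Without `search_root` only the immediate parent is scanned, which reproduces the missed-adapter case; with it, the root-level file is found and the sibling `Q8_0` directory is never visited.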

Also apply the suggested micro-optimization on relative_to in
list_local_gguf_variants -- only build the posix path when storing
the first file for a quant.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 07:34:42 -07:00
Daniel Han
156f3fc4b0
Gate trl disable_gradient_checkpointing patch warning on UNSLOTH_ENABLE_LOGGING (#5038)
The "Patched trl.models.utils.disable_gradient_checkpointing with a no-op"
warning fires once on every Unsloth import, including from notebooks where
the user did not opt into verbose logging. It is a routine integration
patch, not an anomaly the user needs to know about. Gate it on
UNSLOTH_ENABLE_LOGGING=1 like other diagnostic notices.
2026-04-15 07:33:48 -07:00
jonahsamost
777e1bd0ac
fix (#4887)
2026-04-15 07:21:03 -07:00
Daniel Han
1a4ca5eca8
Fix grad-accum accepts_loss_kwargs detection for vision wrappers (#5036)
* Fix grad-accum model_accepts_loss_kwargs detection for vision wrappers

Replace the source-string rewrite of Trainer.__init__ with an instance-level
accepts_loss_kwargs shadow applied on the loaded model. Covers:

  1. Unsloth-compiled forward -> True, so HF Trainer does not double-scale
     on top of unsloth_fixed_cross_entropy's num_items_in_batch division.
  2. Stock forward on a conditional-generation wrapper (Gemma3n, Gemma3
     pre-4.57, Qwen-VL family, etc.) where the outer class has no
     accepts_loss_kwargs but the inner .model declares False -> False.
     This is the case that reproduces issue #4982 under trust_remote_code
     or UNSLOTH_COMPILE_DISABLE, where the previous fix's outer-attr
     check walked past the inner model and fell through to signature
     inspection.
  3. Text LMs without any explicit accepts_loss_kwargs -> leave HF default.

The previous .replace()-based patch silently no-ops on transformers 4.48
through 4.52 (variable named model, not unwrapped_model) and is fragile
against any upstream reformat. The new helper walks the PEFT / HF wrapper
chain, finds the first class that declares accepts_loss_kwargs on its own
class dict (type(m).__dict__, not hasattr, to avoid PEFT __getattr__
forwarding), and setattr-shadows that value at every wrapper level so
HF Trainer's hasattr(unwrapped_model, ...) check picks it up at whichever
level accelerate.unwrap_model returns.
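
The difference between `hasattr` and a class-dict lookup can be illustrated with plain Python. The `Wrapper` class below is a hypothetical stand-in for a PEFT-style wrapper that forwards attribute access, and `find_declared` / `shadow_everywhere` are illustrative names rather than the helper added by this commit.

```python
class Inner:
    accepts_loss_kwargs = False  # declared on the class itself


class Wrapper:
    """Hypothetical wrapper that forwards unknown attributes to self.model."""

    def __init__(self, model):
        self.model = model

    def __getattr__(self, name):
        # Only reached when normal lookup fails.
        if name == "model":
            raise AttributeError(name)
        return getattr(self.model, name)


def find_declared(model, attr="accepts_loss_kwargs"):
    """Return the first value declared in a class's own __dict__ along .model links."""
    m, seen = model, set()
    while m is not None and id(m) not in seen:
        seen.add(id(m))
        if attr in type(m).__dict__:
            return type(m).__dict__[attr]
        m = vars(m).get("model")
    return None


def shadow_everywhere(model, attr="accepts_loss_kwargs"):
    """Set the resolved value as an instance attribute at every wrapper level."""
    value = find_declared(model, attr)
    if value is None:
        return None  # nothing declared anywhere; leave the default behavior alone
    m, seen = model, set()
    while m is not None and id(m) not in seen:
        seen.add(id(m))
        setattr(m, attr, value)
        m = vars(m).get("model")
    return value


wrapped = Wrapper(Wrapper(Inner()))
forwarded = hasattr(wrapped, "accepts_loss_kwargs")                   # forwarding hides the origin
declared_on_outer = "accepts_loss_kwargs" in type(wrapped).__dict__   # class dict is unambiguous
value = shadow_everywhere(wrapped)
shadowed_on_outer = "accepts_loss_kwargs" in vars(wrapped)
```

In this toy chain `hasattr` reports the attribute on the outermost wrapper even though only `Inner` declares it, which is why the class-dict check is the more reliable signal.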

Also adds an unconditional post-init clamp of
accelerator.gradient_accumulation_steps = 1 to work around the
transformers 5.0 through 5.5 GradientAccumulationPlugin regression that
makes accelerator.backward divide loss by GA on top of training_step's
own /GA division. Fixed upstream in 5.6.0.dev0; no-op on 4.x and 5.6+.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Trim comments

* Address review: cover PEFT-after-load and custom compile location

Two review findings from 3/20 reviewers:

1. [3 of 20 reviewers] apply_accepts_loss_kwargs_fix was called from the
   loaders before get_peft_model wraps the base model, so on transformers
   4.48-4.52 (which does hasattr on the outer model) the instance shadow
   on the base model was lost after PEFT wrapping. Fix: also call it from
   the wrapped Trainer.__init__ so it runs on whatever model the user
   actually hands to Trainer, which is always the final wrapped form.

2. [1 of 20 reviewers] _forward_is_unsloth_compiled hard-coded the
   substrings "unsloth_compiled" / "unsloth_cache" in the co_filename
   check, which misclassifies compiled forwards when
   UNSLOTH_COMPILE_LOCATION is set to a custom directory. Fix: new
   _unsloth_compile_cache_leaves helper that reads the env var and
   matches the basename against path components, honoring both the
   default and any user override.

Verified locally:
- PEFT-after-load simulation: HF's hasattr(peft, "accepts_loss_kwargs")
  now returns True after our init wrapper runs, and value resolves to
  False on Gemma3n-style inner wrappers.
- Custom UNSLOTH_COMPILE_LOCATION simulation: compiled detection returns
  True for /tmp/my_custom_cache/compiled.py when the env var is set.
- End-to-end Gemma-3 270m + LoRA SFT unchanged: loss 4.9626, grad-norm
  matches prior run, all 4 wrapper levels now carry the shadowed attr.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 06:59:36 -07:00
Daniel Han
1ccfd2e0a5
fix(rocm): tighten gfx regex to ignore generic ISA lines (#5033)
* fix(rocm): tighten gfx regex to ignore generic ISA lines

ROCm 6.1+ rocminfo emits generic ISA names such as
"amdgcn-amd-amdhsa--gfx11-generic" and "amdgcn-amd-amdhsa--gfx9-4-generic"
alongside the real GPU name. The previous `gfx[1-9]` regex used in
`_has_rocm_gpu` matched both, so a host with only a generic ISA entry
would be reported as having a usable AMD GPU.

Tighten the pattern to `gfx[1-9][0-9a-z]{2,3}` so only real gfx ids
match. This covers every documented target from GFX6 (gfx600) through
GFX12 (gfx1201), including letter-suffixed ids like gfx90a (MI250 /
MI250X) and gfx90c. Documented generic ISA names always have 1 or 2
digits before the dash and no longer match.
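
The two patterns quoted above can be compared directly with `re`. The sample strings other than the two generic ISA names from the message are illustrative, and `matches` is a hypothetical helper.

```python
import re

OLD_PATTERN = re.compile(r"gfx[1-9]")
NEW_PATTERN = re.compile(r"gfx[1-9][0-9a-z]{2,3}")


def matches(pattern, text):
    """True if the pattern occurs anywhere in the text."""
    return pattern.search(text) is not None


generic_lines = [
    "amdgcn-amd-amdhsa--gfx11-generic",
    "amdgcn-amd-amdhsa--gfx9-4-generic",
]
real_ids = ["gfx600", "gfx90a", "gfx90c", "gfx1100", "gfx1201"]

old_hits_generic = [matches(OLD_PATTERN, line) for line in generic_lines]
new_hits_generic = [matches(NEW_PATTERN, line) for line in generic_lines]
new_hits_real = [matches(NEW_PATTERN, gpu) for gpu in real_ids]
```

The old pattern matches both generic lines, while the new one rejects them and still accepts each real id in the list.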

Applied to both `studio/install_python_stack.py` and
`studio/install_llama_prebuilt.py` so the two detection paths agree.

Co-authored-by: Martin Hoyer <mhoyer@redhat.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Martin Hoyer <mhoyer@redhat.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 05:24:41 -07:00
Daniel Han
b7a8ff2833
Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027) (#5034)
* Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027)

FastLanguageModel.from_pretrained(..., num_labels=N) crashed with
"NotImplementedError: normal_kernel_cuda not implemented for 'Byte'" on
pre-quantized bnb 4-bit checkpoints (e.g. unsloth/Qwen3-4B-bnb-4bit)
when running on transformers 5.x.

Two pieces were needed to close this out:

1. unsloth_zoo PR: add "score", "classifier", "qa_outputs" to
   SKIP_QUANTIZATION_MODULES so replace_with_bnb_linear leaves task
   heads in the compute dtype.

2. This commit: for pre-quantized checkpoints, transformers reads
   llm_int8_skip_modules from the quantization_config baked into
   config.json and ignores the runtime BitsAndBytesConfig we pass via
   kwargs. Unsloth must merge its skip list into
   model_config.quantization_config.llm_int8_skip_modules before the
   from_pretrained call, or the checkpoint's frozen list
   (e.g. ["lm_head", "multi_modal_projector", "merger",
   "modality_projection"]) wins and the `score` head gets converted to
   Linear4bit with uint8 storage, then _init_weights calls normal_ on
   uint8 and crashes.
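
A sketch of the merge step in item 2, using plain lists. `merge_skip_modules` is a hypothetical helper name, and the three task-head names are only the ones listed in item 1; the real skip list may contain more entries.

```python
def merge_skip_modules(checkpoint_skip, extra_skip):
    """Append extra module names to the checkpoint's list, keeping order and skipping duplicates."""
    merged = list(checkpoint_skip or [])
    for name in extra_skip:
        if name not in merged:
            merged.append(name)
    return merged


# Frozen list as quoted in the commit message.
frozen = ["lm_head", "multi_modal_projector", "merger", "modality_projection"]
task_heads = ["score", "classifier", "qa_outputs"]

merged = merge_skip_modules(frozen, task_heads)
merged_from_none = merge_skip_modules(None, task_heads)
```

The merged list would then be written back to the config's `llm_int8_skip_modules` before loading, so the task heads stay in the compute dtype.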

Also add a defensive post-load cast on the task head to guard against
any residual path that ends up with a non-floating head dtype.

Verified on transformers 4.57.6 and 5.5.0 with:
- unsloth/Qwen3-4B-bnb-4bit + num_labels=3
- unsloth/Qwen3-4B (non-bnb repo, load_in_4bit=True)
- unsloth/Llama-3.2-1B-Instruct + num_labels=3
- unsloth/ModernBERT-large classifier head (bert_classification notebook)
- Regression: causal LM path unchanged, backbone still 4-bit
- 3-step SFT on num_labels=3 confirms gradient flow and weight updates
  on score.weight

Fixes unslothai/unsloth#5027

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 05:16:33 -07:00
David Solanas Sanz
1fcb2502cf
fix: prevent offline freeze by fixing stats retry and forwarding local_files_only (#5016)
Fixes #2393.

- `_utils.py`: `has_internet()` now respects `HF_HUB_OFFLINE` with truthy variant parsing in addition to `TRANSFORMERS_OFFLINE`.
- `_utils.py`: replace uncontrolled `except Exception: stats_check()` retry (which had no time limit and could freeze on Kaggle offline mode) with a logged skip.
- `loader.py`: forward `local_files_only` from kwargs into all `AutoConfig.from_pretrained` and `PeftConfig.from_pretrained` probes in `FastLanguageModel.from_pretrained` and `FastModel.from_pretrained`, including the PEFT base-model reload paths.
2026-04-15 04:51:31 -07:00
Lee Jackson
f9ef639dde
Studio: support GGUF variant selection for non-suffixed repos (#5023)
* fix: support GGUF variant selection for non-suffixed repos

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: harden GGUF detection across cached models and picker flows

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* chore: use shared GGUF picker helper for search rows

* fix: avoid mixed cache duplication and preserve GGUF fallback detection

* fix: unify GGUF cache matching and merge picker hints

* fix: normalize local GGUF matching across picker and model config

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: robust cached-gguf classification + hint-aware click routing

- _repo_gguf_size_bytes: treat size_on_disk=None as 0 and dedupe fallback
  by commit_hash so partial/interrupted downloads don't TypeError out of
  sum() and wipe the entire cached list.
- list_cached_gguf / list_cached_models: narrow per-repo try/except so
  one malformed repo no longer poisons the whole response.
- handleModelClick: route through isKnownGgufRepo instead of the
  suffix-only isGgufRepo, so non-suffixed GGUF repos still open the
  variant expander from every call site.
- Replace the modelIsGgufById/resultIsGgufById Maps with Sets of known
  GGUF ids to stop conflating "no hint" with "known not-GGUF".
- Make HfModelResult.isGguf required (it is always set in makeMapModel).
- Add regression tests for the None size case, mixed-repo inclusion in
  cached-gguf, and per-repo error isolation.

* fix: exclude mmproj from GGUF classification and case-normalize hint lookups

- _repo_gguf_size_bytes now filters mmproj vision-adapter files so
  safetensors+mmproj.gguf repos stay on the cached-models path and
  non-GGUF rows no longer show zero pickable variants. A vision-capable
  GGUF repo (main weight + mmproj adapter) still classifies as GGUF and
  reports the main weight size.
- modelGgufIds / resultGgufIds now key on lowercased ids and
  isKnownGgufRepo lowercases its lookup, so store and HF-search ids
  that differ only by casing still match the same GGUF hint.
- New regression tests: mmproj-only repo excluded from cached-gguf,
  same repo included in cached-models, vision-capable repo still
  classified as GGUF with correct size.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-15 15:32:01 +04:00
Roland Tannous
13928b5f0e
Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024)
* Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var

When set, UNSLOTH_PYTORCH_MIRROR overrides the default
https://download.pytorch.org/whl base URL in all four install scripts
(install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py).
When unset or empty, the official URL is used. This lets users behind
corporate proxies or in regions with poor connectivity to pytorch.org
point at a local mirror without patching scripts.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py

Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back
to the official URL when unset or empty, and preserves the value as-is
(including trailing slashes).

* Remove stale test assertions for missing install.sh messages

* Fix GPU mocking in test_get_torch_index_url.sh

Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside
get_torch_index_url so the GPU-presence checks work in tests.
Add -L flag handling to mock nvidia-smi so it passes the GPU listing
check. All 26 tests now pass on CPU-only machines.

* Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs
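
A minimal sketch of the resolved behavior across these commits: fall back to the official URL when the variable is unset or empty, otherwise use the mirror with trailing slashes removed. `pytorch_whl_base` is a hypothetical function name and the mirror URL is a placeholder.

```python
import os

OFFICIAL_WHL_BASE = "https://download.pytorch.org/whl"


def pytorch_whl_base(environ=None):
    """Resolve the wheel index base URL from UNSLOTH_PYTORCH_MIRROR."""
    environ = os.environ if environ is None else environ
    mirror = environ.get("UNSLOTH_PYTORCH_MIRROR", "")
    if not mirror:
        return OFFICIAL_WHL_BASE
    # Strip trailing slashes so joining a suffix never yields "//".
    return mirror.rstrip("/")


base = pytorch_whl_base({"UNSLOTH_PYTORCH_MIRROR": "https://mirror.example/whl/"})
cpu_index = f"{base}/cpu"
```

With the slash stripped, appending a suffix such as `/cpu` yields a single separator.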

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 11:39:11 +04:00
Datta Nimmaturi
826c98f3c0
[moe][gemma4] Target MoE for gemma4 (#4913)
* Target MoE for gemma4

* refactor attention impl determine

* Revert "refactor attention impl determine"

This reverts commit 888fca08110a9a74278dc1ebc14d0da043bbd11d.

* Remove attention policy changes from gemma4 MoE fix
2026-04-14 16:53:07 -05:00
Daniel Han
5aa8c15246
Studio: hard-stop at n_ctx with a 'Context limit reached' toast (#5021)
* Studio: hard-stop at n_ctx with a dedicated 'Context limit reached' toast

llama-server's default behavior when the KV cache fills is to silently
drop the oldest non-``n_keep`` tokens and keep generating. The UI has
no way to tell the user that earlier turns were evicted -- they just
see degraded continuity and a confusing ``5,361 / 4,096`` on the
context usage bar.

Launch llama-server with ``--no-context-shift`` so it returns a clean
error once the request would exceed ``n_ctx``. In the chat adapter,
catch the error, identify it as a context-limit error via
``isContextLimitError()``, and surface a dedicated toast that names
the exact control to adjust: the ``Context Length`` field in the chat
Settings panel.

Also add a lightweight tooltip hint on ``ContextUsageBar`` when usage
crosses 85%, so users see the "raise Context Length in Settings"
suggestion before they hit the hard stop.

Tests:

  * ``test_llama_cpp_no_context_shift.py`` pins the ``--no-context-shift``
    flag in the static launch-command template, and pins it inside the
    unconditional ``cmd = [ ... ]`` block so a future refactor can't
    hide it behind a branch.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Shorten --no-context-shift comment to 1 line

* Match backend _friendly_error rewrite in isContextLimitError

Codex review on PR caught that ``backend/routes/inference.py::_friendly_error``
rewrites the raw llama-server text
  "request (X tokens) exceeds the available context size (Y tokens)"
into
  "Message too long: X tokens exceeds the Y-token context window. ..."
on the main streaming GGUF path. The heuristic only looked for
"context size" / "exceeds the available context" / "context shift",
none of which survive the rewrite, so the new "Context limit reached"
toast would never fire for the most common case. Add matches for
"message too long" and "context window" so both wordings hit.

Also addresses Gemini feedback on the launch-flag test:
  * Use ``inspect.getsource(LlamaCppBackend.load_model)`` instead of
    reading ``__file__`` directly; scopes the assertions to the
    function that actually launches llama-server.
  * Replace the hardcoded ``"            ]"`` indent search with a
    line-at-a-time scan for a line that is just ``]``, so the test
    survives reformatting.
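
The substring heuristic from the `isContextLimitError` fix above can be sketched in Python (the real helper lives in the frontend). The marker strings are the ones named in the message; the case-insensitive comparison and the name `is_context_limit_error` are assumptions.

```python
_CONTEXT_LIMIT_MARKERS = (
    "context size",
    "exceeds the available context",
    "context shift",
    "message too long",   # wording produced by the backend rewrite
    "context window",     # wording produced by the backend rewrite
)


def is_context_limit_error(message):
    """Heuristically classify an error message as a context-limit error."""
    text = (message or "").lower()
    return any(marker in text for marker in _CONTEXT_LIMIT_MARKERS)


raw = "request (5361 tokens) exceeds the available context size (4096 tokens)"
friendly = "Message too long: 5361 tokens exceeds the 4096-token context window."
```

Both the raw llama-server wording and the rewritten wording hit a marker, while unrelated errors fall through.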

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 10:58:20 -07:00
Daniel Han
5861a7ce15
Studio: split model-load progress label across two rows (#5020)
* Studio: split model-load progress label across two rows

The chat flow and training overlay both compose a progress label like
"112.6 of 122.3 GB • 331.0 MB/s • 30s left" and render it next to the
percent badge in a single flex row. Once the rate + ETA part shows up,
the label outgrows the row width and wraps mid-phrase, orphaning the
percent ("19 left %") onto a second ragged line.

Fix in model-load-status.tsx: split the label on the first " • " into
a primary (size) chunk that stays on row 1 with the percent, and a
secondary (rate/ETA) chunk that renders on its own muted row below.
Labels without a bullet (e.g. "22.8 GB downloaded") collapse cleanly
to one row. The inline-status variant keeps only the primary and
surfaces the full label via the tooltip.
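
The split rule is simple enough to show in a few lines. This is a Python rendering of the idea only; the actual helper is in `model-load-status.tsx`, and `split_progress_label` is an illustrative name.

```python
def split_progress_label(label):
    """Split on the first ' • ' into (primary, secondary); secondary is None if absent."""
    primary, separator, secondary = label.partition(" • ")
    return primary, (secondary if separator else None)


with_rate = split_progress_label("112.6 of 122.3 GB • 331.0 MB/s • 30s left")
without_rate = split_progress_label("22.8 GB downloaded")
```

Splitting only on the first bullet keeps the rate and ETA together on the second row.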

Also extracts the rate/ETA math out of useTransferStats into a pure
``transfer-stats.ts`` module (appendSample + computeTransferStats) so
it can be reasoned about and tested without React. The hook is now a
thin wrapper that feeds sample history through the pure functions.
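
One way a pure rate/ETA function over a sample history could look, sketched in Python rather than TypeScript. The function name mirrors `computeTransferStats`, but the signature, the `(timestamp, bytes)` sample shape, and the window default are assumptions.

```python
def compute_transfer_stats(samples, total_bytes, window_seconds=15.0):
    """Return (bytes_per_second, eta_seconds) from (timestamp, bytes_done) samples.

    Samples are expected oldest first. Returns (None, None) when no rate can be estimated.
    """
    if len(samples) < 2:
        return None, None
    latest_t, latest_b = samples[-1]
    # Keep only samples inside the rolling window ending at the latest sample.
    window = [(t, b) for t, b in samples if latest_t - t <= window_seconds]
    first_t, first_b = window[0]
    elapsed = latest_t - first_t
    if elapsed <= 0 or latest_b <= first_b:
        return None, None
    rate = (latest_b - first_b) / elapsed
    eta = max(0.0, (total_bytes - latest_b) / rate)
    return rate, eta


history = [(0, 0), (10, 1000), (20, 3000), (30, 5000)]
rate, eta = compute_transfer_stats(history, total_bytes=9000)
```

Because the function takes plain data and returns plain numbers, it can be unit-tested without rendering any component.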

Backend: adds two companion test files for load_progress():

  * test_llama_cpp_load_progress_matrix.py (21 tests) -- platform
    matrix (Linux /proc, macOS/Windows absence), VmRSS parsing
    variants (tab/space/missing/malformed), filesystem edges (HF-cache
    symlinks, broken symlinks, nonexistent paths, relative paths),
    shard aggregation (partial multi-shard, two series in same dir,
    mmproj-* exclusion, single-file), lifecycle races, concurrent
    sampling (10 threads x 50 iters against real /proc), fraction
    bounds.
  * test_llama_cpp_load_progress_live.py (5 tests) -- no-mock live
    integration: real subprocess allocating 100 MB to match VmRSS,
    real ready phase, real dead-pid degradation, real shard
    aggregation, repeated polling. Skipped on non-Linux.

Both complement the existing test_llama_cpp_load_progress.py.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Hoist splitProgressLabel out of JSX IIFE (review feedback)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 10:58:16 -07:00
Eda Z
5b8dbdc3c2
Fix bitsandbytes ROCm install by using pip instead of uv (#4966)
* Fix bitsandbytes ROCm install by using pip instead of uv

* Also use pip for PyPI fallback path in _install_bnb_rocm

The original fix correctly switched the pre-release wheel install from
uv to pip, but left the PyPI fallback path on uv. If uv breaks bnb
on ROCm, the fallback would hit the same issue. Move pip bootstrap
before the branch so both paths use pip consistently.

* Harden pip bootstrap: try ensurepip first, warn on failure

- Try ensurepip --upgrade before falling back to uv pip install pip.
  ensurepip works offline and does not need PyPI, making the bootstrap
  robust when the network or index is unavailable.
- If both ensurepip and uv fail, emit a visible warning instead of
  silently swallowing the error (which previously led to a cryptic
  "No module named pip" downstream).
- Use run_maybe_quiet so --verbose users see bootstrap output.
- Update comment to document the actual root cause: uv rejects the
  wheel because filename version and metadata version disagree.

* Add --isolated to pip install calls in _install_bnb_rocm

uv pip install ignores pip.conf and PIP_* env vars, but python -m pip
reads them. Without --isolated, users with PIP_INDEX_URL pointing to a
private mirror that does not carry bitsandbytes would see the PyPI
fallback fail where it previously worked under uv. --isolated restores
parity with the old uv behavior.

* Drop --isolated from PyPI fallback in _install_bnb_rocm

--isolated suppresses PIP_INDEX_URL, PIP_EXTRA_INDEX_URL, and pip.conf.
This is correct for the pre-release path (hardcoded GitHub URL, no index
consulted), but breaks the PyPI fallback for users in corporate or
air-gapped environments whose only route to bitsandbytes is a private
mirror configured via those mechanisms. Keep --isolated on the direct-URL
pre-release install; drop it from the index-dependent fallback.

* Drop --isolated from pre-release pip install, fix warning wording

--isolated suppresses pip.conf cert/proxy/CA settings in addition to
index config. For the direct GitHub URL, index config is irrelevant but
cert/proxy settings matter in corporate SSL-inspection environments.
Without this fix, users with pip.conf-based CA bundles get a TLS error
on the pre-release download and silently fall back to the broken PyPI
version -- the exact outcome the PR is trying to prevent.

Also fix the fallback warning: "unreachable" is too specific since the
pre-release install can fail for reasons other than network reachability.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-14 10:23:40 -07:00
pre-commit-ci[bot]
a0b9d14081
[pre-commit.ci] pre-commit autoupdate (#5004)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.9 → v0.15.10](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.9...v0.15.10)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 09:49:18 -07:00
Daniel Han
bb14ab144a
Studio: live model-load progress + rate/ETA on download and load (#5017)
* Studio: live model-load progress + rate/ETA on download and load

Two UX fixes for the opaque multi-minute wait between clicking Load
and being able to chat, visible most clearly on large MoE GGUFs like
MiniMax-M2.7 (131 GB of weights on a 97 GB GPU):

1. **Model-load phase is now observable.** The existing chat flow
   transitions the toast to "Starting model..." as soon as the
   download hits 100%, then shows a spinner with no other feedback
   until llama-server reports healthy. For a 130 GB model that spinner
   freezes for five-plus minutes while the kernel pages shards into
   the page cache. A new `GET /api/inference/load-progress` endpoint
   samples `/proc/<pid>/status VmRSS` on the llama-server subprocess
   against the sum of shard file sizes on disk, so the UI can render
   a real bar plus rate / ETA during that window.

2. **Rate and ETA on downloads and loads.** Both the chat toast and
   the training-start overlay used to show a static pair of numbers
   (for example "15.4 of 140.8 GB"). A rolling 15-second window over
   the existing byte-series now surfaces "85.3 MB/s, 24m 23s left"
   beside that pair. The estimator is shared between the download
   and load phases so the numbers don't reset when the phase flips.
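
The load-phase estimate in item 1 boils down to parsing `VmRSS` from the process status text and dividing by the total shard size. The sketch below works on a status string so it runs on any OS; the function names and the clamping to the 0..1 range are assumptions, not the actual `load_progress()` code.

```python
import re

# /proc/<pid>/status reports VmRSS in kB, e.g. "VmRSS:\t  123456 kB".
_VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s*kB", re.MULTILINE)


def parse_vmrss_bytes(status_text):
    """Return resident set size in bytes, or None if the field is missing or malformed."""
    match = _VMRSS_RE.search(status_text)
    return int(match.group(1)) * 1024 if match else None


def load_fraction(rss_bytes, total_bytes):
    """Fraction of the model resident in memory, clamped to [0, 1]."""
    if rss_bytes is None or total_bytes <= 0:
        return None
    return min(1.0, max(0.0, rss_bytes / total_bytes))


sample_status = "Name:\tllama-server\nVmRSS:\t 1048576 kB\nThreads:\t8\n"
rss = parse_vmrss_bytes(sample_status)
fraction = load_fraction(rss, total_bytes=4 * 1024**3)
```

Returning `None` for a missing field lets a caller degrade gracefully on platforms without `/proc`.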

Also fixes a pre-existing assignment bug uncovered while wiring this
up: `load_model` was storing the caller's `gguf_path` kwarg into
`self._gguf_path`, which is `None` on the HF-download code path. The
resolved on-disk path (`model_path`) is what llama-server actually
mmaps; downstream consumers need that. No existing reader used
`_gguf_path`, so this is a correctness fix for the new endpoint.

- Backend: `LlamaCppBackend.load_progress()`, `GET /api/inference/load-progress`, `LoadProgressResponse` Pydantic model.
- Frontend: `useTransferStats` hook, `formatRate` / `formatEta` helpers, `getLoadProgress` client, rewired chat toast and `DownloadRow` in the training overlay.
- Tests: `studio/backend/tests/test_llama_cpp_load_progress.py` covers empty states, mmap phase, ready phase, sharded total aggregation, missing gguf_path, and unreadable /proc (7 cases). `tsc -b` and `vite build` on the frontend both clean.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 09:46:22 -07:00
Roland Tannous
514bb3a20e
studio: pin peft to 0.18.1 to fix export subprocess issues (#5015)
* studio: pin peft to 0.18.1 to fix export subprocess issues

peft 0.19.0 causes export subprocess shutdown failures in Studio.
Reverting to 0.18.1 resolves the issue.

* studio: move peft pin to extras-no-deps to prevent torch upgrade

Installing peft via overrides.txt would resolve its deps and pull in
torch>=0.11.0, breaking other pinned packages. Moving the pin to
extras-no-deps.txt ensures --no-deps is used during install.
2026-04-14 20:16:30 +04:00
Datta Nimmaturi
4328d0b4f6
Fix num_items_in_batch GA for Gemma4 (#4998)
* Fix num_items_in_batch GA for Gemma4

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 09:01:10 -07:00
Daniel Han
7252410ccc
studio: stream export worker output into the export dialog (#4897)
* studio: stream export worker output into the export dialog

The Export Model dialog only showed a spinner on the "Exporting..."
button while the worker subprocess was doing the actual heavy lifting.
For Merged to 16bit and GGUF / Llama.cpp exports this meant several
minutes (or more, for large models) of opaque silence, with no way to
tell whether save_pretrained_merged, convert_hf_to_gguf.py, or
llama-quantize was making progress.

This adds a live terminal-style output panel inside the export dialog,
rendered just above the Cancel / Start Export buttons and scrollable
with auto-follow-tail. It shows stdout and stderr from both the worker
process itself and any child process it spawns (GGUF converter,
llama-quantize), coloured by stream.

Backend

- core/export/worker.py: new _setup_log_capture(resp_queue) installed
  before LogConfig.setup_logging. It saves the original stdout/stderr
  fds, creates pipes, os.dup2's the write ends onto fds 1 and 2 (so
  every child process inherits the redirected fds), and spins up two
  daemon reader threads. Each thread reads bytes from a pipe, echoes
  them back to the original fd (so the server console keeps working),
  splits on \n and \r, and forwards each line to the resp queue as
  {"type":"log","stream":"stdout|stderr","line":...,"ts":...}.
  PYTHONUNBUFFERED=1 is set so nested Python converters flush
  immediately.

- core/export/orchestrator.py:
  - Thread-safe ring buffer (collections.deque, maxlen 4000) with a
    monotonically increasing seq counter. clear_logs(),
    get_logs_since(cursor), get_current_log_seq(), is_export_active().
  - _wait_response handles rtype == "log" by appending to the buffer
    and continuing the wait loop. Status messages are also surfaced as
    a "status" stream so users see high level progress alongside raw
    subprocess output.
  - load_checkpoint, _run_export, and cleanup_memory now wrap their
    bodies with the existing self._lock (previously unused), clear the
    log buffer at the start of each op, and flip _export_active in a
    try/finally so the SSE endpoint can detect idle.

- routes/export.py:
  - Wrapped every sync orchestrator call (load_checkpoint,
    cleanup_memory, export_merged_model, export_base_model,
    export_gguf, export_lora_adapter) in asyncio.to_thread so the
    FastAPI event loop stays free during long exports. Without this
    the new SSE endpoint could not be served concurrently with the
    blocking export POST.
  - New GET /api/export/logs/stream SSE endpoint. Honors
    Last-Event-ID and a since query param for reconnect, emits log /
    heartbeat / complete / error events, uses the id field to carry
    the log seq so clients can resume cleanly. On first connect
    without an explicit cursor it starts from the current seq so old
    lines from a previous run are not replayed.
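
A minimal sketch of the ring buffer with a seq cursor described for the
orchestrator. `LogBuffer` and its method names are hypothetical
stand-ins, not the orchestrator's actual helpers; only the deque with a
fixed maxlen, the lock, and the never-reset seq counter come from the
description above.

```python
import threading
from collections import deque


class LogBuffer:
    """Hypothetical stand-in for the orchestrator's log ring buffer."""

    def __init__(self, maxlen: int = 4000):
        self._lock = threading.Lock()
        self._lines = deque(maxlen=maxlen)  # oldest entries fall off the left
        self._seq = 0  # never reset, so cursors held by clients stay valid

    def append(self, stream: str, line: str) -> int:
        with self._lock:
            self._seq += 1
            self._lines.append({"seq": self._seq, "stream": stream, "line": line})
            return self._seq

    def clear(self) -> None:
        with self._lock:
            self._lines.clear()  # seq is intentionally left untouched

    def current_seq(self) -> int:
        with self._lock:
            return self._seq

    def since(self, cursor: int) -> list:
        """Entries with seq strictly greater than the client's cursor."""
        with self._lock:
            return [entry for entry in self._lines if entry["seq"] > cursor]
```

Because the cursor is a seq rather than a list index, a client that
reconnects after the buffer has wrapped or been cleared still resumes
from the right place.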

Frontend

- features/export/api/export-api.ts: streamExportLogs() helper that
  authFetches the SSE endpoint and parses id / event / data fields
  manually (same pattern as streamTrainingProgress in train-api.ts).

- features/export/components/export-dialog.tsx:
  - Local useExportLogs(exporting) hook that opens the SSE stream
    when exporting transitions to true, accumulates up to 4000 lines
    in component state, and aborts on cleanup.
  - New scrollable output panel rendered above DialogFooter, only
    shown for Merged to 16bit and GGUF / Llama.cpp (LoRA adapter is
    a fast disk write with nothing to show). Dark terminal styling
    (bg-black/85, emerald text, rose for stderr, sky for status),
    max-height 14rem, auto-scrolls to the bottom on new output but
    stops following if the user scrolls up. A small streaming / idle
    indicator is shown next to the panel title.
  - DialogContent widens from sm:max-w-lg to sm:max-w-2xl when the
    output panel is visible so the logs have room to breathe.

Verified

- Python smoke test (tests/smoke_export_log_capture.py): spawns a
  real mp.get_context("spawn") process, installs _setup_log_capture,
  confirms that parent stdout prints, parent stderr prints, AND a
  child subprocess invoked via subprocess.run (both its stdout and
  stderr) are all captured in the resp queue. Passes.
- Orchestrator log helpers tested in isolation: _append_log,
  get_logs_since (with and without a cursor), clear_logs not
  resetting seq so reconnecting clients still progress. Passes.
- routes.export imports cleanly in the studio venv and /logs/stream
  shows up in router.routes.
- bun run build: tsc -b plus vite build, no TypeScript errors.

No existing export behavior is changed. If the subprocess, the SSE
endpoint, or the frontend hook fails, the export itself still runs to
completion the same way it did before, with or without logs visible.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* export dialog: trim bootstrap noise, scope logs per screen, show realpath

Several follow-ups to the live export log work:

1. Worker bootstrap noise (transformers venv activation, Unsloth banner,
   "Top GGUF/hub models" lists, vision detection, 2k-step weight load
   bar) is dropped from the export-dialog stream. A threading.Event
   gate in worker.py defaults closed and only opens once _handle_export
   actually starts; until then the reader thread still echoes lines to
   the saved console fd for debugging but does not push them onto the
   resp_queue. The orchestrator already spawns a fresh subprocess for
   every checkpoint load, so the gate is naturally reset between runs.

2. tqdm in non-tty mode defaults to a 10s mininterval, which makes
   multi-step bars look frozen in the panel. Set TQDM_MININTERVAL=0.5
   in the worker env so any tqdm-driven progress emits more often.

3. The dialog's useExportLogs hook now also clears its line buffer
   when exportMethod or open changes, so re-opening the dialog into a
   different action's screen no longer shows the previous action's
   saved output. A useElapsedSeconds tick + "Working Xs" badge in the
   log header gives users a visible sign that long single-step phases
   (cache copies, GGUF conversion) are still running when no new lines
   are arriving.

4. ExportBackend.export_{merged,base,gguf,lora} now return
   (success, message, output_path); the worker forwards output_path on
   each export_*_done response, the orchestrator's _run_export passes
   it to routes/export.py, which surfaces it via
   ExportOperationResponse.details.output_path. The dialog's Export
   Complete screen renders the resolved on-disk realpath under "Saved
   to" so users can find their exported model directly.

* fix(cli): unpack 3-tuple return from export backend

ExportOrchestrator.export_{merged,base,gguf,lora} now return
(success, message, output_path) so the studio dialog can show
the on-disk realpath. The CLI still unpacked 2 values, so every
`unsloth export --format ...` crashed with ValueError before
reporting completion. Update the four call sites and surface
output_path via a "Saved to:" echo.

* fix(studio): anchor export log SSE cursor at run start

The export dialog SSE defaulted its cursor to get_current_log_seq()
at connect time, so any line emitted between the POST that kicks
off the export and the client opening the stream was buffered with
seqs 1..k and then skipped (seq <= cursor). Long-running exports
looked silent during their first seconds.

Snapshot _log_seq into _run_start_seq inside clear_logs() and
expose it via get_run_start_seq(). The SSE default cursor now uses
that snapshot, so every line emitted since the current run began
is reachable regardless of when the client connects. Old runs
still can't leak in because their seqs are <= the snapshot.

* fix(studio): reconnect export log SSE on stream drop

useExportLogs launched streamExportLogs once per exporting
transition and recorded any drop in .catch(). Long GGUF exports
behind a proxy with an idle kill-timeout would silently lose the
stream for the rest of the run even though the backend already
supports Last-Event-ID resume. The "retry: 3000" directive emitted
by the backend is only meaningful to native EventSource; this
hook uses a manual fetch + ReadableStream parse so it had no
effect.

Wrap streamExportLogs in a retry loop that tracks lastSeq from
ExportLogEvent.id and passes it as since on reconnect. Backoff is
exponential with jitter, capped at 5s, reset on successful open.
The loop stops on an explicit backend `complete` event or on effect
cleanup.
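
A small sketch of the backoff schedule, in Python for illustration (the
hook itself is TypeScript). Only "exponential with jitter, capped at
5s" comes from the description; the 0.5s base, the jitter formula, and
the name `reconnect_delay` are assumptions.

```python
import random


def reconnect_delay(attempt: int, base: float = 0.5, cap: float = 5.0,
                    rng=random.random) -> float:
    """Seconds to wait before reconnect attempt `attempt` (0-based)."""
    # Exponential growth, clamped to the cap.
    ceiling = min(cap, base * (2 ** attempt))
    # Jitter (assumed form): pick uniformly between half the ceiling and
    # the ceiling, so simultaneous clients do not retry in lockstep.
    return ceiling * (0.5 + 0.5 * rng())
```

The caller would reset `attempt` to 0 after a successful open and pass
the last seen seq as `since` on the next connect.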

* fix(studio): register a second command so Typer keeps `export` as a subcommand

The CLI export unpacking tests wrap `unsloth_cli.commands.export.export`
in a fresh Typer app with a single registered command. Typer flattens a
single-command app into that command, so the test's
`runner.invoke(cli_app, ["export", ckpt, out, ...])` treats the leading
`"export"` token as an unexpected extra positional argument -- every
parametrized case failed with:

    Got unexpected extra argument (.../out)

Register a harmless `noop` second command so Typer preserves subcommand
routing and the tests actually exercise the 3-tuple unpack path they
were written to guard.

Before: 4 failed
After:  4 passed

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: studio-install <studio@local.install>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-14 08:55:43 -07:00
Daniel Han
eca592effe
studio: show HF model download progress in training start overlay (#4894)
* studio: show HF model download progress in training start overlay

During the training setup phase, the overlay only displayed a static
"Loading model..." line while model weights were being downloaded from
Hugging Face. On slow connections this looked like the app had frozen.

This adds a small self-contained progress block inside the existing
TrainingStartOverlay that polls the existing
GET /api/models/download-progress endpoint and renders a Progress bar
with bytes downloaded, total bytes, and percent complete.

Notes:

- Frontend-only change. No backend, worker, SSE, or runtime store edits.
- Reuses the existing getDownloadProgress client wrapper and the
  existing /api/models/download-progress endpoint that already scans
  the HF blob cache for completed and .incomplete files.
- selectedModel is read directly from useTrainingConfigStore inside the
  overlay, so no prop drilling and live-training-view.tsx is unchanged.
- Polling runs at 1500 ms and is gated on the HF repo regex
  (^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$), the same regex the backend uses,
  so local paths and empty form state never hit the endpoint.
- Polling stops once progress reaches 1.0 so the bar can stay at 100
  until the overlay hides on the first training step.
- Network errors are silently swallowed, matching the chat side flow
  (the bar simply freezes at the last value).
- When downloadedBytes is 0 the block is hidden entirely, so cached
  models do not flash a progress bar.
- When the HF API cannot determine the total size, the block falls
  back to "X downloaded" with no percent and no bar.
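
The polling gate from the notes above, sketched in Python for
illustration (the overlay itself is TypeScript). The regex is the one
quoted above; `should_poll` is a hypothetical name.

```python
import re

# Same pattern as quoted in the notes: exactly one "owner/name" pair.
HF_REPO_ID = re.compile(r"^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$")


def should_poll(selected_model: str, progress: float) -> bool:
    """Poll only for HF-style repo ids whose download is not finished."""
    return bool(HF_REPO_ID.match(selected_model)) and progress < 1.0
```

Absolute local paths and empty form state fail the pattern, so they
never reach the endpoint.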

Verified with bun run build (tsc -b plus vite build, no TypeScript
errors).

* training overlay: track dataset download + show on-disk realpath

Adds a dedicated "Downloading dataset..." section to the training-start
overlay alongside the existing model-weights one, so an HF dataset that
is downloading mid-startup is no longer mislabeled as model weights or
hidden entirely. The new GET /api/datasets/download-progress endpoint
mirrors /api/models/download-progress against the datasets-- prefix in
HF_HUB_CACHE.

Both endpoints now also return cache_path, the resolved on-disk
realpath of the snapshot directory (or the cache repo root if no
snapshot is materialized yet). The overlay surfaces this under each
download row so users can immediately see where the model and dataset
landed without digging through server logs.

The frontend's existing useModelDownloadProgress hook is generalized
to a single useHfDownloadProgress(repoId, fetcher) hook that the
model and dataset variants both delegate to, keeping polling, gating,
and completion semantics in one place.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: Polish training start overlay download progress UI (#4957)

* studio: polish training start overlay download progress visuals

* Fix formatCachePath cross-platform support and redundant sizeLabel

- Extend formatCachePath regex to also shorten macOS /Users/<user> paths to ~
- Suppress sizeLabel when no byte info is available (cachePath-only state),
  since the "Preparing" badge already conveys the status

* Fix misleading status badge when download total is unknown

- Hide badge when totalBytes is 0 but downloadedBytes > 0, since we cannot
  determine if the download is still in progress or already complete (happens
  when HF size metadata lookup fails for gated/private repos)
- Keep "Preparing" badge for the zero-bytes cachePath-only state
- Add Windows native path shortening to formatCachePath (C:\Users\<name>)

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>

---------

Co-authored-by: studio-install <studio@local.install>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
2026-04-14 08:54:01 -07:00
Daniel Han
44082cf88e
Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM (#5014)
* Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM

The chat settings sheet's ctx slider reads `max_context_length` from
`/api/inference/status` and renders

    Exceeds estimated VRAM capacity (N tokens). The model may use
    system RAM.

when the user drags the slider above that value. For models whose
weights fit on some GPU subset, `_max_context_length` was already set
to the binary-search cap and the warning fired correctly.

For models whose weights exceed 90% of every GPU subset's free memory
(e.g. MiniMax-M2.7-GGUF at 131 GB on a 97 GB GPU), the ceiling-probe
loop never matched a subset, so `max_available_ctx` stayed at the
native context (e.g. 196608). The slider ran all the way to native
with no indication that any value above the 4096 spec default would
trigger `--fit on` and degrade performance.

Anchor `max_available_ctx` at `min(4096, native_context_length)` when
no subset fits, so the warning fires at the right threshold and the
user sees the correct safe-zone / warning-zone split:

    Before (MiniMax-M2.7 on 97 GB GPU):
      slider 0 .. 196608, warning threshold = 196608  (never fires)

    After:
      slider 0 .. 196608, warning threshold = 4096    (fires correctly)

No frontend changes required: `chat-settings-sheet.tsx` already
consumes `ggufMaxContextLength` (= status.max_context_length) as the
warning threshold and `ggufNativeContextLength` as the slider max.
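
The anchoring rule, as a minimal sketch. `warning_threshold` and
`fitted_ctx` are hypothetical names; the real value is computed inside
the backend's fit logic, which is not reproduced here.

```python
from typing import Optional


def warning_threshold(native_ctx: int, fitted_ctx: Optional[int],
                      fallback: int = 4096) -> int:
    """Context length above which the slider shows the VRAM warning.

    fitted_ctx is the cap found by the fit search, or None when the
    weights do not fit on any GPU subset.
    """
    if fitted_ctx is not None:
        return fitted_ctx
    # Never report more than the model natively supports.
    return min(fallback, native_ctx)
```

With a native context of 196608 and no fitting subset this gives 4096,
matching the "After" case above.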

Adds tests/test_llama_cpp_max_context_threshold.py covering
weights-exceed-VRAM (single / multi-GPU), a native-ctx below the 4096
fallback case (don't lie about supported ctx), fittable-model
regressions (small / multi-GPU / tiny on huge GPU), and the
`max_context_length` property's fallback semantics.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 08:53:49 -07:00
Daniel Han
b2f80f210e
Studio: make GGUF disk-space preflight cache-aware (#5012)
* Studio: make GGUF disk-space preflight cache-aware

The pre-download disk check in LlamaCppBackend.load_model compared the
repo's total GGUF size against free disk without crediting bytes
already present in the Hugging Face cache. Re-loading a large cached
model (e.g. MiniMax-M2.7-GGUF at 131 GB) then failed cold with
"Not enough disk space to download any variant" whenever free disk
was below the full weight footprint, even though nothing actually
needed to be downloaded.

Subtract bytes already on disk via try_to_load_from_cache before
comparing against free space. A partial blob (interrupted download) is
not credited, so a second attempt still allocates room to finish the
download. The log line now also surfaces how much is already cached.
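
A minimal sketch of the cache-aware comparison, kept free of Hugging
Face calls so it stays self-contained. The function names and the
`cached_complete` set are hypothetical; in the real check, whether a
file counts as cached is decided via try_to_load_from_cache.

```python
from typing import Dict, Set


def bytes_to_download(file_sizes: Dict[str, int], cached_complete: Set[str]) -> int:
    """Bytes still needed; only fully cached files are credited."""
    return sum(size for name, size in file_sizes.items()
               if name not in cached_complete)


def has_enough_disk(file_sizes: Dict[str, int], cached_complete: Set[str],
                    free_bytes: int) -> bool:
    return bytes_to_download(file_sizes, cached_complete) <= free_bytes
```

A partially downloaded file is simply left out of `cached_complete`, so
its full size is still reserved.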

Adds tests/test_llama_cpp_cache_aware_disk_check.py covering the
fully-cached, partial-cache-insufficient-disk, partial-cache-enough-disk,
cold-cache, incomplete-blob, and zero-size-path-info cases. Sparse
tempfiles keep the GB-scale scenarios cheap to simulate.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 08:53:37 -07:00
Daniel Han
767fa8cade
Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM (#5011)
* Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM

The load-time auto-fit in LlamaCppBackend.load_model had two issues for
models whose weights do not fit on any GPU subset (the common case for
large MoE GGUFs such as MiniMax-M2.7, Qwen3.5-397B-A17B, etc.):

1. Auto mode (max_seq_length=0) left effective_ctx at the model's native
   context when no subset passed the 90% fit check. The UI slider then
   landed on e.g. 196608 for MiniMax-M2.7, far above anything usable.
   Default the auto-pick to 4096 so the UI starts at a sane value; the
   slider ceiling stays at the native context so the user can still
   opt in to longer contexts and receive the "might be slower" warning.

2. Explicit ctx was silently shrunk when weights fit but the requested
   KV overflowed the 90% budget. The shrink loop emitted -c <capped>
   -ngl -1 without informing the caller, so a user who had opted into
   a longer context via the UI never actually got it. Drop the shrink
   loop on the explicit path and emit -c <user_ctx> --fit on instead,
   letting llama-server flex -ngl (CPU layer offload).
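
A rough sketch of the two paths above, heavily simplified. The real
per-GPU-subset fit search is collapsed into one hypothetical
`fitted_ctx` value, `context_args` is an illustrative name rather than
a function in LlamaCppBackend, and clamping the 4096 default to the
native context is an assumption. Only `-c` and `--fit on` are taken
from the description; any other flags the backend emits are omitted.

```python
from typing import List, Optional


def context_args(max_seq_length: int, native_ctx: int,
                 fitted_ctx: Optional[int]) -> List[str]:
    """Context-related llama-server args for auto and explicit modes.

    fitted_ctx is the largest context the fit check accepted, or None
    when the weights do not fit on any GPU subset.
    """
    if max_seq_length == 0:
        # Auto mode: use the fitted value, else a small sane default.
        ctx = fitted_ctx if fitted_ctx is not None else min(4096, native_ctx)
        return ["-c", str(ctx)]
    # Explicit mode: always honor the caller's context length.
    args = ["-c", str(max_seq_length)]
    if fitted_ctx is None or max_seq_length > fitted_ctx:
        # Over budget: let llama-server decide the layer offload.
        args += ["--fit", "on"]
    return args
```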

Adds tests/test_llama_cpp_context_fit.py covering both paths, the
file-size-only fallback when KV metadata is missing, non-regression on
fittable auto-pick, and platform-agnostic input shape.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 08:53:25 -07:00
TF-MTGE
a31c82a640
fix(studio): remove 300s cap on load_checkpoint (inherits 3600s default) (#4922)
* fix: increase wait response timeout to 900 sec instead of 300 sec. #4845

* Apply suggestion from @gemini-code-assist[bot]

good catch

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
2026-04-14 08:53:14 -07:00
Datta Nimmaturi
da78c6be71
[Studio] Install flash attn at setup time for linux (#4979)
* [Studio] Install flash attn at setup time for linux

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* cleanup changes

Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Test cases

* wheel_utils: narrow url_exists exceptions and log at debug level

---------

Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-14 16:40:17 +04:00
Datta Nimmaturi
dccc0ebada
[Studio] Show non exported models in chat UI (#4892)
* Show non exported models in chat UI

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Distinguish between LoRA and full fine-tune saves. Cleanup

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-14 15:03:58 +04:00
Bharath Kumar Adinarayan
a50f61009b
fix(studio): default chart view to full training history (#5007)
* fix(studio): default chart view to full training history instead of last 80 steps

Fixes #5003

* chore: windowsize as null code comment

---------

Co-authored-by: imagineer99 <samleejackson0@gmail.com>
Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>
2026-04-14 03:29:27 -07:00
Lee Jackson
bfa17330bd
Studio: Polish API key copy button and harden async clipboard fallback (#5006)
* fix: polish clipboard style and fix async clipboard path

* Use copyToClipboardAsync in CopyButton for Safari fallback

CopyButton was calling navigator.clipboard.writeText directly,
bypassing the execCommand fallback added in this same PR. Switch
to copyToClipboardAsync which tries execCommand first (Safari
user-gesture requirement) then falls back to the async clipboard API.

* Fix copyToClipboard sync contract regression and improve async path

- Restore copyToClipboard() to return only the execCommand result,
  preserving the boolean contract that 7 existing callers depend on
  to gate their "Copied!" UI state. The fire-and-forget async fallback
  was returning true before the promise resolved, causing false success.

- Add document.body null guard to copyWithExecCommand for SSR safety.

- Reorder copyToClipboardAsync to try the async Clipboard API first,
  avoiding unnecessary DOM/focus overhead in Radix focus-trapped dialogs
  where execCommand always fails anyway.

* Restore queryCommandSupported guard and fix async catch path

- Restore the queryCommandSupported("copy") guard in copyToClipboard()
  to match the original contract exactly: when execCommand is entirely
  unsupported, fall through to fire-and-forget async clipboard write.

- Fix copyToClipboardAsync catch block: after navigator.clipboard.writeText
  rejects, the user-gesture frame is gone, so execCommand will also fail.
  Return false from catch instead of falling through. The execCommand
  fallback at the bottom only runs when the Clipboard API is absent
  (still in user-gesture frame).

* Restore execCommand fallback in copyToClipboardAsync catch path

The catch block was returning false after clipboard API rejection,
based on the incorrect premise that the user-gesture frame is lost
after an await. Per the HTML spec, transient user activation IS
preserved through promise microtask chains. The real reason
execCommand fails in the Radix dialog is the focus trap intercepting
textarea.focus(), not gesture loss.

For non-dialog callers, execCommand can still succeed after a
clipboard rejection. Inside a Radix modal, execCommand returns
false harmlessly (focus trap blocks it).

* Harden textarea fallback for mobile and continue to async path on failure

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
2026-04-14 14:22:14 +04:00
Wasim Yousef Said
97eafd999e
studio: fix api-keys access + refresh (#5005)
* studio: fix api-keys access + refresh

* studio: guard v1 in spa fallback
2026-04-13 23:48:51 +04:00