mirror of
https://github.com/unslothai/unsloth
synced 2026-04-21 13:37:39 +00:00
1155 commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
21e9a91a57
|
Studio: forward standard OpenAI tools / tool_choice on /v1/responses (Codex compat) (#5122)
* Studio: forward standard OpenAI tools / tool_choice on /v1/responses Mirrors the /v1/chat/completions client-side tool pass-through from #5099 so clients (OpenAI Codex CLI, OpenAI Python SDK, ...) that target the Responses API receive structured function_call output items instead of plain text with tool-call tokens leaking into content. - ResponsesRequest: type tools/tool_choice properly, add parallel_tool_calls; accept function_call and function_call_output input items for multi-turn - Translate flat Responses tool / tool_choice shape to the nested Chat Completions shape before forwarding to llama-server - _normalise_responses_input: map function_call_output -> role="tool", function_call -> assistant tool_calls (preserving call_id) - Non-streaming: map returned tool_calls -> top-level function_call output items keyed by call_id - Streaming: emit response.output_item.added (function_call), response.function_call_arguments.delta/.done, and response.output_item.done per tool call while keeping the text message at output_index 0 - Pytest coverage: tools/tool_choice translation, multi-turn input mapping, non-streaming tool_calls mapping, response round-trip * Studio: merge system messages and close inner stream on /v1/responses Fixes two issues surfacing when OpenAI Codex CLI drives /v1/responses against a GGUF with a strict chat template (gpt-oss harmony, Qwen3, ...). 1. "System message must be at the beginning" upstream errors Codex sends `instructions` AND a `role:"developer"` message in `input`, producing two separate system-role messages. Strict templates raise when a second system message exists or when one appears after a user turn. _normalise_responses_input now hoists all instructions / system / developer content into a single merged system message at the top of the Chat Completions message list. 2. "async generator ignored GeneratorExit" / "Attempted to exit cancel scope in a different task" _responses_stream consumed the inner chat-completions body_iterator without an explicit aclose() in a finally block. On client disconnect (Codex frequently cancels mid-stream), Python 3.13 finalized the inner async generator on a different task, tripping anyio's cancel-scope check. Mirrored the same try/finally + aclose pattern used by the /v1/messages, /v1/chat/completions, and /v1/completions passthroughs. Tests: hoisting of instructions + developer, developer mid-conversation, multiple system messages in input, no-system passthrough. * Studio: accept Codex multi-turn shapes and fix cross-task stream close on /v1/responses Two issues observed driving /v1/responses from OpenAI Codex CLI against a GGUF backend. 1. 422 on every turn after the first Codex replays prior assistant turns with `content:[{"type":"output_text","text":...,"annotations":[],"logprobs":[]}]` and carries forward `reasoning` items (o-series / gpt-5) between turns. Our `ResponsesContentPart` union only accepted input_text / input_image, and `ResponsesInputItem` only message / function_call / function_call_output, so Pydantic failed the whole list and FastAPI returned `"Input should be a valid string"` against the `str` branch of the outer union. - Add `ResponsesOutputTextPart` for assistant-replay content. - Add `ResponsesUnknownContentPart` and `ResponsesUnknownInputItem` as permissive catch-alls (drop during normalisation). - Wire an explicit `Discriminator` so dispatch is deterministic and the fallthrough reaches the catch-all instead of misreporting via the outer `Union[str, list[...]]`. - `_normalise_responses_input` now accepts output_text parts, flattens single-part assistant text to a plain string (keeps legacy chat templates happy), and silently drops reasoning / unknown items. 2. "async generator ignored GeneratorExit" / cross-task cancel scope `_responses_stream` awaited `openai_chat_completions` in the parent route-handler task, which opens the httpx client for the inner passthrough on *that* task. The outer `StreamingResponse` then iterates in a child task, so the asyncgen GC finalises the inner httpcore byte stream on the child task, tripping anyio's "Attempted to exit cancel scope in a different task". Move the `await` inside `event_generator` so the httpx lifecycle stays within the single streaming child task, and surface any HTTPException as a `response.failed` SSE frame. Tests: assistant output_text replay, reasoning-item tolerance, unknown content-part tolerance, end-to-end Codex-shape payload (developer + user + reasoning + function_call + function_call_output + assistant output_text + user), and single-part assistant flattening to plain string. * Studio: call llama-server directly from streaming /v1/responses The previous fix (running the inner await inside event_generator) was not enough. Wrapping the existing `openai_chat_completions` pass-through still stacks two async generators: when the outer generator is closed, the innermost `HTTP11ConnectionByteStream.__aiter__` in httpcore doesn't receive GeneratorExit before Python's asyncgen GC finalises it in a sibling task, tripping "Attempted to exit cancel scope in a different task" and "async generator ignored GeneratorExit" — the same Python 3.13 + httpcore 1.0.x interaction already seen in PRs #4956, #4981, #5099. Cure both pass-throughs had: a single same-task httpx lifecycle with explicit `aiter_lines().aclose()` BEFORE `resp.aclose()` / `client.aclose()` in the generator's finally block. Apply it at the Responses layer by dropping the wrapper entirely for GGUF: open httpx, consume `resp.aiter_lines()`, parse `chat.completion.chunk`, emit Responses SSE events, close everything in finally — all in the single StreamingResponse child task. Non-GGUF streaming is rejected with a 400 (wrapping the transformers backend would re-introduce the double-layer pattern and isn't a Codex-compatible path today anyway). Also surfaces upstream httpx.RequestError / non-200 as a `response.failed` SSE frame rather than a dropped stream now that the request is dispatched after SSE headers have gone out. * Studio: silence benign httpcore asyncgen GC warnings on Python 3.13 The streaming pass-throughs (/v1/chat/completions, /v1/messages, /v1/responses, /v1/completions) all use the proven #4981 / #5099 pattern — single-task httpx lifecycle with explicit aiter_lines().aclose() ahead of resp.aclose() / client.aclose() in the generator's finally block. That handles our own iterators correctly. The residual noise ("async generator ignored GeneratorExit" / "Attempted to exit cancel scope in a different task") comes from an innermost HTTP11ConnectionByteStream.__aiter__ that httpcore creates internally inside its pool. We hold no reference to it, so we cannot aclose it ourselves. Python 3.13's asyncgen GC hook finalises it on the finaliser task, its aclose path enters an anyio CancelScope shield, and Python flags the cross-task exit. The response has already been delivered with a 200 by then — it is purely log noise, not a functional failure. Same interaction seen in modelcontextprotocol/python-sdk #831, agno #3556, chainlit #2361, langchain-mcp-adapters #254. Install a targeted sys.unraisablehook that swallows this specific tuple — RuntimeError mentioning "cancel scope" or "GeneratorExit" plus an object repr referencing HTTP11ConnectionByteStream — and defers to the default hook for every other unraisable. Idempotent; guarded by a sentinel attribute so repeated imports don't stack filters. |
||
|
|
c20959dbf4
|
Studio: Improve chat composition, fix scroll behaviour, and refine sidebar UX (#5089)
* Chatbox, scroll, and menu fixes
- Fixed chatbox auto-expand height for multi-line text on the compare page
- Fixed chatbox UI to be consistent across compare and new chat
- Fixed scrolling being enabled on pages with no content, which also triggered the scroll-to-bottom button
- Fixed scroll-to-bottom button to only appear after scrolling up a reasonable amount instead of instantly
- Added shutdown studio button to the menu for easier access
- Fixed pop-up menu width to match the user button width
(cherry picked from commit cd4e390dfa84fe311fae79a781b96cc0ef5970a9)
* fix: correct compare scroll viewport and clean up chat composer UI polish
* Dark theme refactor and sidebar/chat UI refinements
- Complete refactoring of dark theme
- Replaced square rounded-corner user profile image with a circular bordered one
- Replaced user profile icon with 'U' initial and renamed label from 'Studio' to 'User'
- Chat bubbles now have a pointy top-right edge
- Sidebar menu tab line color selection is now consistent across all menus
- Tab-selection color animation now also applies to recent chats
- Removed 'Compare' menu autoselect when a compare chat conversation is selected
- Fixed UI consistency in Compare to match New Chat
- Removed sidebar animation and tab line, replaced with rounded selection for consistency
- Further adjustments to sidebar UI
- Further adjustments to compare chat UI
* Fixed sidebar collapse/expand for recent chats and recent runs not being clickable
* Chatbox, scroll, and menu fixes
- Fixed chatbox auto-expand height for multi-line text on the compare page
- Fixed chatbox UI to be consistent across compare and new chat
- Fixed scrolling being enabled on pages with no content, which also triggered the scroll-to-bottom button
- Fixed scroll-to-bottom button to only appear after scrolling up a reasonable amount instead of instantly
- Added shutdown studio button to the menu for easier access
- Fixed pop-up menu width to match the user button width
* Sidebar, fonts, and chat UI refinements
- Replaced logo PNG with real font text for 'unsloth' and 'BETA' label
- Added Hellix font and applied it across menus and UI elements
- Lighter scrollbar in the sidebar compared to other areas of the app
- Adjusted chat font and chat bubble styling
- Adjusted app menu design to stay consistent with the sidebar
- Adjusted text style for 'New Chat' and repositioned content/chatbox
- Adjusted model selector and top area UI
- Fixed footer text from 'LLM's' to 'LLMs'
- Fixed active selection border color incorrectly appearing on page refresh and during general navigation
- Logo now defaults to 'New Chat' when clicked
* Sidebar, model selector, and mobile UI fixes
- Further adjustments to sidebar UI and logo
- Changed right bar icon
- Model selector adjustments
- Collapsed sidebar now matches the content area background
- Adjusted Hellix font spacing across pages
- Fixed sidebar icon overlap on mobile screens
* Adjust sidebar icons
* Adjust sidebar icons
* Fixed compare chat UI and scrolling issues
* Fixed inference settings icon behavior and context info positioning
- Fixed top right inference settings icon to move into sidepanel during expand/collapse, matching left sidebar behavior
- Adjusted context information element positioning
* Fix: textarea overflow in system prompt editor
* Code block redesign, font, and chat bubble adjustments
- Redesigned code block colors and theme
- Changed code block font to Fira Code
- Fixed scrollbar disappearing when expanding/collapsing tool calls in chats
- Adjusted chat bubble background color
* Fix chat bubble background color in dark theme
* fix: restore textarea auto-sizing and scope prompt editor sizing
* fix: add explicit textarea field sizing for prompt editor overflow
* fix: generate chat nonce on click instead of render
* fix: respect training lock on logo navigation
* Refactor compare page dual chat scrolling behavior
* Revert "Refactor compare page dual chat scrolling behavior"
This reverts commit
|
||
|
|
0a5c61ffcc
|
fix: prefer mainstream clipboard copy over deprecated one (#5109)
Fixes #5097 Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> |
||
|
|
d3215ce113
|
Studio: Show LoRA live logs and update GGUF quant options (#5058)
* export: update GGUF quant list and ordering * gguf: add Q2_K_L quantize flags for output and embeddings * export: add live console logs for LoRA export flow * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: stream q2_k_l quantize logs and include subprocess error details * fix: route Q2_K_L preset to q2_k ftype with q8_0 output+embeddings --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> |
||
|
|
9c8a079d97
|
Studio: Local profile customization in settings and sync sidebar identity (#5088)
* studio: add local profile customization in settings * studio: add local profile settings and sync sidebar identity * fix: adjust profile card margin * fix: move helper modules to utils and use single-letter avatar fallback * fix: keep profile icon visible on sidebar collapse * fix: sidebar account trigger labeling and profile reset prefs |
||
|
|
9954781d30
|
fix(studio/chat): cancel in-flight run when trashing a thread from sidebar (#5067)
Trashing a thread mid-stream used to delete the Dexie rows while the model kept generating, because the sidebar has no access to the @assistant-ui aui context. Expose per-thread cancelRun() through the chat runtime store and call it from deleteChatItem so trash behaves like Stop → Trash. Covers compare pairs by cancelling each paired thread. Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com> |
||
|
|
ac2daf8b7a
|
Studio: forward standard OpenAI tools / tool_choice to llama-server (#5099)
* fix(studio): forward OpenAI tools/tool_choice to llama-server (#4999)
Studio's /v1/chat/completions silently stripped standard OpenAI `tools`
and `tool_choice` fields, so clients using standard function calling
(opencode, Claude Code, Cursor, Continue, ...) never got structured
tool_calls back. Adds a client-side pass-through path mirroring the
existing Anthropic /v1/messages flow: when `tools` is present without
Studio's `enable_tools` shorthand, the request is forwarded to
llama-server verbatim so the client sees native id, finish_reason
("tool_calls"), delta.tool_calls, and accurate usage tokens.
Also wires Anthropic tool_choice forwarding: /v1/messages previously
accepted tool_choice on the request model but silently dropped it with
a warning. Translate the four Anthropic shapes to OpenAI format and
forward them so agentic clients can actually enforce tool use.
- ChatCompletionRequest: add tools, tool_choice, stop; extra="allow"
- ChatMessage: accept role="tool", optional tool_call_id / tool_calls /
name; content is now optional (assistant with only tool_calls)
- routes/inference.py: _openai_passthrough_stream /
_openai_passthrough_non_streaming helpers, routing branch in
openai_chat_completions, vision+tools via content-parts injection
- _build_passthrough_payload: tool_choice parameter (default "auto")
- anthropic_compat: anthropic_tool_choice_to_openai() translator
- tests/test_openai_tool_passthrough.py: Pydantic + translator unit tests
- tests/test_studio_api.py: 5 new E2E tests (non-stream, stream,
multi-turn, OpenAI SDK, Anthropic tool_choice=any regression)
* fix(studio): surface httpx transport errors from OpenAI passthrough
When the managed llama-server subprocess crashes mid-request, the
async pass-through helpers in routes/inference.py used to return a
bare 500 (non-streaming) or an "An internal error occurred" SSE chunk
(streaming) because _friendly_error only recognized the sync path's
"Lost connection to llama-server" substring -- httpx transport
failures (ConnectError / ReadError / RemoteProtocolError /
ReadTimeout) stringify differently and fell through to the generic
case.
- _friendly_error: map any httpx.RequestError subclass to the same
"Lost connection to the model server" message the sync chat path
emits. Placed before the substring heuristics so the streaming path
automatically picks it up via its existing except Exception catch.
- _openai_passthrough_non_streaming: wrap the httpx.AsyncClient.post
in a try/except httpx.RequestError and re-raise as HTTPException
502 with the friendly detail.
- tests/test_openai_tool_passthrough.py: new TestFriendlyErrorHttpx
class pinning the mapping for ConnectError, ReadError,
RemoteProtocolError, ReadTimeout, and confirming non-httpx paths
(context-size heuristic, generic fallback) are unchanged.
* fix(studio): close aiter_bytes/aiter_lines explicitly in passthroughs
The httpcore asyncgen cleanup fix in
|
||
|
|
0b57884120
|
Add Qwen3.6 inference defaults for Studio (#5065)
* Add Qwen3.6 inference defaults for Studio Add qwen3.6 family entry to inference_defaults.json with the recommended sampling parameters from Qwen's documentation: temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0. Without this, Qwen3.6 models fall through to the generic qwen3 pattern which uses different defaults (temperature=0.6, top_p=0.95, no presence_penalty). * Add Qwen3.6-35B-A3B-GGUF to default model lists * Add Qwen3.5/3.6 presence_penalty to thinking toggle and small-model disable logic - Thinking toggle (on-load + button click) now sets presencePenalty: 1.5 for Qwen3.5 and Qwen3.6 models (both thinking-ON and thinking-OFF states) - Small-model thinking-disable check (<9B defaults to no-thinking) extended from Qwen3.5-only to also cover Qwen3.6, in all 3 locations: frontend on-load, frontend refresh, backend llama_cpp.py |
||
|
|
ee86530e55
|
chore: switch helper and no-cache fallback to Gemma (#5066) | ||
|
|
bc9ddb3af6
|
Fix onboarding followups (#5064)
* Fix onboarding followups * Rename sidebar studio to train |
||
|
|
7ef65bd2e5
|
Chat first onboarding (#5063)
* auth: default to chat * settings: relaunch onboarding * onboarding: return to launch page * studio: stop auto guided tour * ui: soften global radius * cleanup: rename onboarding exit prop * fix onboarding redirect safety * Show real Unsloth version in settings * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
||
|
|
f4422b0a62
|
change torchcodec version to 0.10.0 in extra-no-deps (#5043)
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> |
||
|
|
b01e9af124
|
feat(studio): replace navbar with collapsible sidebar (#4936)
* feat(studio): replace navbar navigation with collapsible sidebar Add an app-wide sidebar with hover-expand and pin-to-dock behavior. Navigation items (Studio, Recipes, Export, Chat) move from the center pill navbar to the sidebar. Chat threads and recipes render as collapsible sub-lists. Navbar simplified to logo + update + close. - Extend SidebarProvider with pinned/hovered state model - New AppSidebar with animated active indicator, sloth profile menu, theme toggle, guided tour, back/forward navigation - Chat page refactored to URL-driven view state via search params - Extract reusable hooks for chat thread and recipe sidebar data - Guard startViewTransition for browser compatibility - Wrap chat deletions in Dexie transaction for data integrity * feat(studio): move logo to sidebar and make navbar overlay - Sidebar is now full-height with logo in SidebarHeader - Collapsed sidebar shows sticker.png, expanded shows full logo - Navbar is absolute-positioned overlay (no layout space) - Main content extends to top, aligning with navbar controls * feat(studio): full-height sidebar with recents, edge-to-edge nav buttons - Sidebar outside max-w-7xl, pinned to left edge - Remove sidebar rounding, menu buttons rounded-md - Nav buttons flush to sidebar edges with no left rounding - Replace collapsible recipes/chat with flat nav items - Add Recents section with chat history (1 item when not on chat, full on chat) - New Chat as first nav item with PencilEdit02Icon - Cursor pointer on all sidebar buttons - Navbar temporarily hidden for screenshots * fix(studio): fix chat scroll, action bar hover, collapsible recents - Fix sticky composer by removing `relative` override on viewport footer - Action bar buttons only show on hover (autohide=always) - Remove floating border/shadow from action bar - Add scroll space above composer for last message actions - Back/forward buttons use router history (stay in-app) - Recents section collapsible with chevron on chat route - Set html/body/#root height for proper h-full chain * fix(studio): address review feedback, clean up unused code - Unhide navbar (was left hidden from screenshot) - Remove unused imports: SidebarMenuSub*, BubbleChatIcon, ColumnInsertIcon - Remove unused vars: recipeItems, activeRecipeId, canCompare, recipesOpen - Include compare query id in active sidebar selection - Use store type for contextUsage instead of inline type - Simplify noop in sidebar.tsx - Remove empty className prop * feat(studio): add mobile sidebar, recent runs section, and misc UX fixes * feat(studio): scaffold settings feature module with dialog store * feat(studio): add tri-state theme store for settings * feat(chat): add clear-all-chats and export-chat-history utils * feat(studio): add settings dialog shell with tab rail * feat(studio): add appearance tab with theme and sidebar pin * feat(studio): add settings general tab with hf token, auto-title, reset prefs * feat(studio): add settings chat tab with export and clear * feat(studio): add api keys tab with list and revoke flow * feat(studio): add create-key form and reveal dialog * feat(studio): add usage examples panel to api keys tab * feat(studio): add settings about tab with update and shutdown * feat(studio): add settings dropdown item and cmd-comma shortcut * feat(studio): remove legacy api-keys route and chat-sheet preference rows * fix(studio): settings dialog a11y + polish pass * feat(studio): inline api key reveal card replacing nested dialog * fix(studio): hide revoked keys from settings list * refactor(studio): strip navbar and hoist training unload guard * feat(studio): explicit sidebar toggle, remove hover-open and pin icons * fix(studio): use SidebarRight01Icon for collapsed sidebar open toggle * fix(studio): address code review findings for settings dialog * feat(studio): collapsible navigate group with standalone new-chat and compare * fix(studio): chat-only standalone actions, use ColumnInsertIcon for compare * fix(studio): sidebar new-chat/compare state reset and icon-mode collapsible * feat(studio): add compact logo assets for sidebar header * Fixed sidebar design * fix(studio): sidebar delete icon hover contrast and sizing * feat(studio): route-gate sidebar recents (chats off /studio, runs on /studio) * feat(studio): add chat search store * feat(studio): add chat search index hook with snapshot-on-open * feat(studio): add chat search command dialog with global shortcut * feat(studio): wire chat search into sidebar * fix(studio): trim hf token on save, add show/hide toggle, commit on close * revert(studio): restore original sidebar/border colors, brighten sidebar * feat(studio): forward overlayClassName through CommandDialog * fix(studio): wrap search dialog in Command context, redesign as flat 635px card * fix(studio): reserve right padding on recent items so delete icon stops overlapping title * fix(studio): skip hf token unmount-commit during reset-prefs reload * chore(studio): drop unused icon import and unreachable runs navigate branch * fix(studio): chat search index filters archived before limit, batches message query, picks up reasoning text * fix(studio): keep CommandEmpty in tree so empty state renders correctly * fix(studio): cap system prompt and chat template textareas so they scroll instead of growing * fix(studio): attach chat-compare tour anchor to sidebar compare button * fix(studio): persist system theme explicitly so next-themes does not clobber on reload * fix(studio): auto-switch to history tab when selecting a recent run from sidebar * UI overhaul: chatbox, scrollbar, sidebar, and compare view UI Changes: - Redesigned the Compare UI with general cleanup - Redesigned the Chatbox UI - Reduced the width of the user chat bubble for improved readability - Narrowed the user chat box across the content page - Adjusted thinking-box text color to be slightly darker - Removed faded text effect from chat messages - Removed faded text effect from the thinking box - Added a small LLM chat safety note at the bottom of the chatbox - Restyled the scrollbar Layout & Behavior: - Reworked the scrollbar to span the full height of the page (no top/bottom padding) and remain persistently visible when content is scrollable, rather than only on hover - Reworked the Configuration sidebar to span full height — removed rounded corners and borders, with the scrollbar adjusted to match the full top-to-bottom layout - Adjusted the top menu and bottom chatbox content areas to work correctly with the new full-page scroll behavior - Made chat content match the chatbox width, with content sliding slightly behind the chatbox when scrolling - Aligned chat text width with the chatbox for visual consistency, including how far the text extends behind the chatbox Fixes: - Fixed the chatbox not auto-expanding when typing multi-line input while bottom-positioned during an active chat (previously only worked before a chat had started) - Fixed positioning and design of the user chat hover menu buttons to match the assistant chat box — now displayed below the chat bubble instead of on the left side * Fix user message layout in thread component * swap code icon * fix compare layout * fix compare pane flex * Sidebar improvements and fixes - Added scrolling support to the sidebar so menus and recent chats no longer get hidden - Recent chats are now always visible in the sidebar, not hidden when in Studio, Recipes, or Export - Recent chat is now deselected when selecting other navigations - Fixed sidebar glitch where browser resize could make the sidebar and expand button disappear completely - Fixed glitch where the open-sidebar hover tooltip appeared above the logo when clicking expand sidebar - Reduced sidebar width on mobile to around 2/3 of the screen (was too wide) - Made the close-sidebar hover tooltip consistent with the rest of the design - Removed sidebar collapse/expand animation - Small adjustment to chat width * Fix route scrolling, polling, and theme sync issues * Fix Studio page scrolling --------- Co-authored-by: sneakr <hauzin@hotmail.com> |
||
|
|
05ec0f110b
|
Studio: Ollama support, recommended folders, Custom Folders UX polish (#5050)
* Studio: Ollama support, recommended folders, Custom Folders UX polish
Backend:
- Add _scan_ollama_dir that reads manifests/registry.ollama.ai/library/*
and creates .gguf symlinks under <ollama_dir>/.studio_links/ pointing
at the content-addressable blobs, so detect_gguf_model and llama-server
-m work unchanged for Ollama models
- Filter entries under .studio_links from the generic models/hf/lmstudio
scanners to avoid duplicate rows and leaked internal paths in the UI
- New GET /api/models/recommended-folders endpoint returning LM Studio
and Ollama model directories that currently exist on the machine
(OLLAMA_MODELS env var + standard paths, ~/.lmstudio/models, legacy
LM Studio cache), used by the Custom Folders quick-add chips
- detect_gguf_model now uses os.path.abspath instead of Path.resolve so
the readable symlink name is preserved as display_name (e.g.
qwen2.5-0.5b-Q4_K_M.gguf instead of sha256-abc...)
- llama-server failure with a path under .studio_links or .cache/ollama
surfaces a friendlier message ("Some Ollama models do not work with
llama.cpp. Try a different model, or use this model directly through
Ollama instead.") instead of the generic validation error
Frontend:
- ListLabel supports an optional leading icon and collapse toggle; used
for Downloaded (download icon), Custom Folders (folder icon), and
Recommended (star icon)
- Custom Folders header gets folder icon on the left, and +, search,
and chevron buttons on the right; chevron uses ml-auto so it aligns
with the Downloaded and Recommended chevrons
- New recommended folder chips render below the registered scan folders
when there are unregistered well-known paths; one click adds them as
a scan folder
- Custom folder rows that are direct .gguf files (Ollama symlinks) load
immediately via onSelect instead of opening the GGUF variant expander
(which is for repos containing multiple quants, not single files)
- When loading a direct .gguf file path, send max_seq_length = 0 so the
backend uses the model's native context instead of the 4096 chat
default (qwen2.5:0.5b now loads at 32768 instead of 4096)
- New listRecommendedFolders() helper on the chat API
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Address review: log silent exceptions and support read-only Ollama dirs
Replace silent except blocks in _scan_ollama_dir and the
recommended-folders endpoint with narrower exception types plus debug
or warning logs, so failures are diagnosable without hiding signal.
Add _ollama_links_dir helper that falls back to a per-ollama-dir hashed
namespace under Studio's own cache (~/.unsloth/studio/cache/ollama_links)
when the Ollama models directory is read-only. Common for system installs
at /usr/share/ollama/.ollama/models and /var/lib/ollama/.ollama/models
where the Studio process has read but not write access. Previously the
scanner returned an empty list in that case and Ollama models would
silently not appear.
The fallback preserves the .gguf suffix on symlink names so
detect_gguf_model keeps recognising them. The prior "raw sha256 blob
path" fallback would have missed the suffix check and failed to load.
* Address review: detect mmproj next to symlink target for vision GGUFs
Codex P1 on model_config.py:1012: when detect_gguf_model returns the
symlink path (to preserve readable display names), detect_mmproj_file
searched the symlink's parent directory instead of the target's. For
vision GGUFs surfaced via Ollama's .studio_links/ -- where the weight
file is symlinked but any mmproj sidecar lives next to the real blob
-- mmproj was no longer detected, so the model was misclassified as
text-only and llama-server would start without --mmproj.
detect_mmproj_file now adds the resolved target's parent to the scan
order when path is a symlink. Direct (non-symlink) .gguf paths are
unchanged, so LM Studio and HF cache layouts keep working exactly as
before. Verified with a fake layout reproducing the bug plus a
regression check on a non-symlink LM Studio model.
* Address review: support all Ollama namespaces and vision projector layers
- Iterate over all directories under registry.ollama.ai/ instead of
hardcoding the "library" namespace. Custom namespaces like
"mradermacher/llama3" now get scanned and include the namespace
prefix in display names, model IDs, and symlink names to avoid
collisions.
- Create companion -mmproj.gguf symlinks for Ollama vision models
that have an "application/vnd.ollama.image.projector" layer, so
detect_mmproj_file can find the projector alongside the model.
- Extract symlink creation into _make_symlink helper to reduce
duplication between model and projector paths.
* Address review: move imports to top level and add scan limit
- Move hashlib and json imports to the top of the file (PEP 8).
- Remove inline `import json as _json` and `import hashlib` from
function bodies, use the top-level imports directly.
- Add `limit` parameter to `_scan_ollama_dir()` with early exit
when the threshold is reached.
- Pass `_MAX_MODELS_PER_FOLDER` into the scanner so it stops
traversing once enough models are found.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Address review: Windows fallback, all registry hosts, collision safety
_make_link (formerly _make_symlink):
- Falls back to os.link() hardlink when symlink_to() fails (Windows
without Developer Mode), then to shutil.copy2 as last resort
- Uses atomic os.replace via tmp file to avoid race window where the
.gguf path is missing during rescan
Scanner now handles all Ollama registry layouts:
- Uses rglob over manifests/ instead of hardcoding registry.ollama.ai
- Discovers hf.co/org/repo:tag and any other host, not just library/
- Filenames include a stable sha1 hash of the manifest path to prevent
collisions between models that normalize to the same stem
Per-model subdirectories under .studio_links/:
- Each model's links live in their own hash-keyed subdirectory
- detect_mmproj_file only sees the projector for that specific model,
not siblings from other Ollama models
Friendly Ollama error detection:
- Now also matches ollama_links/ (the read-only fallback cache path)
and model_identifier starting with "ollama/"
Recommended folders:
- Added os.access(R_OK | X_OK) check so unreadable system directories
like /var/lib/ollama/.ollama/models are not advertised as chips
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Address review: filter ollama_links from generic scanners
The generic scanners (models_dir, hf_cache, lmstudio) already filter
out .studio_links to avoid duplicate Ollama entries, but missed the
ollama_links fallback cache directory used for read-only Ollama
installs. Add it to the filter.
* Address review: idempotent link creation and path-component filter
_make_link:
- Skip recreation when a valid link/copy already exists (samefile or
matching size check). Prevents blocking the model-list API with
multi-GB copies on repeated scans.
- Use uuid4 instead of os.getpid() for tmp file names to avoid race
conditions from concurrent scans.
- Log cleanup errors instead of silently swallowing them.
Path filter:
- Use os.sep-bounded checks instead of bare substring match to avoid
false positives on paths like "my.studio_links.backup/model.gguf".
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Address review: drop copy fallback, targeted glob, robust path filter
_make_link:
- Drop shutil.copy2 fallback -- copying multi-GB GGUFs inside a sync
API request would block the backend. Log a warning and skip the
model when both symlink and hardlink fail.
Scanner:
- Replace rglob("*") with targeted glob patterns (*/*/* and */*/*/*)
to avoid traversing unrelated subdirectories in large custom folders.
Path filter:
- Use Path.parts membership check instead of os.sep substring matching
for robustness across platforms.
Scan limit:
- Skip _scan_ollama_dir when _generic already fills the per-folder cap.
* Address review: sha256, top-level uuid import, Path.absolute()
- Switch hashlib.sha1 to hashlib.sha256 for path hashing consistency.
- Move uuid import to the top of the file instead of inside _make_link.
- Replace os.path.abspath with Path.absolute() in detect_gguf_model
to match the pathlib style used throughout the codebase.
* Address review: fix stale comments (sha1, rglob, copy fallback)
Update three docstrings/comments that still referenced the old
implementation after recent changes:
- sha1 comment now says "not a security boundary" (no hash name)
- "rglob" -> "targeted glob patterns"
- "file copies as a last resort" -> removed (copy fallback was dropped)
* Address review: fix stale links, support all manifest depths, scope error
_make_link:
- Drop size-based idempotency shortcut that kept stale links after
ollama pull updates a tag to a same-sized blob. Only samefile()
is used now -- if the link doesn't point at the exact same inode,
it gets replaced.
Scanner:
- Revert targeted glob back to rglob so deeper OCI-style repo names
(5+ path segments) are not silently skipped.
Ollama error:
- Only show "Some Ollama models do not work with llama.cpp" when the
server output contains GGUF compatibility hints (key not found,
unknown architecture, failed to load). Unrelated failures like
OOM or missing binaries now show the generic error instead of
being misdiagnosed.
---------
Co-authored-by: Daniel Han <info@unsloth.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: danielhanchen <michaelhan2050@gmail.com>
|
||
|
|
6e87bade25 |
Trim verbose comments in PATH helpers
Reduce inline comments from ~160 lines to ~25 across both files. Keep one-line summaries of the "why"; drop multi-paragraph rationale blocks that repeated information already captured in commit messages and PR discussion. |
||
|
|
ec32ce2e82
|
fix: use direct registry API for PATH writes instead of SetEnvironmentVariable (#4961)
* fix: replacing SetEnvironmentVariable with direct registry API
* apply reviews
* Use CreateSubKey for HKCU\Environment
* Store PATH backup under HKCU\Software\Unsloth
* Fix $backupKey registry handle leak in PATH backup block
Wrap $backupKey operations in try/finally so the handle is closed even
if GetValue or SetValue throws. The Add-ToUserPath helper already uses
this pattern for its registry key -- the backup block was the only
place missing it.
* Isolate WM_SETTINGCHANGE broadcast from PATH write error handling
Wrap the broadcast dummy-variable calls in their own try/catch so a
broadcast failure does not mask a successful registry PATH write.
Previously, if SetEnvironmentVariable threw after SetValue already
committed the new PATH, Add-ToUserPath would return $false and the
caller would skip Refresh-SessionPath.
* PATH helper polish: venv precedence, quoted entries, raw/expanded dedup
Three small follow-ups surfaced by a 10-reviewer pass against the rebased
PR head. None fix a regression vs main; each strictly improves the new
helpers.
Refresh-SessionPath / Refresh-Environment:
- Move $env:Path to the front of the merge so an activated venv keeps
precedence over machine/user PATH after a refresh. Pre-PR dropped
process-only entries entirely; post-PR kept them but at the back.
- Dedup on both raw and expanded forms so %USERPROFILE%\foo and the
already-expanded C:\Users\me\foo do not both survive.
Add-ToUserPath:
- Trim whitespace and surrounding double-quotes from each compared entry
so quoted PATH entries like "C:\Program Files\CMake\bin" deduplicate
against an unquoted directory of the same path.
* Back up User PATH inside Add-ToUserPath, before first mutation
Previously only studio/setup.ps1 took a one-time PATH backup, at script
top (line ~547). install.ps1 (the irm | iex entry point) had no backup,
so users who installed via that path had no recovery surface if anything
clobbered their PATH. The PR description's "one-time backup before any
modifications" promise only held for the studio installer flow.
Move the backup into Add-ToUserPath itself: just before the first actual
SetValue mutation, write the pristine raw PATH to
HKCU\Software\Unsloth\PathBackup if no backup already exists. This:
- Covers both entry points (install.ps1 and studio/setup.ps1).
- Captures the TRUE pristine PATH even when install.ps1 runs first and
studio/setup.ps1 runs afterwards (the script-top backup in setup.ps1
would otherwise see an already-modified PATH).
- Is idempotent: once a backup exists, subsequent calls preserve it.
- Skips when nothing would mutate (dedup match) or PATH is empty.
The script-top backup in studio/setup.ps1 is kept for defense in depth.
* Refresh PATH: venv-aware merge order
Reconcile two competing concerns about Refresh-SessionPath /
Refresh-Environment surfaced by separate review rounds:
- venv at the back -> activated venv loses precedence to system Python
- process at the front -> stale shims (old node, old python, etc.)
still on $env:Path can beat a freshly installed tool
New merge order:
1. Activated venv Scripts dir, only if $env:VIRTUAL_ENV is set
2. Machine PATH freshly read from registry
3. User PATH freshly read from registry
4. Current $env:Path as fallback
This way an explicitly-activated venv keeps priority while a tool the
script just installed wins over any stale entry that was already on
the inherited shell PATH. When no venv is active, fresh registry
entries take precedence as expected.
* Append to User PATH by default, close $envKey in finally
Add-ToUserPath gains a -Position Append|Prepend parameter defaulting to
Append so installing unsloth no longer prepends the bundled venv Scripts
directory ahead of the user's existing python / pip on new shells. The
four current call sites (install.ps1 launcher, studio/setup.ps1 CMake,
nvcc, Python user Scripts) all take the Append default because each one
that needs in-session precedence already does an inline $env:Path prepend
independently. This matches rustup / cargo / nvm / pyenv / uv behavior.
Also wrap the script-top $envKey.GetValue in a try/finally so the
registry handle is released even if the read throws. Matches the pattern
already used for $backupKey five lines below.
* Prepend cmake, nvcc, Python Scripts; keep venv Scripts appended
The previous commit switched Add-ToUserPath to append by default so that
installing unsloth would not silently hijack the user's system python /
pip. That was correct for the venv Scripts dir (which contains python.exe
and pip.exe alongside unsloth.exe), but wrong for the three studio/setup
call sites. Those persist cmake, the driver-compatible nvcc, and the
Python user Scripts dir for future shells, and in all three cases an
older tool already earlier in the user PATH would keep winning after the
install finished. The nvcc case is especially load-bearing: setup selects
a driver-compatible CUDA toolkit, then llama.cpp builds against whatever
wins PATH resolution, so a stale older nvcc produces broken builds.
Pass -Position 'Prepend' explicitly at the three setup.ps1 call sites
(cmake at line 754, nvcc bin at line 1025, Python user Scripts at line
1191). None of those directories holds python.exe, so prepending them
does not re-introduce the original hijack problem. Leave the install.ps1
venv Scripts call on the default Append with a comment explaining why.
* Symmetric dedup, Prepend reorders duplicates, unsloth shim dir
Address three separate findings surfaced by review:
1. Dedup asymmetry (Gemini high-priority): the existing dedup expanded
registry entries via ExpandEnvironmentVariables but did NOT expand the
new directory. Passing "%USERPROFILE%\foo" when "C:\Users\me\foo" was
already in PATH produced a duplicate. Expand both sides so the check
is symmetric.
2. -Position Prepend no-op on existing duplicates: the dedup loop
returned $false as soon as it saw a match, regardless of position.
That left a late-position duplicate in place instead of moving it to
the front, so "prepend the newly selected cmake/nvcc" did not always
beat an older copy earlier in PATH. Partition entries into kept and
dropped lists, then reinsert a single copy at the requested position.
Append still returns $false on any match so user-curated orderings
are not reshuffled. Prepend also returns $false when the only copy
is already at position 0 so we preserve the user's casing.
3. Stop adding the venv Scripts dir to User PATH entirely. That dir
holds python.exe and pip.exe alongside unsloth.exe, so neither
Prepend nor Append worked: prepend hijacked the user's system python
and pip, append made the freshly-installed unsloth.exe lose to any
older unsloth.exe earlier on PATH. Replace the Scripts-dir PATH add
with a dedicated shim directory that contains only unsloth.cmd, and
prepend that dir. The shim calls the venv's unsloth.exe by absolute
path so future pip upgrades inside the venv propagate automatically.
* Shim via hardlink, Append user Scripts, drop venv sysconfig fallback
Three follow-ups to the
|
||
|
|
f0d03655e8
|
Studio: add folder browser modal for Custom Folders (#5035)
* Studio: add folder browser modal for Custom Folders
The Custom Folders row in the model picker currently only accepts a
typed path. On a remote-served Studio (Colab, shared workstation) that
means the user has to guess or paste the exact server-side absolute
path. A native browser folder picker can't solve this: HTML
`<input type="file" webkitdirectory>` hides the absolute path for
security, and the File System Access API (Chrome/Edge only) returns
handles rather than strings, neither of which the server can act on.
This PR adds a small in-app directory browser that lists paths on the
server and hands the chosen string back to the existing
`POST /api/models/scan-folders` flow.
## Backend
* New endpoint `GET /api/models/browse-folders`:
* `path` query param (expands `~`, accepts relative or absolute; empty
defaults to the user's home directory).
* `show_hidden` boolean to include dotfiles/dotdirs.
* Returns `{current, parent, entries[], suggestions[]}`. `parent` is
null at the filesystem root.
* Immediate subdirectories only (no recursion); files are never
returned.
* `entries[].has_models` is a cheap hint: the directory looks like it
holds models if it is named `models--*` (HF hub cache layout) or
one of the first 64 children is a .gguf/.safetensors/config.json/
adapter_config.json or another `models--*` subfolder.
* Sort order: model-bearing dirs, then plain, then hidden; case-
insensitive alphabetical within each bucket.
* Suggestions auto-populate from HOME, the HF cache root, and any
already-registered scan folders, deduplicated.
* Error surface: 404 for missing path, 400 for non-directory, 403 on
permission errors. Auth-required like the other models routes.
* New Pydantic schemas `BrowseEntry` and `BrowseFoldersResponse` in
`studio/backend/models/models.py`.
## Frontend
* New `FolderBrowser` component
(`studio/frontend/src/components/assistant-ui/model-selector/folder-browser.tsx`)
using the existing `Dialog` primitive. Features:
* Clickable breadcrumb with a `..` row for parent navigation.
* Quick-pick chips for the server-provided suggestions.
* `Show hidden` checkbox.
* In-flight fetch cancellation via AbortController so rapid
navigation doesn't flash stale results.
* Badges model-bearing directories inline.
* `chat-api.ts` gains `browseFolders(path?, showHidden?)` and matching
types.
* `pickers.tsx` adds a folder-magnifier icon next to the existing `Add`
button. Opening the browser seeds it with whatever the user has
already typed; confirming fills the text input, leaving the existing
validation and save flow unchanged.
## What it does NOT change
* The existing text-input flow still works; the browser is additive.
* No new permissions or escalation; the endpoint reads only directories
the server process is already allowed to read.
* No model scanning or filesystem mutation happens from the browser
itself -- it just returns basenames for render.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Studio: cap folder-browser entries and expose truncated flag
Pointing the folder browser at a huge directory (``/usr/lib``,
``/proc``, or a synthetic tree with thousands of subfolders) previously
walked the whole listing and stat-probed every child via
``_looks_like_model_dir``. That is both a DoS shape for the server
process and a large-payload surprise for the client.
Introduce a hard cap of 2000 subdirectory entries and a
``truncated: bool`` field on the response. The frontend renders a small
hint below the list when it fires, prompting the user to narrow the
path. Below-cap directories are unchanged.
Verified end-to-end against the live backend with a synthetic tree of
2050 directories: response lands at 2000 entries, ``truncated=true``,
listing finishes in sub-second time (versus tens of seconds if we were
stat-storming).
* Studio: suggest LM Studio / Ollama dirs + 2-level model probe
Three improvements to the folder-browser, driven by actually dropping
an LM Studio-style install (publisher/model/weights.gguf) into the
sandbox and walking the UX:
## 1. Quick-pick chips for other local-LLM tools
`well_known_model_dirs()` (new) returns paths commonly used by
adjacent tools. Only paths that exist are returned so the UI never
shows dead chips.
* LM Studio current + legacy roots + user-configured
`downloadsFolder` from its `settings.json` (reuses the existing
`lmstudio_model_dirs()` helper).
* Ollama: `$OLLAMA_MODELS` env override, then `~/.ollama/models`,
`/usr/share/ollama/.ollama/models`, and `/var/lib/ollama/.ollama/models`
(the systemd-service install path surfaced in the upstream "where is
everything?" issue).
* Generic user-choice locations: `~/models`, `~/Models`.
Dedup is stable across all sources.
## 2. Two-level model-bearing probe
LM Studio and Ollama both use `root/publisher/model/weights.gguf`.
The previous `has_models` heuristic only probed one level, so the
publisher dir (whose immediate children are model dirs, not weight
files) was always marked as non-model-bearing. Pulled the direct-
signal logic into `_has_direct_model_signal` and added a grandchild
probe so the classic layout is now recognised.
Still O(PROBE^2) worst-case, still returns immediately for
`models--*` names (HF cache layout) and for any direct weight file.
## 3. model_files_here hint on response body
A leaf model dir (just GGUFs, no subdirs) previously rendered as
`(empty directory)` in the modal, confusing users into thinking the
folder wasn't scannable. Added a `model_files_here` count on the
response (capped at 200) and a small hint row in the modal: `N model
files in this folder. Click "Use this folder" to scan it.`
## Verification
Simulated an LM Studio install by downloading the real 84 MB
`unsloth/SmolLM2-135M-Instruct-Q2_K.gguf` into
`~/.lmstudio/models/unsloth/SmolLM2-135M-Instruct-GGUF/`. Confirmed
end-to-end:
* Home listing suggests `~/.lmstudio/models` as a chip.
* Browsing `~/.lmstudio/models` flags `unsloth` (publisher) as
`has_models=true` via the 2-level probe.
* Browsing the publisher flags `SmolLM2-135M-Instruct-GGUF` (model
dir) as `has_models=true`.
* Browsing the model dir returns empty entries but
`model_files_here=1`, and the frontend renders a hint telling the
user it is a valid target.
* Studio: one-click scan-folder add + prominent remove + plain search icon
Three small Custom Folders UX fixes after real-use walkthrough:
* **One-click add from the folder browser**. Confirming `Use this
folder` now submits the path directly to
`POST /api/models/scan-folders` instead of just populating the text
input. `handleAddFolder` takes an optional explicit path so the
submit lands in the same tick as `setFolderInput`, avoiding a
state-flush race. The typed-path + `Add` button flow is unchanged.
* **Prominent remove X on scan folders**. The per-folder delete
button was `text-muted-foreground/40` and hidden entirely on
desktop until hovered (`md:opacity-0 md:group-hover:opacity-100`).
Dropped the hover-only cloak, bumped color to `text-foreground/70`,
added a red hover/focus background, and sized the icon up from
`size-2.5` to `size-3`. Always visible on every viewport.
* **Plain search icon for the Browse button**. `FolderSearchIcon`
replaced with `Search01Icon` so it reads as a simple "find a
folder" action alongside the existing `Add01Icon`.
* Studio: align Custom Folders + and X buttons on the same right edge
The Custom Folders header used `px-2.5` with a `p-0.5` icon button,
while each folder row used `px-3` with a `p-1` button. That put the
X icon 4px further from the right edge than the +. Normalised both
rows to `px-2.5` with `p-1` so the two icons share a column.
* Studio: empty-state button opens the folder browser directly
The first-run empty state for Custom Folders was a text link reading
"+ Add a folder to scan for local models" whose click toggled the
text input. That's the wrong default: a user hitting the empty state
usually doesn't know what absolute path to type, which is exactly
what the folder browser is for.
* Reword to "Browse for a models folder" with a search-icon
affordance so the label matches what the click does.
* Click opens the folder browser modal directly. The typed-path +
Add button flow is still available via the + icon in the
section header, so users who know their path keep that option.
* Slightly bump the muted foreground opacity (70 -> hover:foreground)
so the button reads as a primary empty-state action rather than a
throwaway hint.
* Studio: Custom Folders header gets a dedicated search + add button pair
The Custom Folders section header had a single toggle button that
flipped between + and X. That put the folder-browser entry point
behind the separate empty-state link. Cleaner layout: two buttons in
the header, search first, then add.
* Search icon (left) opens the folder browser modal directly.
* Plus icon (right) toggles the text-path input (unchanged).
* The first-run empty-state link is removed -- the two header icons
cover both flows on every state.
Both buttons share the same padding / icon size so they line up with
each other and with the per-folder remove X.
* Studio: sandbox folder browser + bound caps + UX recoveries
PR review fixes for the Custom Folders folder browser. Closes the
high-severity CodeQL path-traversal alert and addresses the codex /
gemini P2 findings.
Backend (studio/backend/routes/models.py):
* New _build_browse_allowlist + _is_path_inside_allowlist sandbox.
browse_folders now refuses any target that doesn't resolve under
HOME, HF cache, Studio dirs, registered scan folders, or the
well-known third-party model dirs. realpath() is used so symlink
traversal cannot escape the sandbox. Also gates the parent crumb
so the up-row hides instead of 403'ing.
* _BROWSE_ENTRY_CAP now bounds *visited* iterdir entries, not
*appended* entries. Dirs full of files (or hidden subdirs when
show_hidden is False) used to defeat the cap.
* _count_model_files gets the same visited-count fix.
* PermissionError no longer swallowed silently inside the
enumeration / counter loops -- now logged at debug.
Frontend (folder-browser.tsx, pickers.tsx, chat-api.ts):
* splitBreadcrumb stops mangling literal backslashes inside POSIX
filenames; only Windows-style absolute paths trigger separator
normalization. The Windows drive crumb value is now C:/ (drive
root) instead of C: (drive-relative CWD-on-C).
* browseFolders accepts and forwards an AbortSignal so cancelled
navigations actually cancel the in-flight backend enumeration.
* On initial-path fetch error, FolderBrowser now falls back to HOME
instead of leaving the modal as an empty dead end.
* When the auto-add path (one-click "Use this folder") fails, the
failure now surfaces via toast in addition to the inline
paragraph (which is hidden when the typed-input panel is closed).
* Studio: rebuild browse target from trusted root for CodeQL clean dataflow
CodeQL's py/path-injection rule kept flagging the post-validation
filesystem operations because the sandbox check lived inside a
helper function (_is_path_inside_allowlist) and CodeQL only does
intra-procedural taint tracking by default. The user-derived
``target`` was still flowing into ``target.exists`` /
``target.is_dir`` / ``target.iterdir``.
The fix: after resolving the user-supplied ``candidate_path``,
locate the matching trusted root from the allowlist and rebuild
``target`` by appending each individually-validated segment to
that trusted root. Each segment is rejected if it isn't a single
safe path component (no separators, no ``..``, no empty/dot).
The downstream filesystem ops now operate on a Path constructed
entirely from ``allowed_roots`` (trusted) plus those validated
segments, so CodeQL's dataflow no longer sees a tainted source.
Behavior is unchanged for all valid inputs -- only the
construction of ``target`` is restructured. Live + unit tests
all pass (58 selected, 7 deselected for Playwright env).
* Studio: walk browse paths from trusted roots for CodeQL
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@h100-8-cheapest.us-east5-a.c.unsloth.internal>
|
||
|
|
800ddc95f8
|
Re-apply #4939: updated models template mappers (#4950)
* Reapply "updated models template mappers. added lfm2.5vl450m to transformers 5…" (#4945)
This reverts commit
|
||
|
|
c3cd890357
|
Studio: refresh Downloaded GGUF list and recurse into variant subdirs (#5032)
* Studio: refresh Downloaded GGUF list and recurse into variant subdirs
Two fixes for the model picker's "Downloaded" section.
Frontend (`pickers.tsx`):
* `HubModelPicker`'s mount effect short-circuited the cached-gguf and
cached-models refetch whenever the module-level cache already had
entries (`if (alreadyCached) return;`). After downloading a new repo
in the same session, reopening the picker rendered the stale cache
and the new repo never appeared in "Downloaded" until a full page
reload. The early return is removed so the lists are always refreshed
on mount; the module cache still drives the initial render so there
is no spinner flash when we already had data.
Backend (`utils/models/model_config.py`):
* `list_local_gguf_variants` and `_find_local_gguf_by_variant` used a
non-recursive `Path.glob("*.gguf")`. Some HF GGUF repos (e.g.
`unsloth/gemma-4-26B-A4B-it-GGUF`) place the largest quants under a
variant-named subdirectory such as `BF16/...gguf`, which the
top-level glob missed. Both helpers now use `rglob` and the variant
filename is stored as a path relative to the scan root so the
locator can still find the file.
The flat-layout case (variants directly in the snapshot root) is
unchanged: verified against `unsloth/gemma-4-E2B-it-GGUF` which still
returns its UD-Q4_K_XL variant correctly.
* Studio: emit posix-style relative filenames for local GGUF subdirs
`list_local_gguf_variants` was doing `str(f.relative_to(p))`, which on
Windows produces backslash-separated paths like `BF16\foo.gguf`. The
remote `list_gguf_variants` (HF API path) always returns forward-slash
filenames such as `BF16/foo.gguf`, so the two would diverge on Windows.
Switch to `.as_posix()` so the local and remote variant filenames stay
identical across Linux, macOS, and Windows. Verified by simulating with
`PureWindowsPath` in the test suite.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Studio: detect mmproj at snapshot root for nested-variant layouts
When _find_local_gguf_by_variant returns a weight file inside a
quant-named subdir (e.g. snapshot/BF16/foo.gguf), detect_mmproj_file
was scanning only the immediate parent and missing the mmproj file
sitting at the snapshot root. The model was then loaded without
--mmproj, silently breaking vision support for repos that ship
nested variants.
detect_mmproj_file now takes an optional search_root and walks up
from the weight file to that root, in order, so the mmproj at the
snapshot root is picked up. Sibling quant subdirs are not scanned,
so an unrelated variant's mmproj does not leak in.
Also apply the suggested micro-optimization on relative_to in
list_local_gguf_variants -- only build the posix path when storing
the first file for a quant.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
||
|
|
1ccfd2e0a5
|
fix(rocm): tighten gfx regex to ignore generic ISA lines (#5033)
* fix(rocm): tighten gfx regex to ignore generic ISA lines
ROCm 6.1+ rocminfo emits generic ISA names such as
"amdgcn-amd-amdhsa--gfx11-generic" and "amdgcn-amd-amdhsa--gfx9-4-generic"
alongside the real GPU name. The previous `gfx[1-9]` regex used in
`_has_rocm_gpu` matched both, so a host with only a generic ISA entry
would be reported as having a usable AMD GPU.
Tighten the pattern to `gfx[1-9][0-9a-z]{2,3}` so only real gfx ids
match. This covers every documented target from GFX6 (gfx600) through
GFX12 (gfx1201), including letter-suffixed ids like gfx90a (MI250 /
MI250X) and gfx90c. Documented generic ISA names always have 1 or 2
digits before the dash and no longer match.
Applied to both `studio/install_python_stack.py` and
`studio/install_llama_prebuilt.py` so the two detection paths agree.
Co-authored-by: Martin Hoyer <mhoyer@redhat.com>
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: Martin Hoyer <mhoyer@redhat.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
||
|
|
f9ef639dde
|
Studio: support GGUF variant selection for non-suffixed repos (#5023)
* fix: support GGUF variant selection for non-suffixed repos * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: harden GGUF detection across cached models and picker flows * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * chore: use shared GGUF picker helper for search rows * fix: avoid mixed cache duplication and preserve GGUF fallback detection * fix: unify GGUF cache matching and merge picker hints * fix: normalize local GGUF matching across picker and model config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: robust cached-gguf classification + hint-aware click routing - _repo_gguf_size_bytes: treat size_on_disk=None as 0 and dedupe fallback by commit_hash so partial/interrupted downloads don't TypeError out of sum() and wipe the entire cached list. - list_cached_gguf / list_cached_models: narrow per-repo try/except so one malformed repo no longer poisons the whole response. - handleModelClick: route through isKnownGgufRepo instead of the suffix-only isGgufRepo, so non-suffixed GGUF repos still open the variant expander from every call site. - Replace the modelIsGgufById/resultIsGgufById Maps with Sets of known GGUF ids to stop conflating "no hint" with "known not-GGUF". - Make HfModelResult.isGguf required (it is always set in makeMapModel). - Add regression tests for the None size case, mixed-repo inclusion in cached-gguf, and per-repo error isolation. * fix: exclude mmproj from GGUF classification and case-normalize hint lookups - _repo_gguf_size_bytes now filters mmproj vision-adapter files so safetensors+mmproj.gguf repos stay on the cached-models path and non-GGUF rows no longer show zero pickable variants. A vision-capable GGUF repo (main weight + mmproj adapter) still classifies as GGUF and reports the main weight size. - modelGgufIds / resultGgufIds now key on lowercased ids and isKnownGgufRepo lowercases its lookup, so store and HF-search ids that differ only by casing still match the same GGUF hint. - New regression tests: mmproj-only repo excluded from cached-gguf, same repo included in cached-models, vision-capable repo still classified as GGUF with correct size. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> |
||
|
|
13928b5f0e
|
Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024)
* Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var When set, UNSLOTH_PYTORCH_MIRROR overrides the default https://download.pytorch.org/whl base URL in all four install scripts (install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py). When unset or empty, the official URL is used. This lets users behind corporate proxies or in regions with poor connectivity to pytorch.org point at a local mirror without patching scripts. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back to the official URL when unset or empty, and preserves the value as-is (including trailing slashes). * Remove stale test assertions for missing install.sh messages * Fix GPU mocking in test_get_torch_index_url.sh Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside get_torch_index_url so the GPU-presence checks work in tests. Add -L flag handling to mock nvidia-smi so it passes the GPU listing check. All 26 tests now pass on CPU-only machines. * Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
||
|
|
5aa8c15246
|
Studio: hard-stop at n_ctx with a 'Context limit reached' toast (#5021)
* Studio: hard-stop at n_ctx with a dedicated 'Context limit reached' toast
llama-server's default behavior when the KV cache fills is to silently
drop the oldest non-``n_keep`` tokens and keep generating. The UI has
no way to tell the user that earlier turns were evicted -- they just
see degraded continuity and a confusing ``5,361 / 4,096`` on the
context usage bar.
Launch llama-server with ``--no-context-shift`` so it returns a clean
error once the request would exceed ``n_ctx``. In the chat adapter,
catch the error, identify it as a context-limit error via
``isContextLimitError()``, and surface a dedicated toast that names
the exact control to adjust: the ``Context Length`` field in the chat
Settings panel.
Also add a lightweight tooltip hint on ``ContextUsageBar`` when usage
crosses 85%, so users see the "raise Context Length in Settings"
suggestion before they hit the hard stop.
Tests:
* ``test_llama_cpp_no_context_shift.py`` pins the ``--no-context-shift``
flag in the static launch-command template, and pins it inside the
unconditional ``cmd = [ ... ]`` block so a future refactor can't
hide it behind a branch.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Shorten --no-context-shift comment to 1 line
* Match backend _friendly_error rewrite in isContextLimitError
Codex review on PR caught that ``backend/routes/inference.py::_friendly_error``
rewrites the raw llama-server text
"request (X tokens) exceeds the available context size (Y tokens)"
into
"Message too long: X tokens exceeds the Y-token context window. ..."
on the main streaming GGUF path. The heuristic only looked for
"context size" / "exceeds the available context" / "context shift",
none of which survive the rewrite, so the new "Context limit reached"
toast would never fire for the most common case. Add matches for
"message too long" and "context window" so both wordings hit.
Also addresses Gemini feedback on the launch-flag test:
* Use ``inspect.getsource(LlamaCppBackend.load_model)`` instead of
reading ``__file__`` directly; scopes the assertions to the
function that actually launches llama-server.
* Replace the hardcoded ``" ]"`` indent search with a
line-at-a-time scan for a line that is just ``]``, so the test
survives reformatting.
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
||
|
|
5861a7ce15
|
Studio: split model-load progress label across two rows (#5020)
* Studio: split model-load progress label across two rows
The chat flow and training overlay both compose a progress label like
"112.6 of 122.3 GB • 331.0 MB/s • 30s left" and render it next to the
percent badge in a single flex row. Once the rate + ETA part shows up,
the label outgrows the row width and wraps mid-phrase, orphaning the
percent ("19 left %") onto a second ragged line.
Fix in model-load-status.tsx: split the label on the first " • " into
a primary (size) chunk that stays on row 1 with the percent, and a
secondary (rate/ETA) chunk that renders on its own muted row below.
Labels without a bullet (e.g. "22.8 GB downloaded") collapse cleanly
to one row. The inline-status variant keeps only the primary and
surfaces the full label via the tooltip.
Also extracts the rate/ETA math out of useTransferStats into a pure
``transfer-stats.ts`` module (appendSample + computeTransferStats) so
it can be reasoned about and tested without React. The hook is now a
thin wrapper that feeds sample history through the pure functions.
Backend: adds two companion test files for load_progress():
* test_llama_cpp_load_progress_matrix.py (21 tests) -- platform
matrix (Linux /proc, macOS/Windows absence), VmRSS parsing
variants (tab/space/missing/malformed), filesystem edges (HF-cache
symlinks, broken symlinks, nonexistent paths, relative paths),
shard aggregation (partial multi-shard, two series in same dir,
mmproj-* exclusion, single-file), lifecycle races, concurrent
sampling (10 threads x 50 iters against real /proc), fraction
bounds.
* test_llama_cpp_load_progress_live.py (5 tests) -- no-mock live
integration: real subprocess allocating 100 MB to match VmRSS,
real ready phase, real dead-pid degradation, real shard
aggregation, repeated polling. Skipped on non-Linux.
Both complement the existing test_llama_cpp_load_progress.py.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Hoist splitProgressLabel out of JSX IIFE (review feedback)
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
||
|
|
bb14ab144a
|
Studio: live model-load progress + rate/ETA on download and load (#5017)
* Studio: live model-load progress + rate/ETA on download and load Two UX fixes for the opaque multi-minute wait between clicking Load and being able to chat, visible most clearly on large MoE GGUFs like MiniMax-M2.7 (131 GB of weights on a 97 GB GPU): 1. **Model-load phase is now observable.** The existing chat flow transitions the toast to "Starting model..." as soon as the download hits 100%, then shows a spinner with no other feedback until llama-server reports healthy. For a 130 GB model that spinner freezes for five-plus minutes while the kernel pages shards into the page cache. A new `GET /api/inference/load-progress` endpoint samples `/proc/<pid>/status VmRSS` on the llama-server subprocess against the sum of shard file sizes on disk, so the UI can render a real bar plus rate / ETA during that window. 2. **Rate and ETA on downloads and loads.** Both the chat toast and the training-start overlay used to show a static pair of numbers (for example "15.4 of 140.8 GB"). A rolling 15-second window over the existing byte-series now surfaces "85.3 MB/s, 24m 23s left" beside that pair. The estimator is shared between the download and load phases so the numbers don't reset when the phase flips. Also fixes a pre-existing assignment bug uncovered while wiring this up: `load_model` was storing the caller's `gguf_path` kwarg into `self._gguf_path`, which is `None` on the HF-download code path. The resolved on-disk path (`model_path`) is what llama-server actually mmaps; downstream consumers need that. No existing reader used `_gguf_path`, so this is a correctness fix for the new endpoint. - Backend: `LlamaCppBackend.load_progress()`, `GET /api/inference/load-progress`, `LoadProgressResponse` Pydantic model. - Frontend: `useTransferStats` hook, `formatRate` / `formatEta` helpers, `getLoadProgress` client, rewired chat toast and `DownloadRow` in the training overlay. - Tests: `studio/backend/tests/test_llama_cpp_load_progress.py` covers empty states, mmap phase, ready phase, sharded total aggregation, missing gguf_path, and unreadable /proc (7 cases). `tsc -b` and `vite build` on the frontend both clean. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
||
|
|
514bb3a20e
|
studio: pin peft to 0.18.1 to fix export subprocess issues (#5015)
* studio: pin peft to 0.18.1 to fix export subprocess issues peft 0.19.0 causes export subprocess shutdown failures in Studio. Reverting to 0.18.1 resolves the issue. * studio: move peft pin to extras-no-deps to prevent torch upgrade Installing peft via overrides.txt would resolve its deps and pull in torch>=0.11.0, breaking other pinned packages. Moving the pin to extras-no-deps.txt ensures --no-deps is used during install. |
||
|
|
7252410ccc
|
studio: stream export worker output into the export dialog (#4897)
* studio: stream export worker output into the export dialog
The Export Model dialog only showed a spinner on the "Exporting..."
button while the worker subprocess was doing the actual heavy lifting.
For Merged to 16bit and GGUF / Llama.cpp exports this meant several
minutes (or more, for large models) of opaque silence, with no way to
tell whether save_pretrained_merged, convert_hf_to_gguf.py, or
llama-quantize was making progress.
This adds a live terminal-style output panel inside the export dialog,
rendered just above the Cancel / Start Export buttons and scrollable
with auto-follow-tail. It shows stdout and stderr from both the worker
process itself and any child process it spawns (GGUF converter,
llama-quantize), coloured by stream.
Backend
- core/export/worker.py: new _setup_log_capture(resp_queue) installed
before LogConfig.setup_logging. It saves the original stdout/stderr
fds, creates pipes, os.dup2's the write ends onto fds 1 and 2 (so
every child process inherits the redirected fds), and spins up two
daemon reader threads. Each thread reads bytes from a pipe, echoes
them back to the original fd (so the server console keeps working),
splits on \n and \r, and forwards each line to the resp queue as
{"type":"log","stream":"stdout|stderr","line":...,"ts":...}.
PYTHONUNBUFFERED=1 is set so nested Python converters flush
immediately.
- core/export/orchestrator.py:
- Thread-safe ring buffer (collections.deque, maxlen 4000) with a
monotonically increasing seq counter. clear_logs(),
get_logs_since(cursor), get_current_log_seq(), is_export_active().
- _wait_response handles rtype == "log" by appending to the buffer
and continuing the wait loop. Status messages are also surfaced as
a "status" stream so users see high level progress alongside raw
subprocess output.
- load_checkpoint, _run_export, and cleanup_memory now wrap their
bodies with the existing self._lock (previously unused), clear the
log buffer at the start of each op, and flip _export_active in a
try/finally so the SSE endpoint can detect idle.
- routes/export.py:
- Wrapped every sync orchestrator call (load_checkpoint,
cleanup_memory, export_merged_model, export_base_model,
export_gguf, export_lora_adapter) in asyncio.to_thread so the
FastAPI event loop stays free during long exports. Without this
the new SSE endpoint could not be served concurrently with the
blocking export POST.
- New GET /api/export/logs/stream SSE endpoint. Honors
Last-Event-ID and a since query param for reconnect, emits log /
heartbeat / complete / error events, uses the id field to carry
the log seq so clients can resume cleanly. On first connect
without an explicit cursor it starts from the current seq so old
lines from a previous run are not replayed.
Frontend
- features/export/api/export-api.ts: streamExportLogs() helper that
authFetches the SSE endpoint and parses id / event / data fields
manually (same pattern as streamTrainingProgress in train-api.ts).
- features/export/components/export-dialog.tsx:
- Local useExportLogs(exporting) hook that opens the SSE stream on
exporting transitions to true, accumulates up to 4000 lines in
component state, and aborts on cleanup.
- New scrollable output panel rendered above DialogFooter, only
shown for Merged to 16bit and GGUF / Llama.cpp (LoRA adapter is
a fast disk write with nothing to show). Dark terminal styling
(bg-black/85, emerald text, rose for stderr, sky for status),
max-height 14rem, auto-scrolls to the bottom on new output but
stops following if the user scrolls up. A small streaming / idle
indicator is shown next to the panel title.
- DialogContent widens from sm:max-w-lg to sm:max-w-2xl when the
output panel is visible so the logs have room to breathe.
Verified
- Python smoke test (tests/smoke_export_log_capture.py): spawns a
real mp.get_context("spawn") process, installs _setup_log_capture,
confirms that parent stdout prints, parent stderr prints, AND a
child subprocess invoked via subprocess.run (both its stdout and
stderr) are all captured in the resp queue. Passes.
- Orchestrator log helpers tested in isolation: _append_log,
get_logs_since (with and without a cursor), clear_logs not
resetting seq so reconnecting clients still progress. Passes.
- routes.export imports cleanly in the studio venv and /logs/stream
shows up in router.routes.
- bun run build: tsc -b plus vite build, no TypeScript errors.
No existing export behavior is changed. If the subprocess, the SSE
endpoint, or the frontend hook fails, the export itself still runs to
completion the same way it did before, with or without logs visible.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* export dialog: trim bootstrap noise, scope logs per screen, show realpath
Several follow-ups to the live export log work:
1. Worker bootstrap noise (transformers venv activation, Unsloth banner,
"Top GGUF/hub models" lists, vision detection, 2k-step weight load
bar) is dropped from the export-dialog stream. A threading.Event
gate in worker.py defaults closed and only opens once _handle_export
actually starts; until then the reader thread still echoes lines to
the saved console fd for debugging but does not push them onto the
resp_queue. The orchestrator already spawns a fresh subprocess for
every checkpoint load, so the gate is naturally reset between runs.
2. tqdm in non-tty mode defaults to a 10s mininterval, which makes
multi-step bars look frozen in the panel. Set TQDM_MININTERVAL=0.5
in the worker env so any tqdm-driven progress emits more often.
3. The dialog's useExportLogs hook now also clears its line buffer
when exportMethod or open changes, so re-opening the dialog into a
different action's screen no longer shows the previous action's
saved output. A useElapsedSeconds tick + "Working Xs" badge in the
log header gives users a visible sign that long single-step phases
(cache copies, GGUF conversion) are still running when no new lines
are arriving.
4. ExportBackend.export_{merged,base,gguf,lora} now return
(success, message, output_path); the worker forwards output_path on
each export_*_done response, the orchestrator's _run_export passes
it to routes/export.py, which surfaces it via
ExportOperationResponse.details.output_path. The dialog's Export
Complete screen renders the resolved on-disk realpath under "Saved
to" so users can find their exported model directly.
* fix(cli): unpack 3-tuple return from export backend
ExportOrchestrator.export_{merged,base,gguf,lora} now return
(success, message, output_path) so the studio dialog can show
the on-disk realpath. The CLI still unpacked 2 values, so every
`unsloth export --format ...` crashed with ValueError before
reporting completion. Update the four call sites and surface
output_path via a "Saved to:" echo.
* fix(studio): anchor export log SSE cursor at run start
The export dialog SSE defaulted its cursor to get_current_log_seq()
at connect time, so any line emitted between the POST that kicks
off the export and the client opening the stream was buffered with
seqs 1..k and then skipped (seq <= cursor). Long-running exports
looked silent during their first seconds.
Snapshot _log_seq into _run_start_seq inside clear_logs() and
expose it via get_run_start_seq(). The SSE default cursor now uses
that snapshot, so every line emitted since the current run began
is reachable regardless of when the client connects. Old runs
still can't leak in because their seqs are <= the snapshot.
* fix(studio): reconnect export log SSE on stream drop
useExportLogs launched streamExportLogs once per exporting
transition and recorded any drop in .catch(). Long GGUF exports
behind a proxy with an idle kill-timeout would silently lose the
stream for the rest of the run even though the backend already
supports Last-Event-ID resume. The "retry: 3000" directive emitted
by the backend is only meaningful to native EventSource; this
hook uses a manual fetch + ReadableStream parse so it had no
effect.
Wrap streamExportLogs in a retry loop that tracks lastSeq from
ExportLogEvent.id and passes it as since on reconnect. Backoff is
exponential with jitter, capped at 5s, reset on successful open.
The loop stops on explicit backend `complete` event or on effect
cleanup.
* fix(studio): register a second command so Typer keeps `export` as a subcommand
The CLI export unpacking tests wrap `unsloth_cli.commands.export.export`
in a fresh Typer app with a single registered command. Typer flattens a
single-command app into that command, so the test's
`runner.invoke(cli_app, ["export", ckpt, out, ...])` treats the leading
`"export"` token as an unexpected extra positional argument -- every
parametrized case failed with:
Got unexpected extra argument (.../out)
Register a harmless `noop` second command so Typer preserves subcommand
routing and the tests actually exercise the 3-tuple unpack path they
were written to guard.
Before: 4 failed
After: 4 passed
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: studio-install <studio@local.install>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
|
||
|
|
eca592effe
|
studio: show HF model download progress in training start overlay (#4894)
* studio: show HF model download progress in training start overlay During the training setup phase, the overlay only displayed a static "Loading model..." line while model weights were being downloaded from Hugging Face. On slow connections this looked like the app had frozen. This adds a small self-contained progress block inside the existing TrainingStartOverlay that polls the existing GET /api/models/download-progress endpoint and renders a Progress bar with bytes downloaded, total bytes, and percent complete. Notes: - Frontend only change. No backend, worker, SSE, or runtime store edits. - Reuses the existing getDownloadProgress client wrapper and the existing /api/models/download-progress endpoint that already scans the HF blob cache for completed and .incomplete files. - selectedModel is read directly from useTrainingConfigStore inside the overlay, so no prop drilling and live-training-view.tsx is unchanged. - Polling runs at 1500 ms and is gated on the HF repo regex (^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$), the same regex the backend uses, so local paths and empty form state never hit the endpoint. - Polling stops once progress reaches 1.0 so the bar can stay at 100 until the overlay hides on the first training step. - Network errors are silently swallowed, matching the chat side flow (the bar simply freezes at the last value). - When downloadedBytes is 0 the block is hidden entirely, so cached models do not flash a progress bar. - When the HF API cannot determine the total size, the block falls back to "X downloaded" with no percent and no bar. Verified with bun run build (tsc -b plus vite build, no TypeScript errors). * training overlay: track dataset download + show on-disk realpath Adds a dedicated "Downloading dataset..." section to the training-start overlay alongside the existing model-weights one, so an HF dataset that is downloading mid-startup is no longer mislabeled as model weights or hidden entirely. The new GET /api/datasets/download-progress endpoint mirrors /api/models/download-progress against the datasets-- prefix in HF_HUB_CACHE. Both endpoints now also return cache_path, the resolved on-disk realpath of the snapshot directory (or the cache repo root if no snapshot is materialized yet). The overlay surfaces this under each download row so users can immediately see where the model and dataset landed without digging through server logs. The frontend's existing useModelDownloadProgress hook is generalized to a single useHfDownloadProgress(repoId, fetcher) hook that the model and dataset variants both delegate to, keeping polling, gating, and completion semantics in one place. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Studio: Polish training start overlay download progress UI (#4957) * studio: polish training start overlay download progress visuals * Fix formatCachePath cross-platform support and redundant sizeLabel - Extend formatCachePath regex to also shorten macOS /Users/<user> paths to ~ - Suppress sizeLabel when no byte info is available (cachePath-only state), since the "Preparing" badge already conveys the status * Fix misleading status badge when download total is unknown - Hide badge when totalBytes is 0 but downloadedBytes > 0, since we cannot determine if the download is still in progress or already complete (happens when HF size metadata lookup fails for gated/private repos) - Keep "Preparing" badge for the zero-bytes cachePath-only state - Add Windows native path shortening to formatCachePath (C:\Users\<name>) --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> --------- Co-authored-by: studio-install <studio@local.install> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com> |
||
|
|
44082cf88e
|
Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM (#5014)
* Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM
The chat settings sheet's ctx slider reads `max_context_length` from
`/api/inference/status` and renders
Exceeds estimated VRAM capacity (N tokens). The model may use
system RAM.
when the user drags the slider above that value. For models whose
weights fit on some GPU subset, `_max_context_length` was already set
to the binary-search cap and the warning fired correctly.
For models whose weights exceed 90% of every GPU subset's free memory
(e.g. MiniMax-M2.7-GGUF at 131 GB on a 97 GB GPU), the ceiling-probe
loop never matched a subset, so `max_available_ctx` stayed at the
native context (e.g. 196608). The slider ran all the way to native
with no indication that any value above the 4096 spec default would
trigger `--fit on` and degrade performance.
Anchor `max_available_ctx` at `min(4096, native_context_length)` when
no subset fits, so the warning fires at the right threshold and the
user sees the correct safe-zone / warning-zone split:
Before (MiniMax-M2.7 on 97 GB GPU):
slider 0 .. 196608, warning threshold = 196608 (never fires)
After:
slider 0 .. 196608, warning threshold = 4096 (fires correctly)
No frontend changes required: `chat-settings-sheet.tsx` already
consumes `ggufMaxContextLength` (= status.max_context_length) as the
warning threshold and `ggufNativeContextLength` as the slider max.
Adds tests/test_llama_cpp_max_context_threshold.py covering
weights-exceed-VRAM (single / multi-GPU), a native-ctx below the 4096
fallback case (don't lie about supported ctx), fittable-model
regressions (small / multi-GPU / tiny on huge GPU), and the
`max_context_length` property's fallback semantics.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
||
|
|
b2f80f210e
|
Studio: make GGUF disk-space preflight cache-aware (#5012)
* Studio: make GGUF disk-space preflight cache-aware The pre-download disk check in LlamaCppBackend.load_model compared the repo's total GGUF size against free disk without crediting bytes already present in the Hugging Face cache. Re-loading a large cached model (e.g. MiniMax-M2.7-GGUF at 131 GB) then failed cold with "Not enough disk space to download any variant" whenever free disk was below the full weight footprint, even though nothing actually needed to be downloaded. Subtract bytes already on disk via try_to_load_from_cache before comparing against free space. A partial blob (interrupted download) is not credited, so a second attempt still allocates room to finish the download. The log line now also surfaces how much is already cached. Adds tests/test_llama_cpp_cache_aware_disk_check.py covering the fully-cached, partial-cache-insufficient-disk, partial-cache-enough-disk, cold-cache, incomplete-blob, and zero-size-path-info cases. Sparse tempfiles keep the GB-scale scenarios cheap to simulate. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
||
|
|
767fa8cade
|
Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM (#5011)
* Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM The load-time auto-fit in LlamaCppBackend.load_model had two issues for models whose weights do not fit on any GPU subset (the common case for large MoE GGUFs such as MiniMax-M2.7, Qwen3.5-397B-A17B, etc.): 1. Auto mode (max_seq_length=0) left effective_ctx at the model's native context when no subset passed the 90% fit check. The UI slider then landed on e.g. 196608 for MiniMax-M2.7, far above anything usable. Default the auto-pick to 4096 so the UI starts at a sane value; the slider ceiling stays at the native context so the user can still opt in to longer contexts and receive the "might be slower" warning. 2. Explicit ctx was silently shrunk when weights fit but the requested KV overflowed the 90% budget. The shrink loop emitted -c <capped> -ngl -1 without informing the caller, so a user who had opted into a longer context via the UI never actually got it. Drop the shrink loop on the explicit path and emit -c <user_ctx> --fit on instead, letting llama-server flex -ngl (CPU layer offload). Adds tests/test_llama_cpp_context_fit.py covering both paths, the file-size-only fallback when KV metadata is missing, non-regression on fittable auto-pick, and platform-agnostic input shape. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
||
|
|
a31c82a640
|
fix(studio): remove 300s cap on load_checkpoint (inherits 3600s default) (#4922)
* fix: increase wait response timeout to 900 sec instead of 300 sec. #4845 * Apply suggestion from @gemini-code-assist[bot] good catch Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> |
||
|
|
da78c6be71
|
[Studio] Install flash attn at setup time for linux (#4979)
* [Studio] Install flash attn at setup time for linux * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup changes Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Test cases * wheel_utils: narrow url_exists exceptions and log at debug level --------- Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai> |
||
|
|
dccc0ebada
|
[Studio] Show non exported models in chat UI (#4892)
* Show non exported models in chat UI * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Distinguish b/w LoRa and full fine tune saves. Cleanup --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> |
||
|
|
a50f61009b
|
fix(studio): default chart view to full training history (#5007)
* fix(studio): default chart view to full training history instead of last 80 steps Fixes #5003 * chore: windowsize as null code comment --------- Co-authored-by: imagineer99 <samleejackson0@gmail.com> Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com> |
||
|
|
bfa17330bd
|
Studio: Polish API key copy button and harden async clipboard fallback (#5006)
* fix: polish clipboard style and fix async clipboard path
* Use copyToClipboardAsync in CopyButton for Safari fallback
CopyButton was calling navigator.clipboard.writeText directly,
bypassing the execCommand fallback added in this same PR. Switch
to copyToClipboardAsync which tries execCommand first (Safari
user-gesture requirement) then falls back to the async clipboard API.
* Fix copyToClipboard sync contract regression and improve async path
- Restore copyToClipboard() to return only the execCommand result,
preserving the boolean contract that 7 existing callers depend on
to gate their "Copied!" UI state. The fire-and-forget async fallback
was returning true before the promise resolved, causing false success.
- Add document.body null guard to copyWithExecCommand for SSR safety.
- Reorder copyToClipboardAsync to try the async Clipboard API first,
avoiding unnecessary DOM/focus overhead in Radix focus-trapped dialogs
where execCommand always fails anyway.
* Restore queryCommandSupported guard and fix async catch path
- Restore the queryCommandSupported("copy") guard in copyToClipboard()
to match the original contract exactly: when execCommand is entirely
unsupported, fall through to fire-and-forget async clipboard write.
- Fix copyToClipboardAsync catch block: after navigator.clipboard.writeText
rejects, the user-gesture frame is gone, so execCommand will also fail.
Return false from catch instead of falling through. The execCommand
fallback at the bottom only runs when the Clipboard API is absent
(still in user-gesture frame).
* Restore execCommand fallback in copyToClipboardAsync catch path
The catch block was returning false after clipboard API rejection,
based on the incorrect premise that the user-gesture frame is lost
after an await. Per the HTML spec, transient user activation IS
preserved through promise microtask chains. The real reason
execCommand fails in the Radix dialog is the focus trap intercepting
textarea.focus(), not gesture loss.
For non-dialog callers, execCommand can still succeed after a
clipboard rejection. Inside a Radix modal, execCommand returns
false harmlessly (focus trap blocks it).
* Harden textarea fallback for mobile and continue to async path on failure
---------
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
|
||
|
|
97eafd999e
|
studio: fix api-keys access + refresh (#5005)
* studio: fix api-keys access + refresh * studio: guard v1 in spa fallback |
||
|
|
d2fc582840
|
studio: skip training status/metrics polling when idle (#4988)
* fix(studio): skip training status/metrics polling when idle Add an early return in the status and metrics setInterval callbacks when the runtime store reports phase === "idle" and hasHydrated is true. Previously these polls fired unconditionally every 3s/5s, generating unnecessary network traffic and console errors when no training was running. * fix(studio): reduce idle polling to 30s instead of stopping entirely Review feedback (PR #4988): completely stopping polling when idle risks permanent UI desync if hydration fails, and misses out-of-band state changes from other clients. Add a 30s background poll that only fires when idle to recover gracefully. * fix: harden idle status polling around hydration and runtime reset --------- Co-authored-by: AdamPlatin123 <AdamPlatin123@users.noreply.github.com> Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com> Co-authored-by: imagineer99 <samleejackson0@gmail.com> |
||
|
|
9a261aec5f
|
Studio: Expose openai and anthropic compatible external API end points (#4956)
* Studio: add API key authentication for programmatic access External users want to hit the Studio API (chat completions with tool calling, training, export, etc.) without going through the browser login flow. This adds sk-unsloth- prefixed API keys that work as a drop-in replacement for JWTs in the Authorization: Bearer header. Backend: - New api_keys table in SQLite (storage.py) - create/list/revoke/validate functions with SHA-256 hashed storage - API key detection in _get_current_subject before the JWT path - POST/GET/DELETE /api/auth/api-keys endpoints on the auth router Frontend: - /api-keys page with create form, one-time key reveal, keys table - API Keys link in desktop and mobile navbar - Route registered with requireAuth guard Zero changes to any existing route handler -- every endpoint that uses Depends(get_current_subject) automatically works with API keys. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use actual origin in API key usage examples The examples on /api-keys were hardcoded to localhost:8888 which is wrong for remote users. Use window.location.origin so the examples show the correct URL regardless of where the user is connecting from. * Add `unsloth studio run` CLI command for one-liner model serving Adds a `run` subcommand that starts Studio, loads a model, creates an API key, and prints a ready-to-use curl command -- similar to `ollama run` or `vllm serve`. Usage: unsloth studio run -m unsloth/Qwen3-1.7B-GGUF --gguf-variant UD-Q4_K_XL * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add end-to-end tests for `unsloth studio run` and API key usage Tests the 4 usage examples from the API Keys page: 1. curl basic (non-streaming) chat completions 2. curl streaming (SSE) chat completions 3. OpenAI Python SDK streaming completions 4. curl with tools (web_search + python) Also tests --help output, invalid key rejection, and no-key rejection. All 7 tests pass against Qwen3-1.7B-GGUF. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add /v1/completions, /v1/embeddings, /v1/responses endpoints and --parallel support - llama_cpp.py: accept n_parallel param, pass to llama-server --parallel - run.py: plumb llama_parallel_slots through to app.state - inference.py: add /completions and /embeddings as transparent proxies to llama-server, add /responses as application-level endpoint that converts to ChatCompletionRequest; thread n_parallel through load_model - studio.py: set llama_parallel_slots=4 for `unsloth studio run` path * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Make /v1/responses endpoint match OpenAI Responses API format The existing /v1/responses shim returned Chat Completions format, which broke OpenAI SDK clients using openai.responses.create(). This commit replaces the endpoint with a proper implementation that: - Returns `output` array with `output_text` content parts instead of `choices` with `message` - Uses `input_tokens`/`output_tokens` instead of `prompt_tokens`/ `completion_tokens` in usage - Sets `object: "response"` and `id: "resp_..."` - Emits named SSE events for streaming (response.created, response.output_text.delta, response.completed, etc.) - Accepts all OpenAI Responses API fields (tools, store, metadata, previous_response_id) without erroring -- silently ignored - Maps `developer` role to `system` and `input_text`/`input_image` content parts to the internal Chat format Adds Pydantic schemas for request/response models and 23 unit tests covering schema validation, input normalisation, and response format. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Studio: add Anthropic-compatible /v1/messages endpoint (#4981) * Add Anthropic-compatible /v1/messages endpoint with tool support Translate Anthropic Messages API format to/from internal OpenAI format and reuse the existing server-side agentic tool loop. Supports streaming SSE (message_start, content_block_delta, etc.) and non-streaming JSON. Includes offline unit tests and e2e tests in test_studio_run.py. * Add enable_tools, enabled_tools, session_id to /v1/messages endpoint Support the same shorthand as /v1/chat/completions: enable_tools=true with an optional enabled_tools list uses built-in server tools without requiring full Anthropic tool definitions. session_id is passed through for sandbox isolation. max_tokens is now optional. * Strip leaked tool-call XML from Anthropic endpoint content Apply _TOOL_XML_RE to content events in both streaming and non-streaming tool paths, matching the OpenAI endpoint behavior. * Emit custom tool_result SSE event in Anthropic stream Adds a non-standard tool_result event between the tool_use block close and the next text block, so clients can see server-side tool execution results. Anthropic SDKs ignore unknown event types. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Split /v1/messages into server-side and client-side tool paths enable_tools=true runs the existing server-side agentic loop with built-in tools (web_search/python/terminal). A bare tools=[...] field now triggers a client-side pass-through: client-provided tools are forwarded to llama-server and any tool_use output is returned to the caller with stop_reason=tool_use for client execution. This fixes Claude Code (and any Anthropic SDK client) which sends tools=[...] expecting client-side execution but was previously routed through execute_tool() and failing with 'Unknown tool'. Adds AnthropicPassthroughEmitter to convert llama-server OpenAI SSE chunks into Anthropic SSE events, plus unit tests covering text blocks, tool_use blocks, mixed, stop reasons, and usage. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix httpcore GeneratorExit in /v1/messages passthrough stream Explicitly aclose aiter_lines() before the surrounding async with blocks unwind, mirroring the prior fix in external_provider.py ( |
||
|
|
3bb72a557f
|
Pin kernels==0.12.1 to avoid huggingface_hub dataclass conflict (#5000) | ||
|
|
21a7895959
|
Studio: Prompt manager, message deletion, and chat UI improvements (#4938)
* feat(chat): code block styling, delete with Dexie sync, settings sheet polish * style: config save/delete padding fix * fix(studio): centralize dark code-block surface and optimize message sync writes * style: config padding/alignment polish * fix(studio): upsert custom presets without implicit rename-delete * fix settings sheet save state polish * fix settings sheet button widths * fix chat settings presets * fix chat delete sync * fix chat trust remote code flow --------- Co-authored-by: shine1i <wasimysdev@gmail.com> |
||
|
|
3b092bcd46
|
fix(studio): prevent route transition DOM duplication via AnimatePresence (#4987)
Add mode="wait" and exit={{ opacity: 0 }} to the root AnimatePresence
wrapper so outgoing routes fully unmount before incoming routes render.
Without this, rapid navigation between Studio/Export/Recipes/Chat caused
pages to stack (2x–3x duplication).
Co-authored-by: AdamPlatin123 <AdamPlatin123@users.noreply.github.com>
Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>
|
||
|
|
65b4028560
|
Pin bitsandbytes to continuous-release_main on ROCm (4-bit decode fix) (#4954)
* Pin bitsandbytes to continuous-release_main on ROCm for 4-bit decode fix
bitsandbytes 0.49.2 on PyPI ships with a broken 4-bit GEMV kernel on
every ROCm target:
- CDNA (gfx90a / gfx942 / gfx950 = MI210 / MI300X / MI350) via a
broken blocksize=32/64 warp64 GEMV kernel whose tests were
explicitly skipped with ROCM_WARP_SIZE_64 guards because the
code was known broken.
- RDNA3 / RDNA3.5 (gfx1100-1103 / gfx1150-1152) via a compile-time
BNB_WARP_SIZE macro in the host-side dispatch that resolves to
64 when the multi-arch wheel is compiled with CDNA as the
primary target, so num_blocks is wrong on RDNA and half the GEMV
output is never written.
At decode shape (1, 1, hidden) both bugs produce NaN. Training is
unaffected because training shapes are (batch, seq_len > 1, hidden)
and never touch the GEMV path. The crash during autoregressive
inference surfaces as _assert_async_cuda_kernel in torch.multinomial
which on HIP becomes a hard HSA_STATUS_ERROR_EXCEPTION instead of
a clean Python error.
Both bugs are fixed by bitsandbytes commit 713a3b8 ("[ROCm] Enable
blocksize 32 4-bit quantization and GEMV kernels on AMD CDNA",
PR #1887, merged 2026-03-09) which replaces BNB_WARP_SIZE with a
runtime hipDeviceGetAttribute query and ships a working CDNA warp64
kernel. That commit has not shipped to PyPI yet, but
continuous-release_main wheels are published on every push to bnb
main via GitHub Releases.
Point the ROCm install path at the continuous-release_main x86_64 and
aarch64 wheels and fall back to PyPI >=0.49.1 when the pre-release is
unreachable (offline installs, firewalled hosts, or architectures not
covered by the pre-release wheels). Drop the pin once bnb cuts a
0.50+ tag on PyPI.
Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1): direct
bnb GEMV shape test now returns 0.0078 max abs error at seq_len=1
(no NaN) vs NaN on 0.49.2, and full Unsloth + for_inference + 4-bit
sampling generation works end-to-end.
NVIDIA / CPU / Mac / Windows paths are unaffected -- the helper is
gated on the ROCm torch index and platform.machine() respectively.
* Drop Studio ROCm 16-bit fallback now that bnb 0.50+ fixes 4-bit decode
The 16-bit fallback in studio/backend/core/inference/inference.py was
added as a workaround for a bug that this PR already fixes at the
install layer: bitsandbytes <= 0.49.2 has a broken 4-bit GEMV kernel
on every ROCm target, which NaNs at decode shape (seq_len=1) and
crashes autoregressive inference. bnb PR #1887 (commit 713a3b8, in
0.50.0.dev0+, pinned by install.sh / install_python_stack.py in this
PR) restores correct 4-bit decode on MI300X and verified working
end-to-end with full Unsloth + for_inference + sampling.
Revert the dual code path so ROCm and NVIDIA both go through the
normal FastLanguageModel.from_pretrained + for_inference flow:
- Remove the conditional `from unsloth import` that skipped the
import on ROCm. The monkey-patches it was trying to avoid were
never the cause of the crash; bnb 4-bit GEMV was.
- Remove the `if _hw_module.IS_ROCM:` branch in load_model that
loaded with plain transformers + PEFT + bfloat16, and the
`_resolve_fp16_base` helper it relied on.
- Remove the `get_chat_template is not None` fallback in
_load_chat_template_info -- get_chat_template is now always
imported.
- Refactor the audio/vision ROCm guard to check _hw_module.IS_ROCM
directly instead of the removed _IS_ROCM_ENV global. Audio and
vision on ROCm still need separate validation (FastVisionModel
and the CSM audio codecs were never tested on HIP) so the guard
stays for now.
Add _bnb_rocm_4bit_ok() as a runtime safety net for users who
install from this PR before the install.sh bnb pin kicks in, or
whose installer fell back to the PyPI pin because the continuous-
release wheel was unreachable. When the installed bnb is < 0.50 on
ROCm, force load_in_4bit=False and strip any -unsloth-bnb-4bit /
-bnb-4bit suffix from the model path so a pre-quantized repo
resolves to its FP16 sibling instead of pulling bnb back in via
the repo's quantization_config. LoRA adapters whose base is a
pre-quantized repo on old bnb will still fail inside Unsloth's
loader -- the only real fix there is `unsloth studio update`.
Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1):
- HAPPY path (bnb 0.50.0.dev0, load_in_4bit=True, pre-quantized
repo): loads in 4-bit via the fixed GEMV, generation returns
"Paris." for greedy and sampling.
- SAFETY-NET path (simulated old bnb, suffix-stripped to the
FP16 sibling, load_in_4bit=False): loads in bf16, generation
returns "Paris." for greedy and sampling.
Net diff is ~45 lines smaller than the pre-revert state because
the entire plain-transformers 16-bit branch is gone.
* Cache _bnb_rocm_4bit_ok() with functools.cache
load_model() can be called many times in a single session but the bnb
version and hardware state cannot change at runtime, so memoise the
check. First call is ~1.9 ms (dominated by the lazy `import bitsandbytes`
inside the try block), subsequent calls drop to sub-microsecond dict
lookups. Zero behavioral change.
* Shorten verbose bnb/ROCm comments
Comment-only cleanup across install.sh, studio/install_python_stack.py,
and studio/backend/core/inference/inference.py. No behavioral change.
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* Remove _bnb_rocm_4bit_ok safety net from inference.py
Studio's ROCm support is brand new (PR #4720, merged today) and every
fresh install pulls the bnb continuous-release_main wheel via
install.sh / install_python_stack.py in this same PR. There are no
existing ROCm Studio installs carrying bnb < 0.50, so the defensive
version-check fallback is guarding against a scenario that cannot
actually occur. Delete the helper, the functools import, and the
safety-net block -- inference.py now calls FastLanguageModel.from_pretrained
directly with no ROCm branching.
* Drop audio/vision ROCm guard in inference.py — verified unblocked by bnb fix
Vision inference was blocked by the same bnb 4-bit GEMV bug that affected
text inference (vision models use bnb 4-bit for the LM backbone). With
bnb 0.50+ pinned in install.sh / install_python_stack.py, vision works
end-to-end on MI300X: Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit
loaded in 4-bit via FastVisionModel + for_inference returns a correct
answer to a multimodal prompt.
Audio (CSM) was never actually blocked by HIP — on this hardware CSM
loads and runs its backbone forward pass fine with bnb 0.50, then fails
during generate() with a transformers-level kwarg validation mismatch
in generation_csm.py (`backbone_last_hidden_state` rejected). That's a
pre-existing transformers/CSM integration bug that reproduces identically
on NVIDIA, so the ROCm-gated guard was never actually protecting users
from anything HIP-specific.
Remove the combined audio/vision guard and the now-unused _hw_module
import. Also restore the one-word "Can be" in an inline comment that
drifted during the earlier comment-shortening pass, so the inference.py
delta vs pre-#4720 is exactly the max_seq_length<=0 crash fix and
nothing else.
* Shorten max_seq_length=0 guard comment to one line
---------
Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
|
||
|
|
cad8c6ad05
|
Add AMD ROCm/HIP support across installer and hardware detection (#4720)
* Add ROCm detection to install.sh and expand shell tests Add AMD ROCm GPU detection to get_torch_index_url() in install.sh. When nvidia-smi is not found, probe for ROCm via amd-smi, /opt/rocm version file, hipconfig, dpkg-query, and rpm. Includes validation guard for malformed _rocm_tag, Debian epoch prefix stripping, ROCm 7.2+ cap to rocm7.1 index, bitsandbytes AMD install, and status messaging. Shell tests expanded to 23 cases. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Add ROCm torch reinstall support to install_python_stack.py Add _detect_rocm_version() and _ensure_rocm_torch() to detect when a Linux host has ROCm but the venv received CPU-only torch, and reinstall with the correct ROCm wheels. Covers ROCm 6.0 through 7.1 with a 30-second timeout on the torch GPU probe subprocess. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Add ROCm support to llama.cpp prebuilt installer Add has_rocm field to HostInfo, extend detect_host() to probe for ROCm via hipcc/amd-smi/rocm-smi/ROCM_PATH, and route ROCm hosts to upstream prebuilts (Linux ROCm 7.2 prebuilt with source fallback, Windows HIP prebuilt with CPU fallback). Add linux-rocm and windows-hip install kinds to runtime_patterns_for_choice(). Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Add IS_ROCM hardware flag and fix AMD error message Add IS_ROCM flag to hardware.py detect_hardware() (set when torch.version.hip is present, DeviceType stays CUDA). Export IS_ROCM from __init__.py. Add "rocm" key to get_package_versions(). Replace "We do not support AMD" error in tokenizer_utils.py with a helpful message pointing to ROCm installation docs. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Add comprehensive ROCm support test suite (68 tests) Add tests/studio/install/test_rocm_support.py covering all ROCm code paths across install_llama_prebuilt.py, install_python_stack.py, hardware.py, tokenizer_utils.py, and install.sh. All tests use mocks and run without AMD hardware. Covers: asset selection (11), runtime patterns (5), HostInfo (4), ROCm version detection (9), torch reinstall (9), index mapping (8), hardware flag (8), tokenizer message (2), install.sh structure (10), and live regression (1). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden ROCm support: probe error handling, version cap, validation Address review findings from 8 independent reviewers: - Wrap _ensure_rocm_torch() torch probe in try/except for TimeoutExpired and OSError so a hung or broken torch import does not crash the installer (8/8 reviewers flagged this) - Add torch>=2.4,<2.11.0 version cap to the ROCm reinstall path to prevent installing unsupported torch 2.11.0 from the rocm7.1 index - Use with-statement for file reads in _detect_rocm_version() to avoid resource leaks - Handle ROCM_PATH="" correctly (use `or "/opt/rocm"` instead of default parameter to avoid relative path resolution) - Strengthen shell validation guard from rocm[0-9] to rocm[1-9] to reject rocm0.x tags that would produce nonexistent PyTorch index URLs - Switch shell version cap from blocklist to allowlist (rocm6.*|rocm7.0* |rocm7.1* pass through, everything else caps to rocm7.1) so future ROCm 10+ does not fall through to a nonexistent index - Add sorted() to _ROCM_TORCH_INDEX lookup for defensive ordering - Fix test_probe_timeout_handled: replace zero-assertion test with proper assertions verifying reinstall proceeds after timeout * Clean up rocm_paths list construction in detect_host() Filter None from the ROCM_PATH env var lookup at list construction time instead of relying on the inline `if p` guard in the any() call. * Require actual AMD GPU presence before selecting ROCm paths All 8 reviewers across 2 cycles independently flagged that ROCm detection used toolkit/filesystem hints (hipcc, /opt/rocm, rocm-core) as a proxy for GPU presence, which would misroute CPU-only or NVIDIA hosts that happen to have ROCm tools installed. Now all 3 detection points (install.sh, install_python_stack.py, install_llama_prebuilt.py) probe for an actual AMD GPU before entering the ROCm path: - install.sh: check rocminfo for gfx* GPU names, or amd-smi list for device rows, before version detection - install_python_stack.py: new _has_rocm_gpu() function probes rocminfo and amd-smi list before _ensure_rocm_torch() proceeds - install_llama_prebuilt.py: detect_host() probes rocminfo/amd-smi list instead of just checking tool existence or directory paths Also: - Shell test mock amd-smi now handles "list" subcommand - Python tests updated to mock _has_rocm_gpu where needed - Added test_no_gpu_with_rocm_tools_skips to verify the new guard - Test index lookups now use sorted() to match production code * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden hipconfig version parsing and torch probe compatibility - Add parts[1].isdigit() check in hipconfig version parsing to handle versions like "6.3-HIP" where the minor component has non-numeric suffix (strip "-" prefix before int() conversion) - Use getattr() in torch probe subprocess to safely handle old or custom torch builds that may lack torch.version.hip/cuda attributes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Strengthen AMD GPU detection and add NVIDIA precedence guard - Change amd-smi list detection from any-non-empty-output to requiring "gpu" marker in output, matching the shell-side NR>1 check. Prevents false positives from header-only amd-smi list output. - Add nvidia-smi check at the top of _ensure_rocm_torch() so mixed AMD+NVIDIA hosts preserve NVIDIA precedence (matching install.sh and install_llama_prebuilt.py behavior). - Apply the same amd-smi marker fix to install_llama_prebuilt.py detect_host() for consistency. * Add Windows-specific ROCm/HIP detection in detect_host() The previous detect_host() ROCm check used rocminfo and amd-smi list which are Linux-only tools. On Windows, has_rocm would always be False, making the Windows HIP prebuilt path at line 1794 unreachable. Now detect_host() uses platform-specific detection: - Linux: rocminfo (check for gfx GPU names) or amd-smi list - Windows: hipinfo.exe, amd-smi, or amdhip64.dll on PATH This allows Windows AMD users to get the HIP prebuilt binary instead of silently falling through to the CPU prebuilt. * Add AMD ROCm gaps: Mamba/SSM source builds, GPU monitoring, Windows messaging, RDNA expansion - worker.py: Add HIP detection to causal-conv1d/mamba-ssm probe, check for hipcc before ROCm source builds, improve status messages and error reporting, add timeout and uv support for the source build fallback - amd.py: New AMD GPU monitoring module via amd-smi metric --json, mirroring nvidia.py structure (utilization, temperature, power, VRAM) - hardware.py: Branch to amd.py when IS_ROCM is True for GPU utilization, visible GPU queries, and physical GPU count - install_python_stack.py: Detect AMD GPUs on Windows and warn that ROCm-enabled PyTorch must be installed manually - kernels/utils.py: Expand is_rdna() to cover RDNA2 (gfx1030-1032), RDNA3 (gfx1102-1103), RDNA3.5 (gfx1150-1152) alongside existing entries - tests: Add 32 new tests covering all changes (95/95 pass) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden ROCm detection, fix VRAM heuristic, and expand RDNA2 coverage - Windows ROCm detection: validate actual GPU presence via hipinfo/amd-smi output markers instead of just checking tool existence on PATH - _ensure_rocm_torch: validate nvidia-smi actually reports a GPU before giving NVIDIA precedence (fixes AMD-only hosts with stale NVIDIA tools) - amd.py _parse_numeric: handle dict-shaped metric objects from newer amd-smi versions ({"value": 10, "unit": "W"}) and strip MiB/GiB units - amd.py VRAM heuristic: raise threshold from 100k to 10M to correctly handle MI300X (192 GB = 196608 MB) and other high-VRAM GPUs - amd.py visible GPU: use AMD-reported GPU IDs instead of enumerate index so non-dense sets like CUDA_VISIBLE_DEVICES=1,3 report correctly - install.sh: add ROCm <6.0 minimum version guard (no PyTorch wheels exist for older versions); fix rocm7.1* glob to not match rocm7.10+ - is_rdna: add gfx1033-1036 for RDNA2 mobile GPUs (RX 6600M etc.) - worker.py: increase ROCm source build timeout from 600s to 1800s; fix success log message for ROCm source builds - Tests: update mocks for _has_usable_nvidia_gpu, add RDNA2 target asserts * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add HIP_VISIBLE_DEVICES support, unit-aware VRAM parsing, Windows GPU validation - hardware.py: check HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES on ROCm before falling back to CUDA_VISIBLE_DEVICES, so multi-GPU AMD setups with HIP-specific env vars report the correct visible device set - amd.py: add _parse_memory_mb() that reads "unit" from dict-shaped amd-smi JSON (e.g. {"value": 192, "unit": "GiB"}) and converts to MB correctly; fixes MI300X VRAM misreported as 0.19 GB instead of 192 GB - install_python_stack.py: Windows AMD warning now validates actual GPU presence via hipinfo/amd-smi output markers before printing - install_llama_prebuilt.py: restore amdhip64.dll fallback for Windows HIP detection after tool-based checks, so Windows HIP installs without CLI tools on PATH are still detected - hardware.py: fix IS_ROCM comment to accurately describe its role * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix HIP_VISIBLE_DEVICES empty-string handling in GPU visibility spec Use explicit None checks instead of Python `or` operator when reading HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES, so that an empty string ("") is correctly honored as "no visible GPUs" rather than silently falling through to CUDA_VISIBLE_DEVICES on mixed ROCm+CUDA systems. * Fix IS_ROCM test assertion for multi-line formatting * Cap torchvision/torchaudio versions, remove amdhip64.dll fallback, fix visible GPU count - Cap torchvision<0.26.0 and torchaudio<2.11.0 alongside torch<2.11.0 in both install.sh and install_python_stack.py to prevent resolver from selecting incompatible companion packages from ROCm wheel index - Remove amdhip64.dll fallback in Windows ROCm detection (DLL presence without hipinfo/amd-smi is not proof of GPU existence) - Fix get_visible_gpu_count() to use _get_parent_visible_gpu_spec() which respects HIP_VISIBLE_DEVICES/ROCR_VISIBLE_DEVICES on ROCm hosts * Attribute is_rdna() RDNA2/3/3.5/4 expansion to PR #4428 The is_rdna() expansion to cover RDNA2 (gfx1030-1036), RDNA3 (gfx1100-1103), RDNA3.5 (gfx1150-1152), and RDNA4 (gfx1200-1201) architectures is based on the original work from PR #4428. Co-authored-by: GoldenGrapeGentleman <yueyuan@amd.com> Co-authored-by: billishyahao <bill.he@amd.com> * Support AMD Radeon for studio (#4770) Co-authored-by: Iswarya Alex <iswarya.alex@amd.com> * Remove ROCm test files from main PR Move test_rocm_support.py and shell test additions to a separate PR to keep the main ROCm support PR focused on implementation changes. * Fix installer and hardware detection issues for PR #4720 - Fix empty _tri_arg passed to uv pip install in Radeon path (causes "Empty field is not allowed for PEP508" error) - Fix Radeon fallback: use ROCm index instead of CPU-only when repo.radeon.com is unreachable (TORCH_INDEX_URL already has ROCm) - Use $TORCH_CONSTRAINT in fallback paths instead of hardcoded strings - Fix _pick_radeon_wheel: relax suffix to match manylinux_2_28_x86_64 wheels (AMD Radeon repo does not use bare linux_x86_64 platform tag) - Fix IS_ROCM export: use __getattr__ so callers always see the live value after detect_hardware() runs - Fix apply_gpu_ids: set HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES on ROCm so _get_parent_visible_gpu_spec picks up narrowed GPU set - Fix _parse_memory_mb: distinguish GB (1000 MB) from GiB (1024 MiB) - Add amd-smi version as a fallback in _detect_rocm_version - Fix trailing whitespace and missing newline at EOF in install.sh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix GPU detection false positives and add missing health groups - Fix _has_rocm_gpu() false positive: require "GPU: <number>" data rows from amd-smi list, not just header containing "gpu" - Apply same fix in detect_host() in install_llama_prebuilt.py - Add runtime_payload_health_groups for linux-rocm and windows-hip so partial/corrupt ROCm/HIP prebuilt installs are properly detected - Add bitsandbytes install to Radeon fallback paths (was only in the success path, skipped when repo.radeon.com was unreachable) - Keep DEVICE/CHAT_ONLY as direct imports in __init__.py (matching main) and only use __getattr__ for IS_ROCM * Fix _ensure_rocm_torch and Windows AMD warning false positives - _ensure_rocm_torch: only skip when HIP is already present, not for CUDA builds (which are unusable on AMD-only hosts). Fixes the case where a venv has a stale CUDA wheel and the repair step is skipped. - Windows AMD warning: use GPU data row check (same as Linux fix) to avoid false positives from amd-smi list header-only output. * Fix amd-smi GPU detection for GPU[N] output format Older amd-smi versions output "GPU[0] : Card series: ..." instead of "GPU: 0". The regex now matches both "GPU: <digit>" and "GPU[<digit>" formats to detect actual GPU data rows. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden AMD GPU detection against false positives - install.sh: replace weak amd-smi list check (awk 'NR>1 && NF') with strict pattern matching GPU data rows (/^GPU[[:space:]]*[:\[]/) - All files: reject rocminfo gfx000 (CPU HSA agent) by requiring gfx[1-9] instead of gfx[0-9] in the rocminfo GPU probe - Fixes false positives on hosts with ROCm tools but no AMD GPU * Remove duplicate comment from pre-commit merge * Refactor: deduplicate AMD detection, consolidate bitsandbytes, clean up imports - Extract _has_amd_rocm_gpu() shell function to avoid duplicating the rocminfo/amd-smi GPU detection logic in get_torch_index_url and the Radeon auto-detect block - Consolidate bitsandbytes install into a single case block after torch install (was duplicated 4 times across Radeon success/fallback paths) - Move math and re imports to top of amd.py (were inline in functions) - Add _smi_query() helper in hardware.py to centralize IS_ROCM backend selection for get_gpu_utilization and get_visible_gpu_utilization Addresses Gemini code review suggestions. * Fix VRAM parsing for string values and GB/GiB consistency - Extract unit from string-valued VRAM fields (e.g. "192 GiB") so _parse_memory_mb correctly applies the unit multiplier instead of treating the value as bare MB - Treat GB and GiB identically (both as binary x1024) since GPU tools including amd-smi use binary units even when labeling them "GB" - Fixes incorrect VRAM reporting on MI300-class cards (was showing ~0.19 GB instead of 192 GB for string-valued outputs) * Add --no-cache to uv for ROCm HIP source builds Avoid stale cache artifacts from partial HIP source builds when uv is used for causal-conv1d/mamba-ssm compilation on ROCm. The pip path already uses --no-cache-dir; this adds the uv equivalent (--no-cache) only when is_hip is True. * Fix critical: initialize _amd_gpu_radeon before case block _amd_gpu_radeon was only set inside the */rocm*) case arm, so on NVIDIA/CPU/macOS paths where TORCH_INDEX_URL does not contain "rocm", the variable was unbound. With set -u (nounset) enabled, this crashes the installer for every non-AMD user. Move initialization to before the case block so it is always defined. * Fix Windows AMD: route has_rocm hosts to HIP prebuilt path resolve_release_asset_choice was selecting windows-cpu for all Windows x86_64 hosts including those with has_rocm=True. Windows AMD users should fall through to resolve_upstream_asset_choice which tries the HIP prebuilt first. Add "not host.has_rocm" guard to the published windows-cpu selection. * Harden ROCm detection, Radeon wheel fallback, and HIP visibility Addresses review findings from parallel reviewers on PR #4720: - install.sh: add _has_usable_nvidia_gpu() helper requiring nvidia-smi -L to actually list a GPU before treating the host as NVIDIA. Fixes the stale-nvidia-smi-on-PATH regression where AMD-only hosts fell into the CUDA branch. - install.sh: fix hipconfig awk blocks to propagate a non-zero exit code when the output is not a recognisable version string, so the ||-chain continues to dpkg-query / rpm instead of terminating early. - install.sh: fail-closed on Radeon wheel fallback. When torch, torchvision or torchaudio is missing from the Radeon repo for the active Python tag, fall back to the standard ROCm index instead of silently mixing Radeon wheels with PyPI defaults. Quote all wheel arguments individually so wheel filenames cannot be word-split or glob-expanded. - install_llama_prebuilt.py: detect_host() now requires nvidia-smi -L to list a GPU before setting has_physical_nvidia. Routes AMD ROCm hosts with a broken leftover nvidia-smi to the ROCm path instead of misclassifying them as NVIDIA. - install_llama_prebuilt.py: scan upstream assets for any rocm-<version> prebuilt instead of hard-coding rocm-7.2, so ROCm 6.x / 7.0 / 7.1 / 7.3+ users pick up a matching upstream prebuilt when one exists. - install_llama_prebuilt.py: validate_server() adds --n-gpu-layers 1 for linux-rocm and windows-hip hosts, so new HIP prebuilts are preflighted on the GPU path instead of passing validation on CPU only. - install_llama_prebuilt.py: restore the published windows-cpu fallback for AMD Windows hosts without a HIP prebuilt so hash-approved bundles are still preferred over the raw upstream CPU asset. - install_python_stack.py: drop the /opt/rocm / hipcc gate in _ensure_rocm_torch() and rely on _has_rocm_gpu(). Runtime-only ROCm installs (package-managed minimal installs, Radeon software) that ship amd-smi / rocminfo without hipcc can now repair a CPU-only venv via "unsloth studio update". Adds an explicit IS_WINDOWS / IS_MACOS guard. - studio/backend/utils/hardware/amd.py: honour HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES in get_primary_gpu_utilization(). A process restricted to GPU 2 now reports metrics for GPU 2 instead of physical GPU 0. Tighten the plain bytes unit detection to an explicit allowlist. - studio/backend/utils/hardware/hardware.py: route get_backend_visible_gpu_info()'s backend_cuda_visible_devices field through a helper that reads HIP_VISIBLE_DEVICES on ROCm. Drop the unconditional "(rocm=False)" suffix in apply_gpu_ids() logs. * Fix round 2 regressions: ROCm validate_server and Windows HIP routing Follow-up to |
||
|
|
33503ea248
|
Revert "updated models template mappers. added lfm2.5vl450m to transformers 5…" (#4945)
This reverts commit
|
||
|
|
bcf4fd6bd3
|
updated models template mappers. added lfm2.5vl450m to transformers 5… (#4939)
* updated models template mappers. added lfm2.5vl450m to transformers 5.3.0 whitelist * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> |
||
|
|
dc16e0c65b
|
Studio: keep chat input visible and fix compare pane clipping (#4924)
* fix(chat): sticky composer bar in thread * fix(chat): fix compare pane clipping * fix(chat): tighten scroll-to-bottom placement and compare footer spacing * Fix TypeScript build break and clean up ViewportFooter classes - Remove unused `compact` prop from ThreadScrollToBottom call site (component is FC with no props, passing it caused TS2322) - Extract shared classes (sticky, bottom-0, z-20, bg-transparent) from ternary branches into the unconditional className string - Restore `relative` on normal-mode footer so the inner absolute bg-background strip has a positioning context - Remove redundant md:pb-3 / md:pb-4 (same value as base pb-3 / pb-4) - Remove no-op `sticky bottom-0` from SharedComposer wrapper in both LoraCompareContent and GeneralCompareContent (flex layout with shrink-0 already pins it at the bottom; parent has no scrollable overflow for sticky to bind to) - Fix truncated comment on pointer-events rationale --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> |
||
|
|
5fa8683b27
|
build(deps): bump the bun-frontend group across 1 directory with 16 updates (#4586)
* build(deps): bump the bun-frontend group across 1 directory with 16 updates Bumps the bun-frontend group with 16 updates in the /studio/frontend directory: | Package | From | To | | --- | --- | --- | | [@dagrejs/dagre](https://github.com/dagrejs/dagre) | `2.0.4` | `3.0.0` | | [@dagrejs/graphlib](https://github.com/dagrejs/graphlib) | `3.0.4` | `4.0.1` | | @hugeicons/core-free-icons | `3.3.0` | `4.0.0` | | [@streamdown/cjk](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown-cjk) | `1.0.2` | `1.0.3` | | [@streamdown/code](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown-code) | `1.0.2` | `1.1.1` | | [lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react) | `0.577.0` | `1.6.0` | | [recharts](https://github.com/recharts/recharts) | `3.7.0` | `3.8.0` | | [shadcn](https://github.com/shadcn-ui/ui/tree/HEAD/packages/shadcn) | `3.8.5` | `4.1.0` | | [streamdown](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown) | `2.3.0` | `2.5.0` | | [@biomejs/biome](https://github.com/biomejs/biome/tree/HEAD/packages/@biomejs/biome) | `1.9.4` | `2.4.8` | | [@eslint/js](https://github.com/eslint/eslint/tree/HEAD/packages/js) | `9.39.4` | `10.0.1` | | [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node) | `24.12.0` | `25.5.0` | | [eslint](https://github.com/eslint/eslint) | `9.39.4` | `10.1.0` | | [eslint-plugin-react-refresh](https://github.com/ArnaudBarre/eslint-plugin-react-refresh) | `0.4.26` | `0.5.2` | | [globals](https://github.com/sindresorhus/globals) | `16.5.0` | `17.4.0` | | [typescript](https://github.com/microsoft/TypeScript) | `5.9.3` | `6.0.2` | Updates `@dagrejs/dagre` from 2.0.4 to 3.0.0 - [Release notes](https://github.com/dagrejs/dagre/releases) - [Changelog](https://github.com/dagrejs/dagre/blob/master/changelog.md) - [Commits](https://github.com/dagrejs/dagre/compare/v2.0.4...v3.0.0) Updates `@dagrejs/graphlib` from 3.0.4 to 4.0.1 - [Release notes](https://github.com/dagrejs/graphlib/releases) - [Changelog](https://github.com/dagrejs/graphlib/blob/master/changelog.md) - [Commits](https://github.com/dagrejs/graphlib/compare/v3.0.4...v4.0.1) Updates `@hugeicons/core-free-icons` from 3.3.0 to 4.0.0 Updates `@streamdown/cjk` from 1.0.2 to 1.0.3 - [Release notes](https://github.com/vercel/streamdown/releases) - [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown-cjk/CHANGELOG.md) - [Commits](https://github.com/vercel/streamdown/commits/@streamdown/cjk@1.0.3/packages/streamdown-cjk) Updates `@streamdown/code` from 1.0.2 to 1.1.1 - [Release notes](https://github.com/vercel/streamdown/releases) - [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown-code/CHANGELOG.md) - [Commits](https://github.com/vercel/streamdown/commits/@streamdown/code@1.1.1/packages/streamdown-code) Updates `lucide-react` from 0.577.0 to 1.6.0 - [Release notes](https://github.com/lucide-icons/lucide/releases) - [Commits](https://github.com/lucide-icons/lucide/commits/1.6.0/packages/lucide-react) Updates `recharts` from 3.7.0 to 3.8.0 - [Release notes](https://github.com/recharts/recharts/releases) - [Changelog](https://github.com/recharts/recharts/blob/main/CHANGELOG.md) - [Commits](https://github.com/recharts/recharts/compare/v3.7.0...v3.8.0) Updates `shadcn` from 3.8.5 to 4.1.0 - [Release notes](https://github.com/shadcn-ui/ui/releases) - [Changelog](https://github.com/shadcn-ui/ui/blob/main/packages/shadcn/CHANGELOG.md) - [Commits](https://github.com/shadcn-ui/ui/commits/shadcn@4.1.0/packages/shadcn) Updates `streamdown` from 2.3.0 to 2.5.0 - [Release notes](https://github.com/vercel/streamdown/releases) - [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown/CHANGELOG.md) - [Commits](https://github.com/vercel/streamdown/commits/streamdown@2.5.0/packages/streamdown) Updates `@biomejs/biome` from 1.9.4 to 2.4.8 - [Release notes](https://github.com/biomejs/biome/releases) - [Changelog](https://github.com/biomejs/biome/blob/main/packages/@biomejs/biome/CHANGELOG.md) - [Commits](https://github.com/biomejs/biome/commits/@biomejs/biome@2.4.8/packages/@biomejs/biome) Updates `@eslint/js` from 9.39.4 to 10.0.1 - [Release notes](https://github.com/eslint/eslint/releases) - [Commits](https://github.com/eslint/eslint/commits/v10.0.1/packages/js) Updates `@types/node` from 24.12.0 to 25.5.0 - [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases) - [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node) Updates `eslint` from 9.39.4 to 10.1.0 - [Release notes](https://github.com/eslint/eslint/releases) - [Commits](https://github.com/eslint/eslint/compare/v9.39.4...v10.1.0) Updates `eslint-plugin-react-refresh` from 0.4.26 to 0.5.2 - [Release notes](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/releases) - [Changelog](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/blob/main/CHANGELOG.md) - [Commits](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/compare/v0.4.26...v0.5.2) Updates `globals` from 16.5.0 to 17.4.0 - [Release notes](https://github.com/sindresorhus/globals/releases) - [Commits](https://github.com/sindresorhus/globals/compare/v16.5.0...v17.4.0) Updates `typescript` from 5.9.3 to 6.0.2 - [Release notes](https://github.com/microsoft/TypeScript/releases) - [Commits](https://github.com/microsoft/TypeScript/compare/v5.9.3...v6.0.2) --- updated-dependencies: - dependency-name: "@dagrejs/dagre" dependency-version: 3.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@dagrejs/graphlib" dependency-version: 4.0.1 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@hugeicons/core-free-icons" dependency-version: 4.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@streamdown/cjk" dependency-version: 1.0.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: bun-frontend - dependency-name: "@streamdown/code" dependency-version: 1.1.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: bun-frontend - dependency-name: lucide-react dependency-version: 1.6.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: recharts dependency-version: 3.8.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: bun-frontend - dependency-name: shadcn dependency-version: 4.1.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: streamdown dependency-version: 2.5.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: bun-frontend - dependency-name: "@biomejs/biome" dependency-version: 2.4.8 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@eslint/js" dependency-version: 10.0.1 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@types/node" dependency-version: 25.5.0 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: eslint dependency-version: 10.1.0 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: eslint-plugin-react-refresh dependency-version: 0.5.2 dependency-type: direct:development update-type: version-update:semver-minor dependency-group: bun-frontend - dependency-name: globals dependency-version: 17.4.0 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: typescript dependency-version: 6.0.2 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend ... Signed-off-by: dependabot[bot] <support@github.com> * Revert dagrejs upgrades Keep @dagrejs/dagre at ^2.0.4 and @dagrejs/graphlib at ^3.0.4. * Revert biome, eslint, typescript, and recharts upgrades These upgrades break studio/frontend locally: - @biomejs/biome 2.4.10 fails to parse the existing biome.json (files.ignore and organizeImports keys removed in v2; schema version mismatch). - typescript 6.0.2 emits TS5101 on tsconfig.app.json baseUrl ("Option 'baseUrl' is deprecated and will stop functioning in TypeScript 7.0"), so tsc -b exits 2. - eslint 10.2.0 conflicts with eslint-plugin-react-hooks@7.0.1, which peers on eslint ^9; npm install fails with ERESOLVE. - recharts 3.8.1 widened LegendPayload.dataKey to include a function type, which breaks the React key={item.dataKey} usage in src/components/ui/chart.tsx (TS2322). Hold these at their current pinned versions until the upstream peer deps and config migrations are ready. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com> |
||
|
|
8e977445d4
|
Let recipes use the model loaded in Chat (#4840)
* feat: inject local model provider into recipe jobs via JWT * feat: auto-generate JWT for local model providers in recipes * feat: add is_local flag to model provider config types and utils * fix(studio): skip endpoint validation for local providers * feat(studio): add local/external model source toggle to provider dialog * feat(studio): thread localProviderNames through model config dialog chain * feat(studio): show 'Local model (Chat)' label for local model_provider configs * fix: hardcode loopback for local endpoint, clear stale creds on toggle * fix: document TOCTOU/JWT rotation, add deferred import comments, fix is_local serialization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(studio): clear stale local model state on provider toggle and validation * fix(studio): override empty local endpoint in validation and skip model gate for unused providers * fix(studio): resolve loopback port from app.state, clear stale local provider fields, sync model id on toggle Address review feedback on the local-model-provider flow: - Backend (jobs.py): _resolve_local_v1_endpoint now reads the actual bound port from app.state.server_port (set in run.py after binding) instead of parsing it out of request.base_url, which is wrong behind any reverse proxy or non-default port. The two duplicated urlparse blocks are gone. - Backend (jobs.py): defensively pop api_key_env, extra_headers, extra_body from local providers so a previously external provider that flipped to local cannot leak invalid JSON or rogue auth headers into the local /v1 call. Also dedupe the post-loop assignment and tighten the local-name intersection so empty names cannot match. - Backend (jobs.py): hoist datetime and urllib.parse imports to the top import block for consistency with the rest of the file. - Backend (run.py): expose the bound port on app.state.server_port after the uvicorn server is constructed. - Frontend (model-provider-dialog.tsx): clear extra_headers and extra_body when toggling to local mode. Hidden inputs would otherwise keep stale JSON blocking validate/run. - Frontend (model-config-dialog.tsx): factor the local-aware provider selection logic into applyProviderChange and call it from both onValueChange and onBlur, so manually typing a provider name and tabbing away keeps the model field consistent. - Frontend (recipe-studio.ts store): handle both directions of the is_local toggle in the cascade. external -> local now backfills model: "local" on already-linked model_configs so they pass validation immediately, mirroring the existing local -> external clear path. - Frontend (validate.ts + build-payload.ts): thread localProviderNames into validateModelConfigProviders and skip the "model is required" check for local-linked configs. Local providers do not need a real model id since the inference endpoint uses the loaded Chat model. * fix(studio): narrow store cascade types, sync model placeholder on graph relink and node removal, harden ephemeral port path Loop 2 review fixes: - recipe-studio.ts: type-narrow next.is_local by also checking next.kind === "model_provider". TS otherwise raised TS2339 because next was typed as the union NodeConfig after the spread. The behavior is unchanged but the code now compiles cleanly. - model-config-dialog.tsx: convert the lastProviderRef / providerInputRef ref-during-render pattern (pre-existing react-hooks/refs lint error) to a useEffect that syncs providerInputRef from config.provider. The combobox blur path still uses applyProviderChange and remains stable. - recipe-graph-connection.ts: when a graph drag links a model_provider to a model_config, mirror the dialog applyProviderChange behavior: fill model: "local" if the new provider is local and the model field is blank, clear model when relinking from a local placeholder to an external provider, otherwise leave the model alone. - reference-sync.ts: when a referenced provider node is removed, clear the synthetic model: "local" placeholder along with the provider field, so a future relink to an external provider does not pass validation with a stale value that fails at runtime. - run.py: only publish app.state.server_port when the bound port is a real positive integer; for ephemeral binds (port==0) leave it unset and let request handlers fall back to request.base_url. - jobs.py: _resolve_local_v1_endpoint also falls back when app.state.server_port is non-positive, and uses `is None` instead of the truthy fallback so a literal 0 is handled correctly. * fix(studio): strict is_local check, narrow loaded-model gate to LLM-reachable configs, add scope-server port fallback Loop 3 review fixes: - jobs.py, validate.py: require `is_local is True` instead of truthy check. Malformed payloads such as is_local: "false" or is_local: 1 would otherwise be treated as local and silently rewritten to the loopback endpoint. - jobs.py: _resolve_local_v1_endpoint now tries request.scope["server"] (the actual uvicorn-assigned (host, port) tuple) as a second resolution step before falling back to parsing request.base_url. This covers direct-uvicorn startup paths and ephemeral binds that never publish app.state.server_port. - jobs.py: new _used_llm_model_aliases helper collects the set of model_aliases that an LLM column actually references, and the "Chat model loaded" gate is now only triggered when a local provider is reachable from that set. Orphan model_config nodes on the canvas no longer block unrelated recipe runs. * fix(studio): force skip_health_check on local-linked configs, skip JSON parsing for local providers, local-aware inline editor Loop 4 review fixes: - jobs.py: after rewriting local providers, also force skip_health_check: true on any model_config linked to a local provider. The /v1/models endpoint only advertises the real loaded model id, so data_designer's default model-availability health check would otherwise fail against the placeholder "local" id before the first chat completion call. The inference route already ignores the model id in chat completions, so skipping the check is safe. - builders-model.ts: buildModelProvider now short-circuits for local providers and emits only { name, endpoint: "", provider_type, is_local } without running parseJsonObject on the hidden extra_headers/extra_body inputs. Imported or hydrated recipes with stale invalid JSON in those fields no longer block client-side validate/run. - inline-model.tsx: the model_config branch now accepts an optional localProviderNames prop and mirrors the dialog applyProviderChange behavior. Changing provider to/from a local one auto-fills or clears the "local" placeholder consistently with the other edit paths. - recipe-graph-node.tsx: derive localProviderNames from the store via useMemo (stable identity) and pass it through renderNodeBody to <InlineModel>. Hooks order is preserved by declaring them above the early return for markdown_note nodes. - run.py: minor comment tweak - loop 3 already added the scope-server fallback path, note that in the comment. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: danielhanchen <info@unsloth.ai> |
||
|
|
67e9db4921
|
build(deps): bump oxc-parser (#4776)
Bumps the npm-oxc-validator group in /studio/backend/core/data_recipe/oxc-validator with 1 update: [oxc-parser](https://github.com/oxc-project/oxc/tree/HEAD/napi/parser). Updates `oxc-parser` from 0.121.0 to 0.123.0 - [Release notes](https://github.com/oxc-project/oxc/releases) - [Changelog](https://github.com/oxc-project/oxc/blob/main/napi/parser/CHANGELOG.md) - [Commits](https://github.com/oxc-project/oxc/commits/crates_v0.123.0/napi/parser) --- updated-dependencies: - dependency-name: oxc-parser dependency-version: 0.123.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: npm-oxc-validator ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> |