unsloth

mirror of https://github.com/unslothai/unsloth synced 2026-04-21 13:37:39 +00:00

Author	SHA1	Message	Date
Roland Tannous	21e9a91a57	Studio: forward standard OpenAI tools / tool_choice on /v1/responses (Codex compat) (#5122 ) * Studio: forward standard OpenAI tools / tool_choice on /v1/responses Mirrors the /v1/chat/completions client-side tool pass-through from #5099 so clients (OpenAI Codex CLI, OpenAI Python SDK, ...) that target the Responses API receive structured function_call output items instead of plain text with tool-call tokens leaking into content. - ResponsesRequest: type tools/tool_choice properly, add parallel_tool_calls; accept function_call and function_call_output input items for multi-turn - Translate flat Responses tool / tool_choice shape to the nested Chat Completions shape before forwarding to llama-server - _normalise_responses_input: map function_call_output -> role="tool", function_call -> assistant tool_calls (preserving call_id) - Non-streaming: map returned tool_calls -> top-level function_call output items keyed by call_id - Streaming: emit response.output_item.added (function_call), response.function_call_arguments.delta/.done, and response.output_item.done per tool call while keeping the text message at output_index 0 - Pytest coverage: tools/tool_choice translation, multi-turn input mapping, non-streaming tool_calls mapping, response round-trip * Studio: merge system messages and close inner stream on /v1/responses Fixes two issues surfacing when OpenAI Codex CLI drives /v1/responses against a GGUF with a strict chat template (gpt-oss harmony, Qwen3, ...). 1. "System message must be at the beginning" upstream errors Codex sends `instructions` AND a `role:"developer"` message in `input`, producing two separate system-role messages. Strict templates raise when a second system message exists or when one appears after a user turn. _normalise_responses_input now hoists all instructions / system / developer content into a single merged system message at the top of the Chat Completions message list. 2. 
"async generator ignored GeneratorExit" / "Attempted to exit cancel scope in a different task" _responses_stream consumed the inner chat-completions body_iterator without an explicit aclose() in a finally block. On client disconnect (Codex frequently cancels mid-stream), Python 3.13 finalized the inner async generator on a different task, tripping anyio's cancel-scope check. Mirrored the same try/finally + aclose pattern used by the /v1/messages, /v1/chat/completions, and /v1/completions passthroughs. Tests: hoisting of instructions + developer, developer mid-conversation, multiple system messages in input, no-system passthrough. * Studio: accept Codex multi-turn shapes and fix cross-task stream close on /v1/responses Two issues observed driving /v1/responses from OpenAI Codex CLI against a GGUF backend. 1. 422 on every turn after the first Codex replays prior assistant turns with `content:[{"type":"output_text","text":...,"annotations":[],"logprobs":[]}]` and carries forward `reasoning` items (o-series / gpt-5) between turns. Our `ResponsesContentPart` union only accepted input_text / input_image, and `ResponsesInputItem` only message / function_call / function_call_output, so Pydantic failed the whole list and FastAPI returned `"Input should be a valid string"` against the `str` branch of the outer union. - Add `ResponsesOutputTextPart` for assistant-replay content. - Add `ResponsesUnknownContentPart` and `ResponsesUnknownInputItem` as permissive catch-alls (drop during normalisation). - Wire an explicit `Discriminator` so dispatch is deterministic and the fallthrough reaches the catch-all instead of misreporting via the outer `Union[str, list[...]]`. - `_normalise_responses_input` now accepts output_text parts, flattens single-part assistant text to a plain string (keeps legacy chat templates happy), and silently drops reasoning / unknown items. 2. 
"async generator ignored GeneratorExit" / cross-task cancel scope `_responses_stream` awaited `openai_chat_completions` in the parent route-handler task, which opens the httpx client for the inner passthrough on that task. The outer `StreamingResponse` then iterates in a child task, so the asyncgen GC finalises the inner httpcore byte stream on the child task, tripping anyio's "Attempted to exit cancel scope in a different task". Move the `await` inside `event_generator` so the httpx lifecycle stays within the single streaming child task, and surface any HTTPException as a `response.failed` SSE frame. Tests: assistant output_text replay, reasoning-item tolerance, unknown content-part tolerance, end-to-end Codex-shape payload (developer + user + reasoning + function_call + function_call_output + assistant output_text + user), and single-part assistant flattening to plain string. * Studio: call llama-server directly from streaming /v1/responses The previous fix (running the inner await inside event_generator) was not enough. Wrapping the existing `openai_chat_completions` pass-through still stacks two async generators: when the outer generator is closed, the innermost `HTTP11ConnectionByteStream.__aiter__` in httpcore doesn't receive GeneratorExit before Python's asyncgen GC finalises it in a sibling task, tripping "Attempted to exit cancel scope in a different task" and "async generator ignored GeneratorExit" — the same Python 3.13 + httpcore 1.0.x interaction already seen in PRs #4956, #4981, #5099. Cure both pass-throughs had: a single same-task httpx lifecycle with explicit `aiter_lines().aclose()` BEFORE `resp.aclose()` / `client.aclose()` in the generator's finally block. Apply it at the Responses layer by dropping the wrapper entirely for GGUF: open httpx, consume `resp.aiter_lines()`, parse `chat.completion.chunk`, emit Responses SSE events, close everything in finally — all in the single StreamingResponse child task. 
Non-GGUF streaming is rejected with a 400 (wrapping the transformers backend would re-introduce the double-layer pattern and isn't a Codex-compatible path today anyway). Also surfaces upstream httpx.RequestError / non-200 as a `response.failed` SSE frame rather than a dropped stream now that the request is dispatched after SSE headers have gone out. * Studio: silence benign httpcore asyncgen GC warnings on Python 3.13 The streaming pass-throughs (/v1/chat/completions, /v1/messages, /v1/responses, /v1/completions) all use the proven #4981 / #5099 pattern — single-task httpx lifecycle with explicit aiter_lines().aclose() ahead of resp.aclose() / client.aclose() in the generator's finally block. That handles our own iterators correctly. The residual noise ("async generator ignored GeneratorExit" / "Attempted to exit cancel scope in a different task") comes from an innermost HTTP11ConnectionByteStream.__aiter__ that httpcore creates internally inside its pool. We hold no reference to it, so we cannot aclose it ourselves. Python 3.13's asyncgen GC hook finalises it on the finaliser task, its aclose path enters an anyio CancelScope shield, and Python flags the cross-task exit. The response has already been delivered with a 200 by then — it is purely log noise, not a functional failure. Same interaction seen in modelcontextprotocol/python-sdk #831, agno #3556, chainlit #2361, langchain-mcp-adapters #254. Install a targeted sys.unraisablehook that swallows this specific tuple — RuntimeError mentioning "cancel scope" or "GeneratorExit" plus an object repr referencing HTTP11ConnectionByteStream — and defers to the default hook for every other unraisable. Idempotent; guarded by a sentinel attribute so repeated imports don't stack filters.	2026-04-21 13:17:20 +04:00
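The targeted `sys.unraisablehook` filter described in the last item of 21e9a91a57 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the commit: the names `install_filter`, `is_httpcore_asyncgen_noise`, and the sentinel attribute are hypothetical. Only `sys.unraisablehook` itself and the matched strings ("cancel scope", "GeneratorExit", `HTTP11ConnectionByteStream`) are taken from the standard library and the commit message.

```python
import sys

# Hypothetical sentinel attribute name; the commit only says "a sentinel attribute".
_SENTINEL = "_httpcore_asyncgen_filter_installed"
_NOISE_MARKERS = ("cancel scope", "GeneratorExit")


def is_httpcore_asyncgen_noise(exc_value, obj) -> bool:
    """Return True only for the specific benign combination described above."""
    if not isinstance(exc_value, RuntimeError):
        return False
    message = str(exc_value)
    if not any(marker in message for marker in _NOISE_MARKERS):
        return False
    # The unraisable object is the orphaned async generator; match on its repr.
    return "HTTP11ConnectionByteStream" in repr(obj)


def install_filter() -> None:
    """Install the filter once; repeated calls do not stack hooks."""
    previous = sys.unraisablehook
    if getattr(previous, _SENTINEL, False):
        return  # already installed

    def _hook(unraisable):
        if is_httpcore_asyncgen_noise(unraisable.exc_value, unraisable.object):
            return  # swallow the known log noise
        previous(unraisable)  # defer everything else to the prior hook

    setattr(_hook, _SENTINEL, True)
    sys.unraisablehook = _hook
```

Matching on the exception type, the message, and the object repr together keeps the filter narrow, so unrelated unraisable exceptions still reach the default hook.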
Lee Jackson	c20959dbf4	Studio: Improve chat composition, fix scroll behaviour, and refine sidebar UX (#5089 ) * Chatbox, scroll, and menu fixes - Fixed chatbox auto-expand height for multi-line text on the compare page - Fixed chatbox UI to be consistent across compare and new chat - Fixed scrolling being enabled on pages with no content, which also triggered the scroll-to-bottom button - Fixed scroll-to-bottom button to only appear after scrolling up a reasonable amount instead of instantly - Added shutdown studio button to the menu for easier access - Fixed pop-up menu width to match the user button width (cherry picked from commit cd4e390dfa84fe311fae79a781b96cc0ef5970a9) * fix: correct compare scroll viewport and clean up chat composer UI polish * Dark theme refactor and sidebar/chat UI refinements - Complete refactoring of dark theme - Replaced square rounded-corner user profile image with a circular bordered one - Replaced user profile icon with 'U' initial and renamed label from 'Studio' to 'User' - Chat bubbles now have a pointy top-right edge - Sidebar menu tab line color selection is now consistent across all menus - Tab-selection color animation now also applies to recent chats - Removed 'Compare' menu autoselect when a compare chat conversation is selected - Fixed UI consistency in Compare to match New Chat - Removed sidebar animation and tab line, replaced with rounded selection for consistency - Further adjustments to sidebar UI - Further adjustments to compare chat UI * Fixed sidebar collapse/expand for recent chats and recent runs not being clickable * Chatbox, scroll, and menu fixes - Fixed chatbox auto-expand height for multi-line text on the compare page - Fixed chatbox UI to be consistent across compare and new chat - Fixed scrolling being enabled on pages with no content, which also triggered the scroll-to-bottom button - Fixed scroll-to-bottom button to only appear after scrolling up a reasonable amount instead of instantly - Added shutdown 
studio button to the menu for easier access - Fixed pop-up menu width to match the user button width * Sidebar, fonts, and chat UI refinements - Replaced logo PNG with real font text for 'unsloth' and 'BETA' label - Added Hellix font and applied it across menus and UI elements - Lighter scrollbar in the sidebar compared to other areas of the app - Adjusted chat font and chat bubble styling - Adjusted app menu design to stay consistent with the sidebar - Adjusted text style for 'New Chat' and repositioned content/chatbox - Adjusted model selector and top area UI - Fixed footer text from 'LLM's' to 'LLMs' - Fixed active selection border color incorrectly appearing on page refresh and during general navigation - Logo now defaults to 'New Chat' when clicked * Sidebar, model selector, and mobile UI fixes - Further adjustments to sidebar UI and logo - Changed right bar icon - Model selector adjustments - Collapsed sidebar now matches the content area background - Adjusted Hellix font spacing across pages - Fixed sidebar icon overlap on mobile screens * Adjust sidebar icons * Adjust sidebar icons * Fixed compare chat UI and scrolling issues * Fixed inference settings icon behavior and context info positioning - Fixed top right inference settings icon to move into sidepanel during expand/collapse, matching left sidebar behavior - Adjusted context information element positioning * Fix: textarea overflow in system prompt editor * Code block redesign, font, and chat bubble adjustments - Redesigned code block colors and theme - Changed code block font to Fira Code - Fixed scrollbar disappearing when expanding/collapsing tool calls in chats - Adjusted chat bubble background color * Fix chat bubble background color in dark theme * fix: restore textarea auto-sizing and scope prompt editor sizing * fix: add explicit textarea field sizing for prompt editor overflow * fix: generate chat nonce on click instead of render * fix: respect training lock on logo navigation * Refactor 
compare page dual chat scrolling behavior * Revert "Refactor compare page dual chat scrolling behavior" This reverts commit `d056ec09f2`. --------- Co-authored-by: sneakr <hauzin@hotmail.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-04-21 02:20:45 +04:00
Konstantin Azizov	0a5c61ffcc	fix: prefer mainstream clipboard copy over deprecated one (#5109) Fixes #5097 Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-04-20 23:18:18 +04:00
Lee Jackson	d3215ce113	Studio: Show LoRA live logs and update GGUF quant options (#5058) * export: update GGUF quant list and ordering * gguf: add Q2_K_L quantize flags for output and embeddings * export: add live console logs for LoRA export flow * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: stream q2_k_l quantize logs and include subprocess error details * fix: route Q2_K_L preset to q2_k ftype with q8_0 output+embeddings --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-04-20 23:14:49 +04:00
Lee Jackson	9c8a079d97	Studio: Local profile customization in settings and sync sidebar identity (#5088) * studio: add local profile customization in settings * studio: add local profile settings and sync sidebar identity * fix: adjust profile card margin * fix: move helper modules to utils and use single-letter avatar fallback * fix: keep profile icon visible on sidebar collapse * fix: sidebar account trigger labeling and profile reset prefs	2026-04-20 22:28:02 +04:00
Roland Tannous	9954781d30	fix(studio/chat): cancel in-flight run when trashing a thread from sidebar (#5067) Trashing a thread mid-stream used to delete the Dexie rows while the model kept generating, because the sidebar has no access to the @assistant-ui aui context. Expose per-thread cancelRun() through the chat runtime store and call it from deleteChatItem so trash behaves like Stop → Trash. Covers compare pairs by cancelling each paired thread. Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>	2026-04-20 21:06:59 +04:00
Roland Tannous	ac2daf8b7a	Studio: forward standard OpenAI tools / tool_choice to llama-server (#5099 ) * fix(studio): forward OpenAI tools/tool_choice to llama-server (#4999) Studio's /v1/chat/completions silently stripped standard OpenAI `tools` and `tool_choice` fields, so clients using standard function calling (opencode, Claude Code, Cursor, Continue, ...) never got structured tool_calls back. Adds a client-side pass-through path mirroring the existing Anthropic /v1/messages flow: when `tools` is present without Studio's `enable_tools` shorthand, the request is forwarded to llama-server verbatim so the client sees native id, finish_reason ("tool_calls"), delta.tool_calls, and accurate usage tokens. Also wires Anthropic tool_choice forwarding: /v1/messages previously accepted tool_choice on the request model but silently dropped it with a warning. Translate the four Anthropic shapes to OpenAI format and forward them so agentic clients can actually enforce tool use. - ChatCompletionRequest: add tools, tool_choice, stop; extra="allow" - ChatMessage: accept role="tool", optional tool_call_id / tool_calls / name; content is now optional (assistant with only tool_calls) - routes/inference.py: _openai_passthrough_stream / _openai_passthrough_non_streaming helpers, routing branch in openai_chat_completions, vision+tools via content-parts injection - _build_passthrough_payload: tool_choice parameter (default "auto") - anthropic_compat: anthropic_tool_choice_to_openai() translator - tests/test_openai_tool_passthrough.py: Pydantic + translator unit tests - tests/test_studio_api.py: 5 new E2E tests (non-stream, stream, multi-turn, OpenAI SDK, Anthropic tool_choice=any regression) * fix(studio): surface httpx transport errors from OpenAI passthrough When the managed llama-server subprocess crashes mid-request, the async pass-through helpers in routes/inference.py used to return a bare 500 (non-streaming) or an "An internal error occurred" SSE chunk (streaming) because 
_friendly_error only recognized the sync path's "Lost connection to llama-server" substring -- httpx transport failures (ConnectError / ReadError / RemoteProtocolError / ReadTimeout) stringify differently and fell through to the generic case. - _friendly_error: map any httpx.RequestError subclass to the same "Lost connection to the model server" message the sync chat path emits. Placed before the substring heuristics so the streaming path automatically picks it up via its existing except Exception catch. - _openai_passthrough_non_streaming: wrap the httpx.AsyncClient.post in a try/except httpx.RequestError and re-raise as HTTPException 502 with the friendly detail. - tests/test_openai_tool_passthrough.py: new TestFriendlyErrorHttpx class pinning the mapping for ConnectError, ReadError, RemoteProtocolError, ReadTimeout, and confirming non-httpx paths (context-size heuristic, generic fallback) are unchanged. * fix(studio): close aiter_bytes/aiter_lines explicitly in passthroughs The httpcore asyncgen cleanup fix in `5cedd9a5` is incomplete on Python 3.13 + httpcore 1.0.x: it switched to manual client/response lifecycle but still used anonymous `async for raw_line in resp.aiter_lines():` patterns in all three streaming paths. Python's async for does NOT auto-close the iterator on break/return, so the aiter_lines / aiter_bytes async generator remains alive, reachable only from the surrounding coroutine frame. Once `_stream()` returns the frame is GC'd and the orphaned asyncgen is finalized on a LATER GC pass in a DIFFERENT asyncio task, where httpcore's HTTP11ConnectionByteStream.aclose() enters anyio.CancelScope.__exit__ with a mismatched task and prints "Exception ignored in: <async generator>" / "async generator ignored GeneratorExit" / "Attempted to exit cancel scope in a different task" to the server log. 
User observed this on /v1/messages after successful (status 200) requests, with the traceback pointing at HTTP11ConnectionByteStream .__aiter__ / .aclose inside httpcore. Fix: save resp.aiter_lines() / resp.aiter_bytes() as a variable and explicitly `await iter.aclose()` in the finally block BEFORE resp.aclose() / client.aclose(). This closes the asyncgen inside the current task's event loop, so the internal httpcore byte stream is cleaned up before Python's asyncgen GC hook has anything orphaned to finalize. Each aclose is wrapped in try/except Exception so nested anyio cleanup noise can't bubble out. Applied to all three streaming passthrough paths: - _anthropic_passthrough_stream (/v1/messages client-side tool path) - _openai_passthrough_stream (/v1/chat/completions client-side tool path, new in this PR) - openai_completions (/v1/completions bytes proxy from PR #4956) * fix(studio): default ChatCompletionRequest.stream to false per OpenAI spec OpenAI's /v1/chat/completions spec defaults `stream` to false, so clients that omit the field (naive curl, minimal integrations) expect a single JSON response back. Studio was defaulting to true, silently switching those clients into SSE and breaking any parser that didn't also handle streaming. ResponsesRequest and AnthropicMessagesRequest already default to false correctly; only ChatCompletionRequest was wrong. Studio's own frontend always sets `stream` explicitly on every chat-adapter / chat-api / runtime-provider call site, so the flip has no UI impact. SDK users (OpenAI Python/JS SDK, opencode, Claude Code, Cursor, Continue) also always pass `stream` explicitly, so they're unaffected. The only clients feeling the change are raw-curl users who were relying on the wrong default -- those get the correct OpenAI behavior now. Added a regression test pinning the default so it can't silently flip back. 
* fix(studio): reject images in OpenAI tool passthrough for text-only GGUFs The new tool passthrough branch runs before _extract_content_parts, skipping the existing not is_vision guard. Requests combining tools with an image on a text-only tool-capable GGUF were forwarded to llama-server, producing opaque upstream errors instead of the pre-existing clear 400. Restore the guard inline at the dispatch point, checking both legacy image_base64 and inline image_url parts. * fix(studio): require tool_call_id on role=tool chat messages Enforce the OpenAI spec rule that role="tool" messages must carry a tool_call_id. Without it, upstream backends cannot associate a tool result with the assistant's prior tool_calls entry and the request fails in non-obvious ways through the passthrough path. Reject at the request boundary with a 422 instead. * fix(studio): harden OpenAI tool passthrough validation and error surfacing Three related fixes called out by the PR review: 1. Preserve upstream status codes in the streaming passthrough. The httpx request is now dispatched before the StreamingResponse is constructed. Non-200 upstream responses and httpx RequestError transport failures raise HTTPException with the real status instead of being buried inside a 200 SSE error frame, so OpenAI SDK clients see APIError/BadRequestError/... as expected. 2. Require non-empty content on user/system/tool messages. Per the OpenAI spec, content may only be omitted on assistant messages that carry tool_calls; enforce that at the request boundary so malformed messages never reach the passthrough path. 3. Role-constrain tool-call metadata. tool_calls is only valid on role=assistant, tool_call_id and name only on role=tool. Without this, a user/system message with tool_calls would flip the passthrough branch on and be forwarded to llama-server, surfacing as an opaque upstream error. 
* fix(studio): normalize image mode and passthrough JSON verbatim Two Gemini-code-assist review findings on PR #5099: 1. Unconditionally convert decoded images to RGB before PNG encoding. The prior code only handled RGBA, letting CMYK/I/F images crash at img.save(format="PNG") and surface as opaque 400s. Applied to both the passthrough helper and the non-passthrough GGUF path that originally carried this pattern, keeping the two sites in sync. 2. Return the upstream JSON body as raw bytes via Response rather than parse-then-re-serialize with JSONResponse. Matches the passthrough helper's "verbatim" contract and drops a redundant round-trip. --------- Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-18 12:53:23 +04:00
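The explicit-close fix described in ac2daf8b7a (keep a named reference to the line iterator and `aclose()` it in `finally`) can be illustrated without httpx. The sketch below is an assumption-laden stand-in, not the Studio code: `upstream_lines` plays the role of `resp.aiter_lines()`, and `passthrough_stream`, `consume_two`, and `main` are hypothetical names. It records which task runs the inner generator's cleanup.

```python
import asyncio


async def upstream_lines(log):
    """Stand-in for resp.aiter_lines(); records which task runs its cleanup."""
    try:
        for i in range(5):
            yield f"data: {i}"
            await asyncio.sleep(0)
    finally:
        log.append(("inner_closed", asyncio.current_task().get_name()))


async def passthrough_stream(log):
    lines = upstream_lines(log)  # named reference instead of an anonymous async for
    try:
        async for line in lines:
            yield line
    finally:
        # Close the inner iterator here, in the current task, before any
        # response/client cleanup would run.
        try:
            await lines.aclose()
        except Exception:
            pass  # cleanup noise must not bubble out


async def consume_two(log):
    """Simulate a client that disconnects after two lines."""
    stream = passthrough_stream(log)
    got = []
    try:
        async for line in stream:
            got.append(line)
            if len(got) == 2:
                break
    finally:
        await stream.aclose()
    return got


async def main():
    log = []
    task = asyncio.create_task(consume_two(log), name="stream-task")
    return await task, log
```

Running `asyncio.run(main())` shows the inner generator being finalised inside `stream-task` itself, rather than being left for the asyncgen GC hook to finalise later on another task.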
Daniel Han	0b57884120	Add Qwen3.6 inference defaults for Studio (#5065) * Add Qwen3.6 inference defaults for Studio Add qwen3.6 family entry to inference_defaults.json with the recommended sampling parameters from Qwen's documentation: temperature=0.7, top_p=0.8, top_k=20, min_p=0.0, presence_penalty=1.5, repetition_penalty=1.0. Without this, Qwen3.6 models fall through to the generic qwen3 pattern which uses different defaults (temperature=0.6, top_p=0.95, no presence_penalty). * Add Qwen3.6-35B-A3B-GGUF to default model lists * Add Qwen3.5/3.6 presence_penalty to thinking toggle and small-model disable logic - Thinking toggle (on-load + button click) now sets presencePenalty: 1.5 for Qwen3.5 and Qwen3.6 models (both thinking-ON and thinking-OFF states) - Small-model thinking-disable check (<9B defaults to no-thinking) extended from Qwen3.5-only to also cover Qwen3.6, in all 3 locations: frontend on-load, frontend refresh, backend llama_cpp.py	2026-04-16 11:42:42 -07:00
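The fall-through described in 0b57884120 (a Qwen3.6 model matching the generic qwen3 pattern when no more specific entry exists) can be sketched as below. The layout of `inference_defaults.json` and the real matching logic are not shown in this log, so the `FAMILY_DEFAULTS` dictionary, `resolve_defaults`, and the longest-pattern-first substring match are assumptions for illustration. Only the parameter values come from the commit message.

```python
# Hypothetical in-memory view of the family defaults; values are from the commit message.
FAMILY_DEFAULTS = {
    "qwen3.6": {
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 20,
        "min_p": 0.0,
        "presence_penalty": 1.5,
        "repetition_penalty": 1.0,
    },
    # Generic qwen3 entry: different sampling values, no presence_penalty.
    "qwen3": {"temperature": 0.6, "top_p": 0.95},
}


def resolve_defaults(model_name: str) -> dict:
    """Return defaults for the most specific family whose name appears in model_name."""
    name = model_name.lower()
    # Try longer patterns first so "qwen3.6" wins over the generic "qwen3".
    for family in sorted(FAMILY_DEFAULTS, key=len, reverse=True):
        if family in name:
            return dict(FAMILY_DEFAULTS[family])
    return {}
```

With the `qwen3.6` entry removed, `resolve_defaults("Qwen3.6-35B-A3B-GGUF")` would return the generic qwen3 values, which is the behaviour the commit sets out to avoid.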
Lee Jackson	ee86530e55	chore: switch helper and no-cache fallback to Gemma (#5066)	2026-04-16 22:27:30 +04:00
Wasim Yousef Said	bc9ddb3af6	Fix onboarding followups (#5064) * Fix onboarding followups * Rename sidebar studio to train	2026-04-16 10:11:35 -07:00
Wasim Yousef Said	7ef65bd2e5	Chat first onboarding (#5063) * auth: default to chat * settings: relaunch onboarding * onboarding: return to launch page * studio: stop auto guided tour * ui: soften global radius * cleanup: rename onboarding exit prop * fix onboarding redirect safety * Show real Unsloth version in settings * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-16 09:58:10 -07:00
हिमांशु	f4422b0a62	change torchcodec version to 0.10.0 in extra-no-deps (#5043) Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-04-16 19:50:57 +04:00
Wasim Yousef Said	b01e9af124	feat(studio): replace navbar with collapsible sidebar (#4936 ) * feat(studio): replace navbar navigation with collapsible sidebar Add an app-wide sidebar with hover-expand and pin-to-dock behavior. Navigation items (Studio, Recipes, Export, Chat) move from the center pill navbar to the sidebar. Chat threads and recipes render as collapsible sub-lists. Navbar simplified to logo + update + close. - Extend SidebarProvider with pinned/hovered state model - New AppSidebar with animated active indicator, sloth profile menu, theme toggle, guided tour, back/forward navigation - Chat page refactored to URL-driven view state via search params - Extract reusable hooks for chat thread and recipe sidebar data - Guard startViewTransition for browser compatibility - Wrap chat deletions in Dexie transaction for data integrity * feat(studio): move logo to sidebar and make navbar overlay - Sidebar is now full-height with logo in SidebarHeader - Collapsed sidebar shows sticker.png, expanded shows full logo - Navbar is absolute-positioned overlay (no layout space) - Main content extends to top, aligning with navbar controls * feat(studio): full-height sidebar with recents, edge-to-edge nav buttons - Sidebar outside max-w-7xl, pinned to left edge - Remove sidebar rounding, menu buttons rounded-md - Nav buttons flush to sidebar edges with no left rounding - Replace collapsible recipes/chat with flat nav items - Add Recents section with chat history (1 item when not on chat, full on chat) - New Chat as first nav item with PencilEdit02Icon - Cursor pointer on all sidebar buttons - Navbar temporarily hidden for screenshots * fix(studio): fix chat scroll, action bar hover, collapsible recents - Fix sticky composer by removing `relative` override on viewport footer - Action bar buttons only show on hover (autohide=always) - Remove floating border/shadow from action bar - Add scroll space above composer for last message actions - Back/forward buttons use router 
history (stay in-app) - Recents section collapsible with chevron on chat route - Set html/body/#root height for proper h-full chain * fix(studio): address review feedback, clean up unused code - Unhide navbar (was left hidden from screenshot) - Remove unused imports: SidebarMenuSub, BubbleChatIcon, ColumnInsertIcon - Remove unused vars: recipeItems, activeRecipeId, canCompare, recipesOpen - Include compare query id in active sidebar selection - Use store type for contextUsage instead of inline type - Simplify noop in sidebar.tsx - Remove empty className prop feat(studio): add mobile sidebar, recent runs section, and misc UX fixes * feat(studio): scaffold settings feature module with dialog store * feat(studio): add tri-state theme store for settings * feat(chat): add clear-all-chats and export-chat-history utils * feat(studio): add settings dialog shell with tab rail * feat(studio): add appearance tab with theme and sidebar pin * feat(studio): add settings general tab with hf token, auto-title, reset prefs * feat(studio): add settings chat tab with export and clear * feat(studio): add api keys tab with list and revoke flow * feat(studio): add create-key form and reveal dialog * feat(studio): add usage examples panel to api keys tab * feat(studio): add settings about tab with update and shutdown * feat(studio): add settings dropdown item and cmd-comma shortcut * feat(studio): remove legacy api-keys route and chat-sheet preference rows * fix(studio): settings dialog a11y + polish pass * feat(studio): inline api key reveal card replacing nested dialog * fix(studio): hide revoked keys from settings list * refactor(studio): strip navbar and hoist training unload guard * feat(studio): explicit sidebar toggle, remove hover-open and pin icons * fix(studio): use SidebarRight01Icon for collapsed sidebar open toggle * fix(studio): address code review findings for settings dialog * feat(studio): collapsible navigate group with standalone new-chat and compare * fix(studio): 
chat-only standalone actions, use ColumnInsertIcon for compare * fix(studio): sidebar new-chat/compare state reset and icon-mode collapsible * feat(studio): add compact logo assets for sidebar header * Fixed sidebar design * fix(studio): sidebar delete icon hover contrast and sizing * feat(studio): route-gate sidebar recents (chats off /studio, runs on /studio) * feat(studio): add chat search store * feat(studio): add chat search index hook with snapshot-on-open * feat(studio): add chat search command dialog with global shortcut * feat(studio): wire chat search into sidebar * fix(studio): trim hf token on save, add show/hide toggle, commit on close * revert(studio): restore original sidebar/border colors, brighten sidebar * feat(studio): forward overlayClassName through CommandDialog * fix(studio): wrap search dialog in Command context, redesign as flat 635px card * fix(studio): reserve right padding on recent items so delete icon stops overlapping title * fix(studio): skip hf token unmount-commit during reset-prefs reload * chore(studio): drop unused icon import and unreachable runs navigate branch * fix(studio): chat search index filters archived before limit, batches message query, picks up reasoning text * fix(studio): keep CommandEmpty in tree so empty state renders correctly * fix(studio): cap system prompt and chat template textareas so they scroll instead of growing * fix(studio): attach chat-compare tour anchor to sidebar compare button * fix(studio): persist system theme explicitly so next-themes does not clobber on reload * fix(studio): auto-switch to history tab when selecting a recent run from sidebar * UI overhaul: chatbox, scrollbar, sidebar, and compare view UI Changes: - Redesigned the Compare UI with general cleanup - Redesigned the Chatbox UI - Reduced the width of the user chat bubble for improved readability - Narrowed the user chat box across the content page - Adjusted thinking-box text color to be slightly darker - Removed faded text effect 
from chat messages - Removed faded text effect from the thinking box - Added a small LLM chat safety note at the bottom of the chatbox - Restyled the scrollbar Layout & Behavior: - Reworked the scrollbar to span the full height of the page (no top/bottom padding) and remain persistently visible when content is scrollable, rather than only on hover - Reworked the Configuration sidebar to span full height — removed rounded corners and borders, with the scrollbar adjusted to match the full top-to-bottom layout - Adjusted the top menu and bottom chatbox content areas to work correctly with the new full-page scroll behavior - Made chat content match the chatbox width, with content sliding slightly behind the chatbox when scrolling - Aligned chat text width with the chatbox for visual consistency, including how far the text extends behind the chatbox Fixes: - Fixed the chatbox not auto-expanding when typing multi-line input while bottom-positioned during an active chat (previously only worked before a chat had started) - Fixed positioning and design of the user chat hover menu buttons to match the assistant chat box — now displayed below the chat bubble instead of on the left side * Fix user message layout in thread component * swap code icon * fix compare layout * fix compare pane flex * Sidebar improvements and fixes - Added scrolling support to the sidebar so menus and recent chats no longer get hidden - Recent chats are now always visible in the sidebar, not hidden when in Studio, Recipes, or Export - Recent chat is now deselected when selecting other navigations - Fixed sidebar glitch where browser resize could make the sidebar and expand button disappear completely - Fixed glitch where the open-sidebar hover tooltip appeared above the logo when clicking expand sidebar - Reduced sidebar width on mobile to around 2/3 of the screen (was too wide) - Made the close-sidebar hover tooltip consistent with the rest of the design - Removed sidebar collapse/expand animation - 
Small adjustment to chat width * Fix route scrolling, polling, and theme sync issues * Fix Studio page scrolling --------- Co-authored-by: sneakr <hauzin@hotmail.com>	2026-04-16 08:46:16 -07:00
Daniel Han	05ec0f110b	Studio: Ollama support, recommended folders, Custom Folders UX polish (#5050 ) * Studio: Ollama support, recommended folders, Custom Folders UX polish Backend: - Add _scan_ollama_dir that reads manifests/registry.ollama.ai/library/* and creates .gguf symlinks under <ollama_dir>/.studio_links/ pointing at the content-addressable blobs, so detect_gguf_model and llama-server -m work unchanged for Ollama models - Filter entries under .studio_links from the generic models/hf/lmstudio scanners to avoid duplicate rows and leaked internal paths in the UI - New GET /api/models/recommended-folders endpoint returning LM Studio and Ollama model directories that currently exist on the machine (OLLAMA_MODELS env var + standard paths, ~/.lmstudio/models, legacy LM Studio cache), used by the Custom Folders quick-add chips - detect_gguf_model now uses os.path.abspath instead of Path.resolve so the readable symlink name is preserved as display_name (e.g. qwen2.5-0.5b-Q4_K_M.gguf instead of sha256-abc...) - llama-server failure with a path under .studio_links or .cache/ollama surfaces a friendlier message ("Some Ollama models do not work with llama.cpp. 
Try a different model, or use this model directly through Ollama instead.") instead of the generic validation error Frontend: - ListLabel supports an optional leading icon and collapse toggle; used for Downloaded (download icon), Custom Folders (folder icon), and Recommended (star icon) - Custom Folders header gets folder icon on the left, and +, search, and chevron buttons on the right; chevron uses ml-auto so it aligns with the Downloaded and Recommended chevrons - New recommended folder chips render below the registered scan folders when there are unregistered well-known paths; one click adds them as a scan folder - Custom folder rows that are direct .gguf files (Ollama symlinks) load immediately via onSelect instead of opening the GGUF variant expander (which is for repos containing multiple quants, not single files) - When loading a direct .gguf file path, send max_seq_length = 0 so the backend uses the model's native context instead of the 4096 chat default (qwen2.5:0.5b now loads at 32768 instead of 4096) - New listRecommendedFolders() helper on the chat API * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review: log silent exceptions and support read-only Ollama dirs Replace silent except blocks in _scan_ollama_dir and the recommended-folders endpoint with narrower exception types plus debug or warning logs, so failures are diagnosable without hiding signal. Add _ollama_links_dir helper that falls back to a per-ollama-dir hashed namespace under Studio's own cache (~/.unsloth/studio/cache/ollama_links) when the Ollama models directory is read-only. Common for system installs at /usr/share/ollama/.ollama/models and /var/lib/ollama/.ollama/models where the Studio process has read but not write access. Previously the scanner returned an empty list in that case and Ollama models would silently not appear. 
The fallback preserves the .gguf suffix on symlink names so detect_gguf_model keeps recognising them. The prior "raw sha256 blob path" fallback would have missed the suffix check and failed to load. * Address review: detect mmproj next to symlink target for vision GGUFs Codex P1 on model_config.py:1012: when detect_gguf_model returns the symlink path (to preserve readable display names), detect_mmproj_file searched the symlink's parent directory instead of the target's. For vision GGUFs surfaced via Ollama's .studio_links/ -- where the weight file is symlinked but any mmproj sidecar lives next to the real blob -- mmproj was no longer detected, so the model was misclassified as text-only and llama-server would start without --mmproj. detect_mmproj_file now adds the resolved target's parent to the scan order when path is a symlink. Direct (non-symlink) .gguf paths are unchanged, so LM Studio and HF cache layouts keep working exactly as before. Verified with a fake layout reproducing the bug plus a regression check on a non-symlink LM Studio model. * Address review: support all Ollama namespaces and vision projector layers - Iterate over all directories under registry.ollama.ai/ instead of hardcoding the "library" namespace. Custom namespaces like "mradermacher/llama3" now get scanned and include the namespace prefix in display names, model IDs, and symlink names to avoid collisions. - Create companion -mmproj.gguf symlinks for Ollama vision models that have an "application/vnd.ollama.image.projector" layer, so detect_mmproj_file can find the projector alongside the model. - Extract symlink creation into _make_symlink helper to reduce duplication between model and projector paths. * Address review: move imports to top level and add scan limit - Move hashlib and json imports to the top of the file (PEP 8). - Remove inline `import json as _json` and `import hashlib` from function bodies, use the top-level imports directly. 
- Add `limit` parameter to `_scan_ollama_dir()` with early exit when the threshold is reached. - Pass `_MAX_MODELS_PER_FOLDER` into the scanner so it stops traversing once enough models are found. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review: Windows fallback, all registry hosts, collision safety _make_link (formerly _make_symlink): - Falls back to os.link() hardlink when symlink_to() fails (Windows without Developer Mode), then to shutil.copy2 as last resort - Uses atomic os.replace via tmp file to avoid race window where the .gguf path is missing during rescan Scanner now handles all Ollama registry layouts: - Uses rglob over manifests/ instead of hardcoding registry.ollama.ai - Discovers hf.co/org/repo:tag and any other host, not just library/ - Filenames include a stable sha1 hash of the manifest path to prevent collisions between models that normalize to the same stem Per-model subdirectories under .studio_links/: - Each model's links live in their own hash-keyed subdirectory - detect_mmproj_file only sees the projector for that specific model, not siblings from other Ollama models Friendly Ollama error detection: - Now also matches ollama_links/ (the read-only fallback cache path) and model_identifier starting with "ollama/" Recommended folders: - Added os.access(R_OK \| X_OK) check so unreadable system directories like /var/lib/ollama/.ollama/models are not advertised as chips * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review: filter ollama_links from generic scanners The generic scanners (models_dir, hf_cache, lmstudio) already filter out .studio_links to avoid duplicate Ollama entries, but missed the ollama_links fallback cache directory used for read-only Ollama installs. Add it to the filter. 
* Address review: idempotent link creation and path-component filter _make_link: - Skip recreation when a valid link/copy already exists (samefile or matching size check). Prevents blocking the model-list API with multi-GB copies on repeated scans. - Use uuid4 instead of os.getpid() for tmp file names to avoid race conditions from concurrent scans. - Log cleanup errors instead of silently swallowing them. Path filter: - Use os.sep-bounded checks instead of bare substring match to avoid false positives on paths like "my.studio_links.backup/model.gguf". * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Address review: drop copy fallback, targeted glob, robust path filter _make_link: - Drop shutil.copy2 fallback -- copying multi-GB GGUFs inside a sync API request would block the backend. Log a warning and skip the model when both symlink and hardlink fail. Scanner: - Replace rglob("*") with targeted glob patterns (*/*/* and */*/*/*) to avoid traversing unrelated subdirectories in large custom folders. Path filter: - Use Path.parts membership check instead of os.sep substring matching for robustness across platforms. Scan limit: - Skip _scan_ollama_dir when _generic already fills the per-folder cap. * Address review: sha256, top-level uuid import, Path.absolute() - Switch hashlib.sha1 to hashlib.sha256 for path hashing consistency. - Move uuid import to the top of the file instead of inside _make_link. - Replace os.path.abspath with Path.absolute() in detect_gguf_model to match the pathlib style used throughout the codebase. 
* Address review: fix stale comments (sha1, rglob, copy fallback) Update three docstrings/comments that still referenced the old implementation after recent changes: - sha1 comment now says "not a security boundary" (no hash name) - "rglob" -> "targeted glob patterns" - "file copies as a last resort" -> removed (copy fallback was dropped) * Address review: fix stale links, support all manifest depths, scope error _make_link: - Drop size-based idempotency shortcut that kept stale links after ollama pull updates a tag to a same-sized blob. Only samefile() is used now -- if the link doesn't point at the exact same inode, it gets replaced. Scanner: - Revert targeted glob back to rglob so deeper OCI-style repo names (5+ path segments) are not silently skipped. Ollama error: - Only show "Some Ollama models do not work with llama.cpp" when the server output contains GGUF compatibility hints (key not found, unknown architecture, failed to load). Unrelated failures like OOM or missing binaries now show the generic error instead of being misdiagnosed. --------- Co-authored-by: Daniel Han <info@unsloth.ai> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: danielhanchen <michaelhan2050@gmail.com>	2026-04-16 08:24:08 -07:00
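The link-creation behaviour that the commit above converges on (samefile-only idempotency, symlink with hardlink fallback, uuid4 temp names, atomic os.replace, no copy fallback) can be sketched roughly as follows. This is an illustrative reconstruction from the commit message, not the actual `_make_link` source: the function name `make_link`, its signature, and its return convention are assumptions, and `target` is assumed to be an absolute path.

```python
import os
import uuid
from pathlib import Path


def make_link(target: Path, link: Path) -> bool:
    """Hypothetical helper: make `link` refer to `target` (an absolute path).

    Returns True when a usable link exists afterwards, False when both the
    symlink and the hardlink attempt failed (the caller would skip the model).
    """
    # Idempotency: keep an existing link only if it is the very same file.
    # No size-based shortcut, so a tag re-pointed at a same-sized blob is replaced.
    try:
        if link.exists() and os.path.samefile(link, target):
            return True
    except OSError:
        pass  # unreadable or broken link: fall through and recreate it

    link.parent.mkdir(parents=True, exist_ok=True)
    # uuid4 keeps temp names unique across concurrent scans.
    tmp = link.with_name(f".{link.name}.{uuid.uuid4().hex}.tmp")
    try:
        try:
            tmp.symlink_to(target)
        except OSError:
            # e.g. Windows without Developer Mode: fall back to a hardlink.
            os.link(target, tmp)
        # Atomic swap, so the .gguf path never goes missing during a rescan.
        os.replace(tmp, link)
        return True
    except OSError:
        # Deliberately no copy fallback: copying multi-GB blobs would block.
        try:
            tmp.unlink()
        except OSError:
            pass
        return False
```

Calling `make_link(blob, links_dir / "model.gguf")` twice with the same blob is a no-op the second time; calling it with a different blob swaps the link in place.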
Daniel Han	6e87bade25	Trim verbose comments in PATH helpers Reduce inline comments from ~160 lines to ~25 across both files. Keep one-line summaries of the "why"; drop multi-paragraph rationale blocks that repeated information already captured in commit messages and PR discussion.	2026-04-16 12:01:01 +00:00
Etherll	ec32ce2e82	fix: use direct registry API for PATH writes instead of SetEnvironmentVariable (#4961 ) * fix: replacing SetEnvironmentVariable with direct registry API * apply reviews * Use CreateSubKey for HKCU\Environment * Store PATH backup under HKCU\Software\Unsloth * Fix $backupKey registry handle leak in PATH backup block Wrap $backupKey operations in try/finally so the handle is closed even if GetValue or SetValue throws. The Add-ToUserPath helper already uses this pattern for its registry key -- the backup block was the only place missing it. * Isolate WM_SETTINGCHANGE broadcast from PATH write error handling Wrap the broadcast dummy-variable calls in their own try/catch so a broadcast failure does not mask a successful registry PATH write. Previously, if SetEnvironmentVariable threw after SetValue already committed the new PATH, Add-ToUserPath would return $false and the caller would skip Refresh-SessionPath. * PATH helper polish: venv precedence, quoted entries, raw/expanded dedup Three small follow-ups surfaced by a 10-reviewer pass against the rebased PR head. None fix a regression vs main; each strictly improves the new helpers. Refresh-SessionPath / Refresh-Environment: - Move $env:Path to the front of the merge so an activated venv keeps precedence over machine/user PATH after a refresh. Pre-PR dropped process-only entries entirely; post-PR kept them but at the back. - Dedup on both raw and expanded forms so %USERPROFILE%\foo and the already-expanded C:\Users\me\foo do not both survive. Add-ToUserPath: - Trim whitespace and surrounding double-quotes from each compared entry so quoted PATH entries like "C:\Program Files\CMake\bin" deduplicate against an unquoted directory of the same path. * Back up User PATH inside Add-ToUserPath, before first mutation Previously only studio/setup.ps1 took a one-time PATH backup, at script top (line ~547). 
install.ps1 (the irm \| iex entry point) had no backup, so users who installed via that path had no recovery surface if anything clobbered their PATH. The PR description's "one-time backup before any modifications" promise only held for the studio installer flow. Move the backup into Add-ToUserPath itself: just before the first actual SetValue mutation, write the pristine raw PATH to HKCU\Software\Unsloth\PathBackup if no backup already exists. This: - Covers both entry points (install.ps1 and studio/setup.ps1). - Captures the TRUE pristine PATH even when install.ps1 runs first and studio/setup.ps1 runs afterwards (the script-top backup in setup.ps1 would otherwise see an already-modified PATH). - Is idempotent: once a backup exists, subsequent calls preserve it. - Skips when nothing would mutate (dedup match) or PATH is empty. The script-top backup in studio/setup.ps1 is kept for defense in depth. * Refresh PATH: venv-aware merge order Reconcile two competing concerns about Refresh-SessionPath / Refresh-Environment surfaced by separate review rounds: - venv at the back -> activated venv loses precedence to system Python - process at the front -> stale shims (old node, old python, etc.) still on $env:Path can beat a freshly installed tool New merge order: 1. Activated venv Scripts dir, only if $env:VIRTUAL_ENV is set 2. Machine PATH freshly read from registry 3. User PATH freshly read from registry 4. Current $env:Path as fallback This way an explicitly-activated venv keeps priority while a tool the script just installed wins over any stale entry that was already on the inherited shell PATH. When no venv is active, fresh registry entries take precedence as expected. * Append to User PATH by default, close $envKey in finally Add-ToUserPath gains a -Position Append\|Prepend parameter defaulting to Append so installing unsloth no longer prepends the bundled venv Scripts directory ahead of the user's existing python / pip on new shells. 
The four current call sites (install.ps1 launcher, studio/setup.ps1 CMake, nvcc, Python user Scripts) all take the Append default because each one that needs in-session precedence already does an inline $env:Path prepend independently. This matches rustup / cargo / nvm / pyenv / uv behavior. Also wrap the script-top $envKey.GetValue in a try/finally so the registry handle is released even if the read throws. Matches the pattern already used for $backupKey five lines below. * Prepend cmake, nvcc, Python Scripts; keep venv Scripts appended The previous commit switched Add-ToUserPath to append by default so that installing unsloth would not silently hijack the user's system python / pip. That was correct for the venv Scripts dir (which contains python.exe and pip.exe alongside unsloth.exe), but wrong for the three studio/setup call sites. Those persist cmake, the driver-compatible nvcc, and the Python user Scripts dir for future shells, and in all three cases an older tool already earlier in the user PATH would keep winning after the install finished. The nvcc case is especially load-bearing: setup selects a driver-compatible CUDA toolkit, then llama.cpp builds against whatever wins PATH resolution, so a stale older nvcc produces broken builds. Pass -Position 'Prepend' explicitly at the three setup.ps1 call sites (cmake at line 754, nvcc bin at line 1025, Python user Scripts at line 1191). None of those directories holds python.exe, so prepending them does not re-introduce the original hijack problem. Leave the install.ps1 venv Scripts call on the default Append with a comment explaining why. * Symmetric dedup, Prepend reorders duplicates, unsloth shim dir Address three separate findings surfaced by review: 1. Dedup asymmetry (Gemini high-priority): the existing dedup expanded registry entries via ExpandEnvironmentVariables but did NOT expand the new directory. Passing "%USERPROFILE%\foo" when "C:\Users\me\foo" was already in PATH produced a duplicate. 
Expand both sides so the check is symmetric. 2. -Position Prepend no-op on existing duplicates: the dedup loop returned $false as soon as it saw a match, regardless of position. That left a late-position duplicate in place instead of moving it to the front, so "prepend the newly selected cmake/nvcc" did not always beat an older copy earlier in PATH. Partition entries into kept and dropped lists, then reinsert a single copy at the requested position. Append still returns $false on any match so user-curated orderings are not reshuffled. Prepend also returns $false when the only copy is already at position 0 so we preserve the user's casing. 3. Stop adding the venv Scripts dir to User PATH entirely. That dir holds python.exe and pip.exe alongside unsloth.exe, so neither Prepend nor Append worked: prepend hijacked the user's system python and pip, append made the freshly-installed unsloth.exe lose to any older unsloth.exe earlier on PATH. Replace the Scripts-dir PATH add with a dedicated shim directory that contains only unsloth.cmd, and prepend that dir. The shim calls the venv's unsloth.exe by absolute path so future pip upgrades inside the venv propagate automatically. * Shim via hardlink, Append user Scripts, drop venv sysconfig fallback Three follow-ups to the `c0ab1ab` shim commit, targeting concerns raised in the second 20-reviewer pass: 1. Shim uses unsloth.exe (hardlink, copy fallback) instead of unsloth.cmd. The batch-file approach had three distinct regressions: - cmd.exe expanded %...% sequences inside user arguments, so prompts like "What does 50% mean?" got mangled before reaching the CLI - Git Bash / MSYS2 / POSIX-style shells on Windows do not resolve bare-name lookups to .cmd files, so `unsloth` stopped working there - Set-Content -Encoding ASCII replaced non-ASCII profile characters with '?', so installs under C:\Users\Jörg\... wrote a broken shim A hardlink (fallback: copy) of unsloth.exe is a native Windows executable with no shell indirection. 
PATHEXT picks .exe before .cmd in cmd.exe and PowerShell, Git Bash honors .exe natively, subprocess callers hit it directly, and a hardlink stays in sync with the venv on pip upgrades because both names point at the same inode. 2. studio/setup.ps1 Python user Scripts dir is added with default Append instead of -Position Prepend. That directory holds every pip-installed user console script (pip, pytest, huggingface-cli, and so on), not just unsloth, so reordering it silently changed resolution order for unrelated tools. The new install.ps1 shim at PATH position 0 already guarantees `unsloth` resolves to the freshly installed copy, so the Python user Scripts entry only needs to be present, not at the front. 3. The sysconfig lookup in studio/setup.ps1 no longer falls back to sysconfig.get_path('scripts') when the nt_user scheme dir does not exist. When setup.ps1 is invoked from an activated venv (a flow the linked issue actually hits) that fallback returns the venv's Scripts directory, which would then be added to the persisted User PATH and re-introduce the python / pip hijack the shim dir is meant to avoid. Stick strictly to the nt_user scheme; skip the block if it does not exist on disk. * Do not crash installer when unsloth.exe shim is locked The shim update sequence at install.ps1:1095 did a bare Remove-Item / New-Item HardLink / Copy-Item. Under the script's $ErrorActionPreference a locked target (most commonly 'unsloth studio' still running while the user re-invokes the installer) turns the Remove-Item failure into a terminating error that aborts the install with no actionable message. The existing shim is perfectly usable in that state, so there is no reason to abort. Wrap the whole remove/link/copy sequence in a try/catch that logs the probable cause (Studio still running), points at the fix (close Studio and re-run), and lets the installer finish with the old launcher still serving the command. 
Also only emit the "added unsloth launcher to PATH" step line when the launcher was actually (re)created AND the PATH entry was newly added -- previously the message fired even when the shim refresh silently failed, which was confusing. * Guard shim PATH entry on existence, use NullString for broadcast delete Two follow-ups surfaced by the latest review pass: 1. Do not add the shim directory to User PATH when the launcher was not actually created. Antivirus blocking unsloth.exe, a disk-full volume, or restrictive filesystem permissions can make both the hardlink and the copy fallback fail on a fresh install. In that case the existing sequence would report "added unsloth launcher to PATH" warnings but still prepend the empty $ShimDir to User PATH -- the user sees an install that claims success but then cannot resolve `unsloth` in a new shell. Gate Add-ToUserPath on Test-Path $ShimExe so the PATH entry is only persisted when the launcher is really there. 2. Pass [NullString]::Value instead of $null to the broadcast-delete call in Add-ToUserPath. On PowerShell 7.5 and later (running on .NET 9), a bare $null going into [Environment]::SetEnvironmentVariable can be coerced to an empty string rather than a true .NET null, which sets the dummy UnslothPathRefresh_XXXXXXXX variable to "" in HKCU\Environment instead of deleting it. The leaked variable is visible in System Properties and accumulates one entry per install run. [NullString]::Value is a PowerShell-specific sentinel that crosses the interop boundary as a real null and works on both PS 5.1 and PS 7.x. See PowerShell/PowerShell#24637 for the underlying issue. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>	2026-04-16 04:49:51 -07:00
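The dedup and positioning rules described for Add-ToUserPath in the commit above can be illustrated with a small pure-string model. The real helper is PowerShell and writes to the registry; the Python below only mirrors the comparison logic, and the names `add_to_user_path` and `_norm`, the `env` dictionary, and the case-insensitive comparison are assumptions made for the sketch.

```python
import re
from typing import Dict, Tuple


def _norm(entry: str, env: Dict[str, str]) -> str:
    """Comparison key: trim whitespace and quotes, expand %VAR%, ignore case."""
    s = entry.strip().strip('"')
    # Expand %VAR% using the supplied mapping (keys assumed upper-case);
    # unknown variables are left untouched.
    s = re.sub(r"%([^%]+)%", lambda m: env.get(m.group(1).upper(), m.group(0)), s)
    return s.lower()


def add_to_user_path(raw_path: str, new_dir: str, env: Dict[str, str],
                     position: str = "append") -> Tuple[str, bool]:
    """Return (new_path, changed) for a ';'-separated PATH string."""
    entries = [e for e in raw_path.split(";") if e.strip()]
    key = _norm(new_dir, env)  # expand the new directory too (symmetric dedup)
    matches = [i for i, e in enumerate(entries) if _norm(e, env) == key]

    if position == "append":
        if matches:
            return raw_path, False  # never reshuffle a user-curated order
        return ";".join(entries + [new_dir]), True
    if position != "prepend":
        raise ValueError("position must be 'append' or 'prepend'")
    if matches == [0]:
        return raw_path, False  # already first: keep the user's casing
    # Drop every existing copy, then reinsert a single copy at the front.
    kept = [e for i, e in enumerate(entries) if i not in matches]
    return ";".join([new_dir] + kept), True
```

With `env = {"USERPROFILE": r"C:\Users\me"}`, appending `C:\Users\me\foo` to a PATH that already contains `%USERPROFILE%\foo` reports no change, while prepending a directory that sits later in the PATH moves it to the front.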
Daniel Han	f0d03655e8	Studio: add folder browser modal for Custom Folders (#5035 ) * Studio: add folder browser modal for Custom Folders The Custom Folders row in the model picker currently only accepts a typed path. On a remote-served Studio (Colab, shared workstation) that means the user has to guess or paste the exact server-side absolute path. A native browser folder picker can't solve this: HTML `<input type="file" webkitdirectory>` hides the absolute path for security, and the File System Access API (Chrome/Edge only) returns handles rather than strings, neither of which the server can act on. This PR adds a small in-app directory browser that lists paths on the server and hands the chosen string back to the existing `POST /api/models/scan-folders` flow. ## Backend * New endpoint `GET /api/models/browse-folders`: * `path` query param (expands `~`, accepts relative or absolute; empty defaults to the user's home directory). * `show_hidden` boolean to include dotfiles/dotdirs. * Returns `{current, parent, entries[], suggestions[]}`. `parent` is null at the filesystem root. * Immediate subdirectories only (no recursion); files are never returned. * `entries[].has_models` is a cheap hint: the directory looks like it holds models if it is named `models--` (HF hub cache layout) or one of the first 64 children is a .gguf/.safetensors/config.json/ adapter_config.json or another `models--` subfolder. * Sort order: model-bearing dirs, then plain, then hidden; case- insensitive alphabetical within each bucket. * Suggestions auto-populate from HOME, the HF cache root, and any already-registered scan folders, deduplicated. * Error surface: 404 for missing path, 400 for non-directory, 403 on permission errors. Auth-required like the other models routes. * New Pydantic schemas `BrowseEntry` and `BrowseFoldersResponse` in `studio/backend/models/models.py`. 
## Frontend * New `FolderBrowser` component (`studio/frontend/src/components/assistant-ui/model-selector/folder-browser.tsx`) using the existing `Dialog` primitive. Features: * Clickable breadcrumb with a `..` row for parent navigation. * Quick-pick chips for the server-provided suggestions. * `Show hidden` checkbox. * In-flight fetch cancellation via AbortController so rapid navigation doesn't flash stale results. * Badges model-bearing directories inline. * `chat-api.ts` gains `browseFolders(path?, showHidden?)` and matching types. * `pickers.tsx` adds a folder-magnifier icon next to the existing `Add` button. Opening the browser seeds it with whatever the user has already typed; confirming fills the text input, leaving the existing validation and save flow unchanged. ## What it does NOT change * The existing text-input flow still works; the browser is additive. * No new permissions or escalation; the endpoint reads only directories the server process is already allowed to read. * No model scanning or filesystem mutation happens from the browser itself -- it just returns basenames for render. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Studio: cap folder-browser entries and expose truncated flag Pointing the folder browser at a huge directory (``/usr/lib``, ``/proc``, or a synthetic tree with thousands of subfolders) previously walked the whole listing and stat-probed every child via ``_looks_like_model_dir``. That is both a DoS shape for the server process and a large-payload surprise for the client. Introduce a hard cap of 2000 subdirectory entries and a ``truncated: bool`` field on the response. The frontend renders a small hint below the list when it fires, prompting the user to narrow the path. Below-cap directories are unchanged. 
Verified end-to-end against the live backend with a synthetic tree of 2050 directories: response lands at 2000 entries, ``truncated=true``, listing finishes in sub-second time (versus tens of seconds if we were stat-storming). * Studio: suggest LM Studio / Ollama dirs + 2-level model probe Three improvements to the folder-browser, driven by actually dropping an LM Studio-style install (publisher/model/weights.gguf) into the sandbox and walking the UX: ## 1. Quick-pick chips for other local-LLM tools `well_known_model_dirs()` (new) returns paths commonly used by adjacent tools. Only paths that exist are returned so the UI never shows dead chips. * LM Studio current + legacy roots + user-configured `downloadsFolder` from its `settings.json` (reuses the existing `lmstudio_model_dirs()` helper). * Ollama: `$OLLAMA_MODELS` env override, then `~/.ollama/models`, `/usr/share/ollama/.ollama/models`, and `/var/lib/ollama/.ollama/models` (the systemd-service install path surfaced in the upstream "where is everything?" issue). * Generic user-choice locations: `~/models`, `~/Models`. Dedup is stable across all sources. ## 2. Two-level model-bearing probe LM Studio and Ollama both use `root/publisher/model/weights.gguf`. The previous `has_models` heuristic only probed one level, so the publisher dir (whose immediate children are model dirs, not weight files) was always marked as non-model-bearing. Pulled the direct- signal logic into `_has_direct_model_signal` and added a grandchild probe so the classic layout is now recognised. Still O(PROBE^2) worst-case, still returns immediately for `models--` names (HF cache layout) and for any direct weight file. ## 3. model_files_here hint on response body A leaf model dir (just GGUFs, no subdirs) previously rendered as `(empty directory)` in the modal, confusing users into thinking the folder wasn't scannable. 
Added a `model_files_here` count on the response (capped at 200) and a small hint row in the modal: `N model files in this folder. Click "Use this folder" to scan it.` ## Verification Simulated an LM Studio install by downloading the real 84 MB `unsloth/SmolLM2-135M-Instruct-Q2_K.gguf` into `~/.lmstudio/models/unsloth/SmolLM2-135M-Instruct-GGUF/`. Confirmed end-to-end: Home listing suggests `~/.lmstudio/models` as a chip. * Browsing `~/.lmstudio/models` flags `unsloth` (publisher) as `has_models=true` via the 2-level probe. * Browsing the publisher flags `SmolLM2-135M-Instruct-GGUF` (model dir) as `has_models=true`. * Browsing the model dir returns empty entries but `model_files_here=1`, and the frontend renders a hint telling the user it is a valid target. * Studio: one-click scan-folder add + prominent remove + plain search icon Three small Custom Folders UX fixes after real-use walkthrough: * One-click add from the folder browser. Confirming `Use this folder` now submits the path directly to `POST /api/models/scan-folders` instead of just populating the text input. `handleAddFolder` takes an optional explicit path so the submit lands in the same tick as `setFolderInput`, avoiding a state-flush race. The typed-path + `Add` button flow is unchanged. * Prominent remove X on scan folders. The per-folder delete button was `text-muted-foreground/40` and hidden entirely on desktop until hovered (`md:opacity-0 md:group-hover:opacity-100`). Dropped the hover-only cloak, bumped color to `text-foreground/70`, added a red hover/focus background, and sized the icon up from `size-2.5` to `size-3`. Always visible on every viewport. * Plain search icon for the Browse button. `FolderSearchIcon` replaced with `Search01Icon` so it reads as a simple "find a folder" action alongside the existing `Add01Icon`. 
* Studio: align Custom Folders + and X buttons on the same right edge The Custom Folders header used `px-2.5` with a `p-0.5` icon button, while each folder row used `px-3` with a `p-1` button. That put the X icon 4px further from the right edge than the +. Normalised both rows to `px-2.5` with `p-1` so the two icons share a column. * Studio: empty-state button opens the folder browser directly The first-run empty state for Custom Folders was a text link reading "+ Add a folder to scan for local models" whose click toggled the text input. That's the wrong default: a user hitting the empty state usually doesn't know what absolute path to type, which is exactly what the folder browser is for. * Reword to "Browse for a models folder" with a search-icon affordance so the label matches what the click does. * Click opens the folder browser modal directly. The typed-path + Add button flow is still available via the + icon in the section header, so users who know their path keep that option. * Slightly bump the muted foreground opacity (70 -> hover:foreground) so the button reads as a primary empty-state action rather than a throwaway hint. * Studio: Custom Folders header gets a dedicated search + add button pair The Custom Folders section header had a single toggle button that flipped between + and X. That put the folder-browser entry point behind the separate empty-state link. Cleaner layout: two buttons in the header, search first, then add. * Search icon (left) opens the folder browser modal directly. * Plus icon (right) toggles the text-path input (unchanged). * The first-run empty-state link is removed -- the two header icons cover both flows on every state. Both buttons share the same padding / icon size so they line up with each other and with the per-folder remove X. * Studio: sandbox folder browser + bound caps + UX recoveries PR review fixes for the Custom Folders folder browser. 
Closes the high-severity CodeQL path-traversal alert and addresses the codex / gemini P2 findings. Backend (studio/backend/routes/models.py): * New _build_browse_allowlist + _is_path_inside_allowlist sandbox. browse_folders now refuses any target that doesn't resolve under HOME, HF cache, Studio dirs, registered scan folders, or the well-known third-party model dirs. realpath() is used so symlink traversal cannot escape the sandbox. Also gates the parent crumb so the up-row hides instead of 403'ing. * _BROWSE_ENTRY_CAP now bounds visited iterdir entries, not appended entries. Dirs full of files (or hidden subdirs when show_hidden is False) used to defeat the cap. * _count_model_files gets the same visited-count fix. * PermissionError no longer swallowed silently inside the enumeration / counter loops -- now logged at debug. Frontend (folder-browser.tsx, pickers.tsx, chat-api.ts): * splitBreadcrumb stops mangling literal backslashes inside POSIX filenames; only Windows-style absolute paths trigger separator normalization. The Windows drive crumb value is now C:/ (drive root) instead of C: (drive-relative CWD-on-C). * browseFolders accepts and forwards an AbortSignal so cancelled navigations actually cancel the in-flight backend enumeration. * On initial-path fetch error, FolderBrowser now falls back to HOME instead of leaving the modal as an empty dead end. * When the auto-add path (one-click "Use this folder") fails, the failure now surfaces via toast in addition to the inline paragraph (which is hidden when the typed-input panel is closed). * Studio: rebuild browse target from trusted root for CodeQL clean dataflow CodeQL's py/path-injection rule kept flagging the post-validation filesystem operations because the sandbox check lived inside a helper function (_is_path_inside_allowlist) and CodeQL only does intra-procedural taint tracking by default. The user-derived ``target`` was still flowing into ``target.exists`` / ``target.is_dir`` / ``target.iterdir``. 
The fix: after resolving the user-supplied ``candidate_path``, locate the matching trusted root from the allowlist and rebuild ``target`` by appending each individually-validated segment to that trusted root. Each segment is rejected if it isn't a single safe path component (no separators, no ``..``, no empty/dot). The downstream filesystem ops now operate on a Path constructed entirely from ``allowed_roots`` (trusted) plus those validated segments, so CodeQL's dataflow no longer sees a tainted source. Behavior is unchanged for all valid inputs -- only the construction of ``target`` is restructured. Live + unit tests all pass (58 selected, 7 deselected for Playwright env). * Studio: walk browse paths from trusted roots for CodeQL --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ubuntu <ubuntu@h100-8-cheapest.us-east5-a.c.unsloth.internal>	2026-04-15 08:04:33 -07:00
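The trusted-root rebuild described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code in studio/backend/routes/models.py: the names `_is_safe_segment` and `rebuild_from_trusted_root` are hypothetical, and the exact segment rules are assumed from the commit message (no separators, no `..`, no empty/dot).

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def _is_safe_segment(segment: str) -> bool:
    """Hypothetical check: accept only a single plain path component."""
    if segment in ("", ".", ".."):
        return False
    return "/" not in segment and "\\" not in segment and "\x00" not in segment


def rebuild_from_trusted_root(
    candidate: str, allowed_roots: Iterable[Path]
) -> Optional[Path]:
    """Return a path built from a trusted root plus validated segments, or None."""
    # realpath() collapses symlinks and '..' before the containment check.
    resolved = Path(os.path.realpath(candidate))
    for root in allowed_roots:
        trusted = Path(os.path.realpath(root))
        try:
            relative = resolved.relative_to(trusted)
        except ValueError:
            continue  # not under this root, try the next one
        # Re-append each validated segment to the trusted root instead of
        # reusing the user-derived path object directly.
        target = trusted
        for segment in relative.parts:
            if not _is_safe_segment(segment):
                return None
            target = target / segment
        return target
    return None
```

A caller would treat `None` as a refusal (the commit mentions a 403) and run `exists` / `is_dir` / `iterdir` only on the returned path.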
Roland Tannous	800ddc95f8	Re-apply #4939 : updated models template mappers (#4950 ) * Reapply "updated models template mappers. added lfm2.5vl450m to transformers 5…" (#4945) This reverts commit `33503ea248`. * Add missing gemma-4-31B-it bnb-4bit mapper entry and LFM2.5 upstream namespace for PR #4950 - Add unsloth/gemma-4-31B-it-unsloth-bnb-4bit to __INT_TO_FLOAT_MAPPER so the int-to-float resolution works for this model (already listed in TEMPLATE_TO_MODEL_MAPPER but had no mapper entry). - Add LiquidAI/LFM2.5-1.2B-Instruct to lfm-2.5 TEMPLATE_TO_MODEL_MAPPER entry so the canonical upstream namespace is mapped consistently with lfm-2. * Add missing gemma-4-31B-it bnb-4bit Ollama mapping and lfm-2.5 chat template alias - Add unsloth/gemma-4-31B-it-unsloth-bnb-4bit to OLLAMA_TEMPLATE_TO_MODEL_MAPPER so Ollama export works for this model (E2B-it and E4B-it bnb-4bit variants were already present, 31B-it was inconsistently omitted) - Register CHAT_TEMPLATES["lfm-2.5"] as alias of the lfm-2 template to prevent KeyError when Studio resolves LFM2.5 models through MODEL_TO_TEMPLATE_MAPPER * Add missing LFM2 bnb-4bit INT_TO_FLOAT_MAPPER entry unsloth/LFM2-1.2B-unsloth-bnb-4bit is referenced in model_mappings.py but had no mapper.py entry, so model resolution would fail when users load that variant with load_in_4bit=False or when the float name is used with load_in_4bit=True. * Fix review findings for PR #16 1. ollama_template_mappers.py: Restore dropped Gemma-4 base model IDs (E2B, E4B, 31B, 26B-A4B) and add missing google/ upstream IDs to the gemma4 Ollama mapper for consistency with other gemma entries. 2. mapper.py: Remove self-mapping non-bnb-4bit entries from __INT_TO_FLOAT_MAPPER that were polluting FLOAT_TO_INT_MAPPER with lowercase 16-bit names, causing load_in_4bit=True to return bad model names. Add direct MAP_TO_UNSLOTH_16bit entries to preserve the google->unsloth 16-bit redirects. 3. 
mapper.py: Add LFM2.5 MAP_TO_UNSLOTH_16bit redirect so LiquidAI/LFM2.5-1.2B-Instruct resolves to its unsloth mirror. * Add review tests for PR #4950 * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove top-level test files These test_.py files were added at the repo root rather than under tests/. Removing them from this PR; the production mapper changes remain. Add gemma-4-26B-A4B-it mapping Adds unsloth/gemma-4-26B-A4B-it to __INT_TO_FLOAT_MAPPER as a 2-tuple so google/gemma-4-26B-A4B-it routes to unsloth/gemma-4-26B-A4B-it across INT_TO_FLOAT_MAPPER, FLOAT_TO_INT_MAPPER, and MAP_TO_UNSLOTH_16bit. The 26B-A4B (MoE) model has no bnb-4bit variant, so the key uses the plain unsloth name rather than the -unsloth-bnb-4bit suffix. Removes the now-redundant standalone _add_with_lower call for the -it variant; the 16bit mapping is registered via the dict loop. * Add unsloth-bnb-4bit mappings for gemma-4 base (non-it) models Adds E2B, E4B, 31B base unsloth-bnb-4bit entries to __INT_TO_FLOAT_MAPPER. The 26B-A4B (MoE) base has no bnb-4bit variant on HF, so it stays on the standalone _add_with_lower line for the 16bit-only routing. Removes the redundant _add_with_lower lines for E2B, E4B, 31B base since the dict loop now registers the same google->unsloth route through the 2-tuple entries, plus full FLOAT_TO_INT and INT_TO_FLOAT coverage. --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-15 07:52:12 -07:00
Daniel Han	c3cd890357	Studio: refresh Downloaded GGUF list and recurse into variant subdirs (#5032 ) * Studio: refresh Downloaded GGUF list and recurse into variant subdirs Two fixes for the model picker's "Downloaded" section. Frontend (`pickers.tsx`): * `HubModelPicker`'s mount effect short-circuited the cached-gguf and cached-models refetch whenever the module-level cache already had entries (`if (alreadyCached) return;`). After downloading a new repo in the same session, reopening the picker rendered the stale cache and the new repo never appeared in "Downloaded" until a full page reload. The early return is removed so the lists are always refreshed on mount; the module cache still drives the initial render so there is no spinner flash when we already had data. Backend (`utils/models/model_config.py`): * `list_local_gguf_variants` and `_find_local_gguf_by_variant` used a non-recursive `Path.glob("*.gguf")`. Some HF GGUF repos (e.g. `unsloth/gemma-4-26B-A4B-it-GGUF`) place the largest quants under a variant-named subdirectory such as `BF16/...gguf`, which the top-level glob missed. Both helpers now use `rglob` and the variant filename is stored as a path relative to the scan root so the locator can still find the file. The flat-layout case (variants directly in the snapshot root) is unchanged: verified against `unsloth/gemma-4-E2B-it-GGUF` which still returns its UD-Q4_K_XL variant correctly. Studio: emit posix-style relative filenames for local GGUF subdirs `list_local_gguf_variants` was doing `str(f.relative_to(p))`, which on Windows produces backslash-separated paths like `BF16\foo.gguf`. The remote `list_gguf_variants` (HF API path) always returns forward-slash filenames such as `BF16/foo.gguf`, so the two would diverge on Windows. Switch to `.as_posix()` so the local and remote variant filenames stay identical across Linux, macOS, and Windows. Verified by simulating with `PureWindowsPath` in the test suite. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Studio: detect mmproj at snapshot root for nested-variant layouts When _find_local_gguf_by_variant returns a weight file inside a quant-named subdir (e.g. snapshot/BF16/foo.gguf), detect_mmproj_file was scanning only the immediate parent and missing the mmproj file sitting at the snapshot root. The model was then loaded without --mmproj, silently breaking vision support for repos that ship nested variants. detect_mmproj_file now takes an optional search_root and walks up from the weight file to that root, in order, so the mmproj at the snapshot root is picked up. Sibling quant subdirs are not scanned, so an unrelated variant's mmproj does not leak in. Also apply the suggested micro-optimization on relative_to in list_local_gguf_variants -- only build the posix path when storing the first file for a quant. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-15 07:34:42 -07:00
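As a rough illustration of the recursive scan plus posix-style relative names described in the commit above, here is a minimal sketch. The function name `list_gguf_relative` is hypothetical and the real helpers in `utils/models/model_config.py` also group files by quant variant, which is omitted here.

```python
from pathlib import Path, PureWindowsPath
from typing import List


def list_gguf_relative(scan_root: Path) -> List[str]:
    """Find .gguf files at any depth and return forward-slash relative names."""
    return sorted(
        p.relative_to(scan_root).as_posix()  # "BF16/foo.gguf" on every OS
        for p in scan_root.rglob("*.gguf")   # rglob also descends into BF16/
        if p.is_file()
    )


# str() on a Windows path keeps backslashes; as_posix() normalises them.
windows_style = PureWindowsPath("BF16") / "foo.gguf"
posix_name = windows_style.as_posix()
```

Here `posix_name` is `BF16/foo.gguf`, which is the shape the commit says the remote `list_gguf_variants` returns.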
Daniel Han	1ccfd2e0a5	fix(rocm): tighten gfx regex to ignore generic ISA lines (#5033 ) * fix(rocm): tighten gfx regex to ignore generic ISA lines ROCm 6.1+ rocminfo emits generic ISA names such as "amdgcn-amd-amdhsa--gfx11-generic" and "amdgcn-amd-amdhsa--gfx9-4-generic" alongside the real GPU name. The previous `gfx[1-9]` regex used in `_has_rocm_gpu` matched both, so a host with only a generic ISA entry would be reported as having a usable AMD GPU. Tighten the pattern to `gfx[1-9][0-9a-z]{2,3}` so only real gfx ids match. This covers every documented target from GFX6 (gfx600) through GFX12 (gfx1201), including letter-suffixed ids like gfx90a (MI250 / MI250X) and gfx90c. Documented generic ISA names always have 1 or 2 digits before the dash and no longer match. Applied to both `studio/install_python_stack.py` and `studio/install_llama_prebuilt.py` so the two detection paths agree. Co-authored-by: Martin Hoyer <mhoyer@redhat.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: Martin Hoyer <mhoyer@redhat.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-15 05:24:41 -07:00
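The difference between the two patterns quoted in the commit above can be checked with a short sketch. The helper name `has_real_gfx_id` and the sample rocminfo lines are illustrative assumptions; only the two regexes come from the commit message.

```python
import re

OLD_PATTERN = re.compile(r"gfx[1-9]")                 # previous, too loose
NEW_PATTERN = re.compile(r"gfx[1-9][0-9a-z]{2,3}")    # tightened pattern


def has_real_gfx_id(rocminfo_output: str, pattern: "re.Pattern[str]" = NEW_PATTERN) -> bool:
    """True if any line contains a gfx id accepted by the given pattern."""
    return any(pattern.search(line) for line in rocminfo_output.splitlines())


# Sample text only; real rocminfo output has many more fields.
GENERIC_ONLY = (
    "  Name: amdgcn-amd-amdhsa--gfx11-generic\n"
    "  Name: amdgcn-amd-amdhsa--gfx9-4-generic\n"
)
REAL_GPU = "  Name: gfx90a\n" + GENERIC_ONLY
```

With the old pattern `GENERIC_ONLY` counts as a GPU; with the new one it does not, because the character after the leading digit(s) is a dash rather than two or three more id characters.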
Lee Jackson	f9ef639dde	Studio: support GGUF variant selection for non-suffixed repos (#5023 ) * fix: support GGUF variant selection for non-suffixed repos * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: harden GGUF detection across cached models and picker flows * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * chore: use shared GGUF picker helper for search rows * fix: avoid mixed cache duplication and preserve GGUF fallback detection * fix: unify GGUF cache matching and merge picker hints * fix: normalize local GGUF matching across picker and model config * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: robust cached-gguf classification + hint-aware click routing - _repo_gguf_size_bytes: treat size_on_disk=None as 0 and dedupe fallback by commit_hash so partial/interrupted downloads don't TypeError out of sum() and wipe the entire cached list. - list_cached_gguf / list_cached_models: narrow per-repo try/except so one malformed repo no longer poisons the whole response. - handleModelClick: route through isKnownGgufRepo instead of the suffix-only isGgufRepo, so non-suffixed GGUF repos still open the variant expander from every call site. - Replace the modelIsGgufById/resultIsGgufById Maps with Sets of known GGUF ids to stop conflating "no hint" with "known not-GGUF". - Make HfModelResult.isGguf required (it is always set in makeMapModel). - Add regression tests for the None size case, mixed-repo inclusion in cached-gguf, and per-repo error isolation. * fix: exclude mmproj from GGUF classification and case-normalize hint lookups - _repo_gguf_size_bytes now filters mmproj vision-adapter files so safetensors+mmproj.gguf repos stay on the cached-models path and non-GGUF rows no longer show zero pickable variants. 
A vision-capable GGUF repo (main weight + mmproj adapter) still classifies as GGUF and reports the main weight size. - modelGgufIds / resultGgufIds now key on lowercased ids and isKnownGgufRepo lowercases its lookup, so store and HF-search ids that differ only by casing still match the same GGUF hint. - New regression tests: mmproj-only repo excluded from cached-gguf, same repo included in cached-models, vision-capable repo still classified as GGUF with correct size. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-04-15 15:32:01 +04:00
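The size and classification rules from the commit above (treat a missing size as 0, ignore mmproj adapters) might look roughly like this. It is a simplified sketch over plain `(filename, size)` pairs; the real `_repo_gguf_size_bytes` works on Hugging Face cache objects, and matching mmproj files by the substring "mmproj" is an assumption made here for illustration.

```python
from typing import Iterable, Optional, Tuple


def gguf_size_bytes(files: Iterable[Tuple[str, Optional[int]]]) -> int:
    """Sum the sizes of main-weight .gguf files, skipping mmproj adapters."""
    total = 0
    for name, size in files:
        lower = name.lower()
        if not lower.endswith(".gguf"):
            continue  # safetensors, configs, etc.
        if "mmproj" in lower:
            continue  # assumed naming convention for vision adapters
        total += size or 0  # None (partial download) counts as 0
    return total


def is_gguf_repo(files: Iterable[Tuple[str, Optional[int]]]) -> bool:
    """A repo classifies as GGUF only if it has a non-mmproj .gguf file."""
    return any(
        name.lower().endswith(".gguf") and "mmproj" not in name.lower()
        for name, _ in files
    )
```

Keeping the `None` handling inside the sum is what prevents one interrupted download from raising and emptying the whole cached list.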
Roland Tannous	13928b5f0e	Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024 ) * Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var When set, UNSLOTH_PYTORCH_MIRROR overrides the default https://download.pytorch.org/whl base URL in all four install scripts (install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py). When unset or empty, the official URL is used. This lets users behind corporate proxies or in regions with poor connectivity to pytorch.org point at a local mirror without patching scripts. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back to the official URL when unset or empty, and preserves the value as-is (including trailing slashes). * Remove stale test assertions for missing install.sh messages * Fix GPU mocking in test_get_torch_index_url.sh Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside get_torch_index_url so the GPU-presence checks work in tests. Add -L flag handling to mock nvidia-smi so it passes the GPU listing check. All 26 tests now pass on CPU-only machines. * Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-15 11:39:11 +04:00
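A minimal sketch of the mirror resolution described above, following the final behaviour in this commit (trailing slash stripped, empty value treated as unset). The function names and the `env` parameter are hypothetical; the install scripts themselves may structure this differently.

```python
import os
from typing import Mapping, Optional

OFFICIAL_PYTORCH_WHL_BASE = "https://download.pytorch.org/whl"


def pytorch_whl_base(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the wheel base URL, honouring UNSLOTH_PYTORCH_MIRROR if set."""
    env = os.environ if env is None else env
    # Unset and empty are treated the same; a trailing slash is dropped so
    # joining with "/<variant>" cannot produce a double slash.
    mirror = (env.get("UNSLOTH_PYTORCH_MIRROR") or "").rstrip("/")
    return mirror or OFFICIAL_PYTORCH_WHL_BASE


def torch_index_url(variant: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Build an index URL such as <base>/cpu for a given wheel variant."""
    return f"{pytorch_whl_base(env)}/{variant}"
```

Passing `env` explicitly keeps the sketch easy to test without mutating the process environment.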
Daniel Han	5aa8c15246	Studio: hard-stop at n_ctx with a 'Context limit reached' toast (#5021 ) * Studio: hard-stop at n_ctx with a dedicated 'Context limit reached' toast llama-server's default behavior when the KV cache fills is to silently drop the oldest non-``n_keep`` tokens and keep generating. The UI has no way to tell the user that earlier turns were evicted -- they just see degraded continuity and a confusing ``5,361 / 4,096`` on the context usage bar. Launch llama-server with ``--no-context-shift`` so it returns a clean error once the request would exceed ``n_ctx``. In the chat adapter, catch the error, identify it as a context-limit error via ``isContextLimitError()``, and surface a dedicated toast that names the exact control to adjust: the ``Context Length`` field in the chat Settings panel. Also add a lightweight tooltip hint on ``ContextUsageBar`` when usage crosses 85%, so users see the "raise Context Length in Settings" suggestion before they hit the hard stop. Tests: * ``test_llama_cpp_no_context_shift.py`` pins the ``--no-context-shift`` flag in the static launch-command template, and pins it inside the unconditional ``cmd = [ ... ]`` block so a future refactor can't hide it behind a branch. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Shorten --no-context-shift comment to 1 line * Match backend _friendly_error rewrite in isContextLimitError Codex review on PR caught that ``backend/routes/inference.py::_friendly_error`` rewrites the raw llama-server text "request (X tokens) exceeds the available context size (Y tokens)" into "Message too long: X tokens exceeds the Y-token context window. ..." on the main streaming GGUF path. The heuristic only looked for "context size" / "exceeds the available context" / "context shift", none of which survive the rewrite, so the new "Context limit reached" toast would never fire for the most common case. 
Add matches for "message too long" and "context window" so both wordings hit. Also addresses Gemini feedback on the launch-flag test: * Use ``inspect.getsource(LlamaCppBackend.load_model)`` instead of reading ``__file__`` directly; scopes the assertions to the function that actually launches llama-server. * Replace the hardcoded ``" ]"`` indent search with a line-at-a-time scan for a line that is just ``]``, so the test survives reformatting. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 10:58:20 -07:00
Daniel Han	5861a7ce15	Studio: split model-load progress label across two rows (#5020 ) * Studio: split model-load progress label across two rows The chat flow and training overlay both compose a progress label like "112.6 of 122.3 GB • 331.0 MB/s • 30s left" and render it next to the percent badge in a single flex row. Once the rate + ETA part shows up, the label outgrows the row width and wraps mid-phrase, orphaning the percent ("19 left %") onto a second ragged line. Fix in model-load-status.tsx: split the label on the first " • " into a primary (size) chunk that stays on row 1 with the percent, and a secondary (rate/ETA) chunk that renders on its own muted row below. Labels without a bullet (e.g. "22.8 GB downloaded") collapse cleanly to one row. The inline-status variant keeps only the primary and surfaces the full label via the tooltip. Also extracts the rate/ETA math out of useTransferStats into a pure ``transfer-stats.ts`` module (appendSample + computeTransferStats) so it can be reasoned about and tested without React. The hook is now a thin wrapper that feeds sample history through the pure functions. Backend: adds two companion test files for load_progress(): * test_llama_cpp_load_progress_matrix.py (21 tests) -- platform matrix (Linux /proc, macOS/Windows absence), VmRSS parsing variants (tab/space/missing/malformed), filesystem edges (HF-cache symlinks, broken symlinks, nonexistent paths, relative paths), shard aggregation (partial multi-shard, two series in same dir, mmproj-* exclusion, single-file), lifecycle races, concurrent sampling (10 threads x 50 iters against real /proc), fraction bounds. * test_llama_cpp_load_progress_live.py (5 tests) -- no-mock live integration: real subprocess allocating 100 MB to match VmRSS, real ready phase, real dead-pid degradation, real shard aggregation, repeated polling. Skipped on non-Linux. Both complement the existing test_llama_cpp_load_progress.py. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Hoist splitProgressLabel out of JSX IIFE (review feedback) --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 10:58:16 -07:00
Daniel Han	bb14ab144a	Studio: live model-load progress + rate/ETA on download and load (#5017 ) * Studio: live model-load progress + rate/ETA on download and load Two UX fixes for the opaque multi-minute wait between clicking Load and being able to chat, visible most clearly on large MoE GGUFs like MiniMax-M2.7 (131 GB of weights on a 97 GB GPU): 1. Model-load phase is now observable. The existing chat flow transitions the toast to "Starting model..." as soon as the download hits 100%, then shows a spinner with no other feedback until llama-server reports healthy. For a 130 GB model that spinner freezes for five-plus minutes while the kernel pages shards into the page cache. A new `GET /api/inference/load-progress` endpoint samples `/proc/<pid>/status VmRSS` on the llama-server subprocess against the sum of shard file sizes on disk, so the UI can render a real bar plus rate / ETA during that window. 2. Rate and ETA on downloads and loads. Both the chat toast and the training-start overlay used to show a static pair of numbers (for example "15.4 of 140.8 GB"). A rolling 15-second window over the existing byte-series now surfaces "85.3 MB/s, 24m 23s left" beside that pair. The estimator is shared between the download and load phases so the numbers don't reset when the phase flips. Also fixes a pre-existing assignment bug uncovered while wiring this up: `load_model` was storing the caller's `gguf_path` kwarg into `self._gguf_path`, which is `None` on the HF-download code path. The resolved on-disk path (`model_path`) is what llama-server actually mmaps; downstream consumers need that. No existing reader used `_gguf_path`, so this is a correctness fix for the new endpoint. - Backend: `LlamaCppBackend.load_progress()`, `GET /api/inference/load-progress`, `LoadProgressResponse` Pydantic model. - Frontend: `useTransferStats` hook, `formatRate` / `formatEta` helpers, `getLoadProgress` client, rewired chat toast and `DownloadRow` in the training overlay. 
- Tests: `studio/backend/tests/test_llama_cpp_load_progress.py` covers empty states, mmap phase, ready phase, sharded total aggregation, missing gguf_path, and unreadable /proc (7 cases). `tsc -b` and `vite build` on the frontend both clean. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 09:46:22 -07:00
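The load-progress estimate described above (VmRSS of the llama-server process against the summed shard sizes) can be sketched as below. This is an assumption-laden illustration, not `LlamaCppBackend.load_progress()`: the function names are hypothetical, and it takes the `/proc/<pid>/status` text as a string so it can run on any platform.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# /proc/<pid>/status reports resident memory as e.g. "VmRSS:\t   2048 kB".
_VMRSS_RE = re.compile(r"^VmRSS:\s+(\d+)\s+kB", re.MULTILINE)


def parse_vmrss_bytes(status_text: str) -> Optional[int]:
    """Extract VmRSS in bytes, or None if the field is missing or malformed."""
    match = _VMRSS_RE.search(status_text)
    return int(match.group(1)) * 1024 if match else None


def load_fraction(status_text: str, shard_paths: Iterable[Path]) -> Optional[float]:
    """Fraction of the on-disk weights currently resident, capped at 1.0."""
    total = sum(p.stat().st_size for p in shard_paths if p.is_file())
    rss = parse_vmrss_bytes(status_text)
    if rss is None or total <= 0:
        return None  # degrade quietly when /proc or the shards are unreadable
    return min(1.0, rss / total)
```

On Linux the caller would read `Path(f"/proc/{pid}/status").read_text()`; elsewhere the file does not exist and the sketch's `None` path applies.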
Roland Tannous	514bb3a20e	studio: pin peft to 0.18.1 to fix export subprocess issues (#5015 ) * studio: pin peft to 0.18.1 to fix export subprocess issues peft 0.19.0 causes export subprocess shutdown failures in Studio. Reverting to 0.18.1 resolves the issue. * studio: move peft pin to extras-no-deps to prevent torch upgrade Installing peft via overrides.txt would resolve its deps and pull in torch>=0.11.0, breaking other pinned packages. Moving the pin to extras-no-deps.txt ensures --no-deps is used during install.	2026-04-14 20:16:30 +04:00
Daniel Han	7252410ccc	studio: stream export worker output into the export dialog (#4897 ) * studio: stream export worker output into the export dialog The Export Model dialog only showed a spinner on the "Exporting..." button while the worker subprocess was doing the actual heavy lifting. For Merged to 16bit and GGUF / Llama.cpp exports this meant several minutes (or more, for large models) of opaque silence, with no way to tell whether save_pretrained_merged, convert_hf_to_gguf.py, or llama-quantize was making progress. This adds a live terminal-style output panel inside the export dialog, rendered just above the Cancel / Start Export buttons and scrollable with auto-follow-tail. It shows stdout and stderr from both the worker process itself and any child process it spawns (GGUF converter, llama-quantize), coloured by stream. Backend - core/export/worker.py: new _setup_log_capture(resp_queue) installed before LogConfig.setup_logging. It saves the original stdout/stderr fds, creates pipes, os.dup2's the write ends onto fds 1 and 2 (so every child process inherits the redirected fds), and spins up two daemon reader threads. Each thread reads bytes from a pipe, echoes them back to the original fd (so the server console keeps working), splits on \n and \r, and forwards each line to the resp queue as {"type":"log","stream":"stdout\|stderr","line":...,"ts":...}. PYTHONUNBUFFERED=1 is set so nested Python converters flush immediately. - core/export/orchestrator.py: - Thread-safe ring buffer (collections.deque, maxlen 4000) with a monotonically increasing seq counter. clear_logs(), get_logs_since(cursor), get_current_log_seq(), is_export_active(). - _wait_response handles rtype == "log" by appending to the buffer and continuing the wait loop. Status messages are also surfaced as a "status" stream so users see high level progress alongside raw subprocess output. 
- load_checkpoint, _run_export, and cleanup_memory now wrap their bodies with the existing self._lock (previously unused), clear the log buffer at the start of each op, and flip _export_active in a try/finally so the SSE endpoint can detect idle. - routes/export.py: - Wrapped every sync orchestrator call (load_checkpoint, cleanup_memory, export_merged_model, export_base_model, export_gguf, export_lora_adapter) in asyncio.to_thread so the FastAPI event loop stays free during long exports. Without this the new SSE endpoint could not be served concurrently with the blocking export POST. - New GET /api/export/logs/stream SSE endpoint. Honors Last-Event-ID and a since query param for reconnect, emits log / heartbeat / complete / error events, uses the id field to carry the log seq so clients can resume cleanly. On first connect without an explicit cursor it starts from the current seq so old lines from a previous run are not replayed. Frontend - features/export/api/export-api.ts: streamExportLogs() helper that authFetches the SSE endpoint and parses id / event / data fields manually (same pattern as streamTrainingProgress in train-api.ts). - features/export/components/export-dialog.tsx: - Local useExportLogs(exporting) hook that opens the SSE stream on exporting transitions to true, accumulates up to 4000 lines in component state, and aborts on cleanup. - New scrollable output panel rendered above DialogFooter, only shown for Merged to 16bit and GGUF / Llama.cpp (LoRA adapter is a fast disk write with nothing to show). Dark terminal styling (bg-black/85, emerald text, rose for stderr, sky for status), max-height 14rem, auto-scrolls to the bottom on new output but stops following if the user scrolls up. A small streaming / idle indicator is shown next to the panel title. - DialogContent widens from sm:max-w-lg to sm:max-w-2xl when the output panel is visible so the logs have room to breathe. 
Verified - Python smoke test (tests/smoke_export_log_capture.py): spawns a real mp.get_context("spawn") process, installs _setup_log_capture, confirms that parent stdout prints, parent stderr prints, AND a child subprocess invoked via subprocess.run (both its stdout and stderr) are all captured in the resp queue. Passes. - Orchestrator log helpers tested in isolation: _append_log, get_logs_since (with and without a cursor), clear_logs not resetting seq so reconnecting clients still progress. Passes. - routes.export imports cleanly in the studio venv and /logs/stream shows up in router.routes. - bun run build: tsc -b plus vite build, no TypeScript errors. No existing export behavior is changed. If the subprocess, the SSE endpoint, or the frontend hook fails, the export itself still runs to completion the same way it did before, with or without logs visible. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * export dialog: trim bootstrap noise, scope logs per screen, show realpath Several follow-ups to the live export log work: 1. Worker bootstrap noise (transformers venv activation, Unsloth banner, "Top GGUF/hub models" lists, vision detection, 2k-step weight load bar) is dropped from the export-dialog stream. A threading.Event gate in worker.py defaults closed and only opens once _handle_export actually starts; until then the reader thread still echoes lines to the saved console fd for debugging but does not push them onto the resp_queue. The orchestrator already spawns a fresh subprocess for every checkpoint load, so the gate is naturally reset between runs. 2. tqdm in non-tty mode defaults to a 10s mininterval, which makes multi-step bars look frozen in the panel. Set TQDM_MININTERVAL=0.5 in the worker env so any tqdm-driven progress emits more often. 3. 
The dialog's useExportLogs hook now also clears its line buffer when exportMethod or open changes, so re-opening the dialog into a different action's screen no longer shows the previous action's saved output. A useElapsedSeconds tick + "Working Xs" badge in the log header gives users a visible sign that long single-step phases (cache copies, GGUF conversion) are still running when no new lines are arriving. 4. ExportBackend.export_{merged,base,gguf,lora} now return (success, message, output_path); the worker forwards output_path on each export__done response, the orchestrator's _run_export passes it to routes/export.py, which surfaces it via ExportOperationResponse.details.output_path. The dialog's Export Complete screen renders the resolved on-disk realpath under "Saved to" so users can find their exported model directly. fix(cli): unpack 3-tuple return from export backend ExportOrchestrator.export_{merged,base,gguf,lora} now return (success, message, output_path) so the studio dialog can show the on-disk realpath. The CLI still unpacked 2 values, so every `unsloth export --format ...` crashed with ValueError before reporting completion. Update the four call sites and surface output_path via a "Saved to:" echo. * fix(studio): anchor export log SSE cursor at run start The export dialog SSE defaulted its cursor to get_current_log_seq() at connect time, so any line emitted between the POST that kicks off the export and the client opening the stream was buffered with seqs 1..k and then skipped (seq <= cursor). Long-running exports looked silent during their first seconds. Snapshot _log_seq into _run_start_seq inside clear_logs() and expose it via get_run_start_seq(). The SSE default cursor now uses that snapshot, so every line emitted since the current run began is reachable regardless of when the client connects. Old runs still can't leak in because their seqs are <= the snapshot. 
* fix(studio): reconnect export log SSE on stream drop useExportLogs launched streamExportLogs once per exporting transition and recorded any drop in .catch(). Long GGUF exports behind a proxy with an idle kill-timeout would silently lose the stream for the rest of the run even though the backend already supports Last-Event-ID resume. The "retry: 3000" directive emitted by the backend is only meaningful to native EventSource; this hook uses a manual fetch + ReadableStream parse so it had no effect. Wrap streamExportLogs in a retry loop that tracks lastSeq from ExportLogEvent.id and passes it as since on reconnect. Backoff is exponential with jitter, capped at 5s, reset on successful open. The loop stops on explicit backend `complete` event or on effect cleanup. * fix(studio): register a second command so Typer keeps `export` as a subcommand The CLI export unpacking tests wrap `unsloth_cli.commands.export.export` in a fresh Typer app with a single registered command. Typer flattens a single-command app into that command, so the test's `runner.invoke(cli_app, ["export", ckpt, out, ...])` treats the leading `"export"` token as an unexpected extra positional argument -- every parametrized case failed with: Got unexpected extra argument (.../out) Register a harmless `noop` second command so Typer preserves subcommand routing and the tests actually exercise the 3-tuple unpack path they were written to guard. Before: 4 failed After: 4 passed --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: studio-install <studio@local.install> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>	2026-04-14 08:55:43 -07:00
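The log buffer and cursor semantics described in this commit (bounded deque, monotonically increasing seq, run-start snapshot taken in `clear_logs()`) can be sketched as follows. The class `LogBuffer` and its method names are hypothetical stand-ins for the orchestrator helpers named in the message.

```python
import threading
from collections import deque
from typing import Deque, List, Tuple


class LogBuffer:
    """Bounded, thread-safe log store addressed by a monotonic sequence number."""

    def __init__(self, maxlen: int = 4000) -> None:
        self._lock = threading.Lock()
        self._lines: Deque[Tuple[int, str]] = deque(maxlen=maxlen)
        self._seq = 0
        self._run_start_seq = 0

    def append(self, line: str) -> int:
        with self._lock:
            self._seq += 1
            self._lines.append((self._seq, line))
            return self._seq

    def clear(self) -> None:
        """Drop old lines but keep counting, and remember where this run starts."""
        with self._lock:
            self._lines.clear()
            self._run_start_seq = self._seq

    def run_start_seq(self) -> int:
        with self._lock:
            return self._run_start_seq

    def since(self, cursor: int) -> List[Tuple[int, str]]:
        """Return every buffered line whose seq is strictly after the cursor."""
        with self._lock:
            return [(seq, line) for seq, line in self._lines if seq > cursor]
```

A client that connects late can start from `run_start_seq()` and still see lines emitted before it connected, while a reconnecting client passes the last seq it received.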
Daniel Han	eca592effe	studio: show HF model download progress in training start overlay (#4894 ) * studio: show HF model download progress in training start overlay During the training setup phase, the overlay only displayed a static "Loading model..." line while model weights were being downloaded from Hugging Face. On slow connections this looked like the app had frozen. This adds a small self-contained progress block inside the existing TrainingStartOverlay that polls the existing GET /api/models/download-progress endpoint and renders a Progress bar with bytes downloaded, total bytes, and percent complete. Notes: - Frontend only change. No backend, worker, SSE, or runtime store edits. - Reuses the existing getDownloadProgress client wrapper and the existing /api/models/download-progress endpoint that already scans the HF blob cache for completed and .incomplete files. - selectedModel is read directly from useTrainingConfigStore inside the overlay, so no prop drilling and live-training-view.tsx is unchanged. - Polling runs at 1500 ms and is gated on the HF repo regex (^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$), the same regex the backend uses, so local paths and empty form state never hit the endpoint. - Polling stops once progress reaches 1.0 so the bar can stay at 100 until the overlay hides on the first training step. - Network errors are silently swallowed, matching the chat side flow (the bar simply freezes at the last value). - When downloadedBytes is 0 the block is hidden entirely, so cached models do not flash a progress bar. - When the HF API cannot determine the total size, the block falls back to "X downloaded" with no percent and no bar. Verified with bun run build (tsc -b plus vite build, no TypeScript errors). * training overlay: track dataset download + show on-disk realpath Adds a dedicated "Downloading dataset..." 
section to the training-start overlay alongside the existing model-weights one, so an HF dataset that is downloading mid-startup is no longer mislabeled as model weights or hidden entirely. The new GET /api/datasets/download-progress endpoint mirrors /api/models/download-progress against the datasets-- prefix in HF_HUB_CACHE. Both endpoints now also return cache_path, the resolved on-disk realpath of the snapshot directory (or the cache repo root if no snapshot is materialized yet). The overlay surfaces this under each download row so users can immediately see where the model and dataset landed without digging through server logs. The frontend's existing useModelDownloadProgress hook is generalized to a single useHfDownloadProgress(repoId, fetcher) hook that the model and dataset variants both delegate to, keeping polling, gating, and completion semantics in one place. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Studio: Polish training start overlay download progress UI (#4957) * studio: polish training start overlay download progress visuals * Fix formatCachePath cross-platform support and redundant sizeLabel - Extend formatCachePath regex to also shorten macOS /Users/<user> paths to ~ - Suppress sizeLabel when no byte info is available (cachePath-only state), since the "Preparing" badge already conveys the status * Fix misleading status badge when download total is unknown - Hide badge when totalBytes is 0 but downloadedBytes > 0, since we cannot determine if the download is still in progress or already complete (happens when HF size metadata lookup fails for gated/private repos) - Keep "Preparing" badge for the zero-bytes cachePath-only state - Add Windows native path shortening to formatCachePath (C:\Users\<name>) --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> --------- Co-authored-by: studio-install <studio@local.install> Co-authored-by: pre-commit-ci[bot] 
<66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>	2026-04-14 08:54:01 -07:00
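The polling gate described in this commit can be sketched in a few lines. The regex is the one quoted in the message. The real gate lives in the TypeScript frontend, so the Python function names below are hypothetical and only illustrate the rule, not the actual hook.

```python
import re

# Regex quoted in the commit message. The names below are illustrative only;
# the real check is implemented in the Studio frontend.
HF_REPO_ID = re.compile(r"^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$")


def should_poll(selected_model: str) -> bool:
    """Return True only when the value has the shape of an HF repo id."""
    # fullmatch avoids accepting a trailing newline, which "$" alone would allow.
    return HF_REPO_ID.fullmatch(selected_model) is not None


def keep_polling(selected_model: str, progress: float) -> bool:
    """Polling stops once the reported progress reaches 1.0."""
    return should_poll(selected_model) and progress < 1.0
```

Absolute local paths and empty form state fail the pattern, so under this sketch they never reach the download-progress endpoint.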
Daniel Han	44082cf88e	Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM (#5014 ) * Studio: anchor ctx-slider warning threshold at 4096 when weights exceed VRAM The chat settings sheet's ctx slider reads `max_context_length` from `/api/inference/status` and renders Exceeds estimated VRAM capacity (N tokens). The model may use system RAM. when the user drags the slider above that value. For models whose weights fit on some GPU subset, `_max_context_length` was already set to the binary-search cap and the warning fired correctly. For models whose weights exceed 90% of every GPU subset's free memory (e.g. MiniMax-M2.7-GGUF at 131 GB on a 97 GB GPU), the ceiling-probe loop never matched a subset, so `max_available_ctx` stayed at the native context (e.g. 196608). The slider ran all the way to native with no indication that any value above the 4096 spec default would trigger `--fit on` and degrade performance. Anchor `max_available_ctx` at `min(4096, native_context_length)` when no subset fits, so the warning fires at the right threshold and the user sees the correct safe-zone / warning-zone split: Before (MiniMax-M2.7 on 97 GB GPU): slider 0 .. 196608, warning threshold = 196608 (never fires) After: slider 0 .. 196608, warning threshold = 4096 (fires correctly) No frontend changes required: `chat-settings-sheet.tsx` already consumes `ggufMaxContextLength` (= status.max_context_length) as the warning threshold and `ggufNativeContextLength` as the slider max. Adds tests/test_llama_cpp_max_context_threshold.py covering weights-exceed-VRAM (single / multi-GPU), a native-ctx below the 4096 fallback case (don't lie about supported ctx), fittable-model regressions (small / multi-GPU / tiny on huge GPU), and the `max_context_length` property's fallback semantics. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 08:53:49 -07:00
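A minimal sketch of the anchoring rule from this commit, assuming the ceiling probe yields either a fitted context or nothing. The function and parameter names are hypothetical; only the `min(4096, native_context_length)` fallback comes from the message.

```python
from typing import Optional

# The 4096 "spec default" named in the commit message.
FALLBACK_CTX = 4096


def warning_threshold(native_ctx: int, fitted_ctx: Optional[int]) -> int:
    """Context length above which the slider shows the VRAM warning.

    fitted_ctx stands in for the binary-search cap found when the weights fit
    on some GPU subset; None means no subset fits (hypothetical convention).
    """
    if fitted_ctx is not None:
        return fitted_ctx
    # No subset fits: anchor at 4096, but never report more than the model supports.
    return min(FALLBACK_CTX, native_ctx)
```

With a native context of 196608 and no fitting subset this yields 4096, matching the "After" row in the message; a model whose native context is below 4096 keeps its own limit.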
Daniel Han	b2f80f210e	Studio: make GGUF disk-space preflight cache-aware (#5012 ) * Studio: make GGUF disk-space preflight cache-aware The pre-download disk check in LlamaCppBackend.load_model compared the repo's total GGUF size against free disk without crediting bytes already present in the Hugging Face cache. Re-loading a large cached model (e.g. MiniMax-M2.7-GGUF at 131 GB) then failed cold with "Not enough disk space to download any variant" whenever free disk was below the full weight footprint, even though nothing actually needed to be downloaded. Subtract bytes already on disk via try_to_load_from_cache before comparing against free space. A partial blob (interrupted download) is not credited, so a second attempt still allocates room to finish the download. The log line now also surfaces how much is already cached. Adds tests/test_llama_cpp_cache_aware_disk_check.py covering the fully-cached, partial-cache-insufficient-disk, partial-cache-enough-disk, cold-cache, incomplete-blob, and zero-size-path-info cases. Sparse tempfiles keep the GB-scale scenarios cheap to simulate. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 08:53:37 -07:00
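The cache-aware arithmetic can be illustrated without touching the Hugging Face cache. This is a sketch under stated assumptions: the commit uses `try_to_load_from_cache` to find files already on disk, whereas the helper names and the plain dict/set inputs below are hypothetical stand-ins.

```python
from typing import Dict, Set


def bytes_still_needed(file_sizes: Dict[str, int], fully_cached: Set[str]) -> int:
    """Sum the sizes of files that are not completely present in the cache.

    Partial blobs must not appear in fully_cached, so an interrupted download
    still reserves room for the whole file.
    """
    return sum(size for name, size in file_sizes.items() if name not in fully_cached)


def has_room(file_sizes: Dict[str, int], fully_cached: Set[str], free_bytes: int) -> bool:
    """Compare only the outstanding download against free disk space."""
    return bytes_still_needed(file_sizes, fully_cached) <= free_bytes
```

A fully cached model needs zero additional bytes under this rule, so the preflight passes even when free disk is far below the total weight footprint.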
Daniel Han	767fa8cade	Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM (#5011 ) * Studio: honor explicit GGUF ctx and default to 4096 when weights exceed VRAM The load-time auto-fit in LlamaCppBackend.load_model had two issues for models whose weights do not fit on any GPU subset (the common case for large MoE GGUFs such as MiniMax-M2.7, Qwen3.5-397B-A17B, etc.): 1. Auto mode (max_seq_length=0) left effective_ctx at the model's native context when no subset passed the 90% fit check. The UI slider then landed on e.g. 196608 for MiniMax-M2.7, far above anything usable. Default the auto-pick to 4096 so the UI starts at a sane value; the slider ceiling stays at the native context so the user can still opt in to longer contexts and receive the "might be slower" warning. 2. Explicit ctx was silently shrunk when weights fit but the requested KV overflowed the 90% budget. The shrink loop emitted -c <capped> -ngl -1 without informing the caller, so a user who had opted into a longer context via the UI never actually got it. Drop the shrink loop on the explicit path and emit -c <user_ctx> --fit on instead, letting llama-server flex -ngl (CPU layer offload). Adds tests/test_llama_cpp_context_fit.py covering both paths, the file-size-only fallback when KV metadata is missing, non-regression on fittable auto-pick, and platform-agnostic input shape. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-14 08:53:25 -07:00
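The two paths in this commit can be sketched as a pair of small functions. The flags `-c`, `-ngl -1`, and `--fit on` are the ones quoted in the message; everything else (function names, the `fitted_ctx` convention, capping the fallback at the native context, and using `-ngl -1` whenever the request fits) is an assumption made for illustration, not the backend's actual code.

```python
from typing import List, Optional

# Default auto-pick named in the commit message.
AUTO_FALLBACK_CTX = 4096


def effective_ctx(max_seq_length: int, native_ctx: int, fitted_ctx: Optional[int]) -> int:
    """Pick the context length; max_seq_length == 0 means auto mode."""
    if max_seq_length > 0:
        return max_seq_length  # explicit request is honored, never shrunk
    if fitted_ctx is not None:
        return fitted_ctx  # auto mode, weights fit on some GPU subset
    # Auto mode, nothing fits. Capping at native_ctx is an assumption here.
    return min(AUTO_FALLBACK_CTX, native_ctx)


def context_flags(ctx: int, fits_gpu_budget: bool) -> List[str]:
    """Build the context-related llama-server arguments (illustrative)."""
    if fits_gpu_budget:
        return ["-c", str(ctx), "-ngl", "-1"]
    # Over budget: keep the requested ctx and let llama-server offload layers.
    return ["-c", str(ctx), "--fit", "on"]
```

The point of the second function is that an over-budget explicit request changes the offload strategy rather than the context value.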
TF-MTGE	a31c82a640	fix(studio): remove 300s cap on load_checkpoint (inherits 3600s default) (#4922 ) * fix: increase wait response timeout to 900 sec instead of 300 sec. #4845 * Apply suggestion from @gemini-code-assist[bot] good catch Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com> --------- Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>	2026-04-14 08:53:14 -07:00
Datta Nimmaturi	da78c6be71	[Studio] Install flash attn at setup time for linux (#4979 ) * [Studio] Install flash attn at setup time for linux * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * cleanup changes Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Test cases * wheel_utils: narrow url_exists exceptions and log at debug level --------- Signed-off-by: Datta Nimmaturi <venkatadattasainimmaturi@gmail.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>	2026-04-14 16:40:17 +04:00
Datta Nimmaturi	dccc0ebada	[Studio] Show non-exported models in chat UI (#4892 ) * Show non-exported models in chat UI * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Distinguish between LoRA and full fine-tune saves. Cleanup --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>	2026-04-14 15:03:58 +04:00
Bharath Kumar Adinarayan	a50f61009b	fix(studio): default chart view to full training history (#5007 ) * fix(studio): default chart view to full training history instead of last 80 steps Fixes #5003 * chore: windowsize as null code comment --------- Co-authored-by: imagineer99 <samleejackson0@gmail.com> Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>	2026-04-14 03:29:27 -07:00
Lee Jackson	bfa17330bd	Studio: Polish API key copy button and harden async clipboard fallback (#5006 ) * fix: polish clipboard style and fix async clipboard path * Use copyToClipboardAsync in CopyButton for Safari fallback CopyButton was calling navigator.clipboard.writeText directly, bypassing the execCommand fallback added in this same PR. Switch to copyToClipboardAsync which tries execCommand first (Safari user-gesture requirement) then falls back to the async clipboard API. * Fix copyToClipboard sync contract regression and improve async path - Restore copyToClipboard() to return only the execCommand result, preserving the boolean contract that 7 existing callers depend on to gate their "Copied!" UI state. The fire-and-forget async fallback was returning true before the promise resolved, causing false success. - Add document.body null guard to copyWithExecCommand for SSR safety. - Reorder copyToClipboardAsync to try the async Clipboard API first, avoiding unnecessary DOM/focus overhead in Radix focus-trapped dialogs where execCommand always fails anyway. * Restore queryCommandSupported guard and fix async catch path - Restore the queryCommandSupported("copy") guard in copyToClipboard() to match the original contract exactly: when execCommand is entirely unsupported, fall through to fire-and-forget async clipboard write. - Fix copyToClipboardAsync catch block: after navigator.clipboard.writeText rejects, the user-gesture frame is gone, so execCommand will also fail. Return false from catch instead of falling through. The execCommand fallback at the bottom only runs when the Clipboard API is absent (still in user-gesture frame). * Restore execCommand fallback in copyToClipboardAsync catch path The catch block was returning false after clipboard API rejection, based on the incorrect premise that the user-gesture frame is lost after an await. Per the HTML spec, transient user activation IS preserved through promise microtask chains. 
The real reason execCommand fails in the Radix dialog is the focus trap intercepting textarea.focus(), not gesture loss. For non-dialog callers, execCommand can still succeed after a clipboard rejection. Inside a Radix modal, execCommand returns false harmlessly (focus trap blocks it). * Harden textarea fallback for mobile and continue to async path on failure --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>	2026-04-14 14:22:14 +04:00
Wasim Yousef Said	97eafd999e	studio: fix api-keys access + refresh (#5005 ) * studio: fix api-keys access + refresh * studio: guard v1 in spa fallback	2026-04-13 23:48:51 +04:00
AdamPlatin123	d2fc582840	studio: skip training status/metrics polling when idle (#4988 ) * fix(studio): skip training status/metrics polling when idle Add an early return in the status and metrics setInterval callbacks when the runtime store reports phase === "idle" and hasHydrated is true. Previously these polls fired unconditionally every 3s/5s, generating unnecessary network traffic and console errors when no training was running. * fix(studio): reduce idle polling to 30s instead of stopping entirely Review feedback (PR #4988): completely stopping polling when idle risks permanent UI desync if hydration fails, and misses out-of-band state changes from other clients. Add a 30s background poll that only fires when idle to recover gracefully. * fix: harden idle status polling around hydration and runtime reset --------- Co-authored-by: AdamPlatin123 <AdamPlatin123@users.noreply.github.com> Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com> Co-authored-by: imagineer99 <samleejackson0@gmail.com>	2026-04-13 12:02:12 -07:00
Daniel Han	9a261aec5f	Studio: Expose openai and anthropic compatible external API end points (#4956 ) * Studio: add API key authentication for programmatic access External users want to hit the Studio API (chat completions with tool calling, training, export, etc.) without going through the browser login flow. This adds sk-unsloth- prefixed API keys that work as a drop-in replacement for JWTs in the Authorization: Bearer header. Backend: - New api_keys table in SQLite (storage.py) - create/list/revoke/validate functions with SHA-256 hashed storage - API key detection in _get_current_subject before the JWT path - POST/GET/DELETE /api/auth/api-keys endpoints on the auth router Frontend: - /api-keys page with create form, one-time key reveal, keys table - API Keys link in desktop and mobile navbar - Route registered with requireAuth guard Zero changes to any existing route handler -- every endpoint that uses Depends(get_current_subject) automatically works with API keys. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Use actual origin in API key usage examples The examples on /api-keys were hardcoded to localhost:8888 which is wrong for remote users. Use window.location.origin so the examples show the correct URL regardless of where the user is connecting from. * Add `unsloth studio run` CLI command for one-liner model serving Adds a `run` subcommand that starts Studio, loads a model, creates an API key, and prints a ready-to-use curl command -- similar to `ollama run` or `vllm serve`. Usage: unsloth studio run -m unsloth/Qwen3-1.7B-GGUF --gguf-variant UD-Q4_K_XL * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add end-to-end tests for `unsloth studio run` and API key usage Tests the 4 usage examples from the API Keys page: 1. curl basic (non-streaming) chat completions 2. curl streaming (SSE) chat completions 3. OpenAI Python SDK streaming completions 4. 
curl with tools (web_search + python) Also tests --help output, invalid key rejection, and no-key rejection. All 7 tests pass against Qwen3-1.7B-GGUF. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add /v1/completions, /v1/embeddings, /v1/responses endpoints and --parallel support - llama_cpp.py: accept n_parallel param, pass to llama-server --parallel - run.py: plumb llama_parallel_slots through to app.state - inference.py: add /completions and /embeddings as transparent proxies to llama-server, add /responses as application-level endpoint that converts to ChatCompletionRequest; thread n_parallel through load_model - studio.py: set llama_parallel_slots=4 for `unsloth studio run` path * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Make /v1/responses endpoint match OpenAI Responses API format The existing /v1/responses shim returned Chat Completions format, which broke OpenAI SDK clients using openai.responses.create(). This commit replaces the endpoint with a proper implementation that: - Returns `output` array with `output_text` content parts instead of `choices` with `message` - Uses `input_tokens`/`output_tokens` instead of `prompt_tokens`/ `completion_tokens` in usage - Sets `object: "response"` and `id: "resp_..."` - Emits named SSE events for streaming (response.created, response.output_text.delta, response.completed, etc.) - Accepts all OpenAI Responses API fields (tools, store, metadata, previous_response_id) without erroring -- silently ignored - Maps `developer` role to `system` and `input_text`/`input_image` content parts to the internal Chat format Adds Pydantic schemas for request/response models and 23 unit tests covering schema validation, input normalisation, and response format. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Studio: add Anthropic-compatible /v1/messages endpoint (#4981) * Add Anthropic-compatible /v1/messages endpoint with tool support Translate Anthropic Messages API format to/from internal OpenAI format and reuse the existing server-side agentic tool loop. Supports streaming SSE (message_start, content_block_delta, etc.) and non-streaming JSON. Includes offline unit tests and e2e tests in test_studio_run.py. * Add enable_tools, enabled_tools, session_id to /v1/messages endpoint Support the same shorthand as /v1/chat/completions: enable_tools=true with an optional enabled_tools list uses built-in server tools without requiring full Anthropic tool definitions. session_id is passed through for sandbox isolation. max_tokens is now optional. * Strip leaked tool-call XML from Anthropic endpoint content Apply _TOOL_XML_RE to content events in both streaming and non-streaming tool paths, matching the OpenAI endpoint behavior. * Emit custom tool_result SSE event in Anthropic stream Adds a non-standard tool_result event between the tool_use block close and the next text block, so clients can see server-side tool execution results. Anthropic SDKs ignore unknown event types. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Split /v1/messages into server-side and client-side tool paths enable_tools=true runs the existing server-side agentic loop with built-in tools (web_search/python/terminal). A bare tools=[...] field now triggers a client-side pass-through: client-provided tools are forwarded to llama-server and any tool_use output is returned to the caller with stop_reason=tool_use for client execution. This fixes Claude Code (and any Anthropic SDK client) which sends tools=[...] expecting client-side execution but was previously routed through execute_tool() and failing with 'Unknown tool'. 
Adds AnthropicPassthroughEmitter to convert llama-server OpenAI SSE chunks into Anthropic SSE events, plus unit tests covering text blocks, tool_use blocks, mixed, stop reasons, and usage. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix httpcore GeneratorExit in /v1/messages passthrough stream Explicitly aclose aiter_lines() before the surrounding async with blocks unwind, mirroring the prior fix in external_provider.py (`a41160d3`) and cc757b78's RuntimeError suppression. * Wire stop_sequences through /v1/messages; warn on tool_choice Plumb payload.stop_sequences to all three code paths (server-side tool loop, no-tool plain, client-side passthrough) so Anthropic SDK clients setting stop_sequences get the behavior they expect. The llama_cpp backend already accepted `stop` on both generate_chat_ completion and generate_chat_completion_with_tools; the Anthropic handler simply wasn't passing it. tool_choice remains declared on the request model for Anthropic SDK compatibility (the SDK often sets it by default) but is not yet honored. Log a structured warning on each request carrying a non- null tool_choice so the silent drop is visible to operators. * Wire min_p / repetition_penalty / presence_penalty through /v1/messages Align the Anthropic endpoint's sampling surface with /v1/chat/completions. Adds the three fields as x-unsloth extensions on AnthropicMessagesRequest and threads them through all three code paths: server-side tool loop, no-tool plain, and client-side passthrough. The passthrough builder emits "repeat_penalty" (not "repetition_penalty") because that is llama-server's field name; the backend methods already apply the same rename internally. 
* Fix block ordering and prev_text reset in non-streaming tool path _anthropic_tool_non_streaming was building the response by appending all tool_use blocks first, then a single concatenated text block at the end — losing generation order and merging pre-tool and post-tool text into one block. It also never reset prev_text between synthesis turns, so the first N characters of each post-tool turn were dropped (where N = length of the prior turn's final cumulative text). Rewrite to build content_blocks incrementally in generation order, matching the streaming emitter's behavior: deltas within a turn are merged into the trailing text block, tool_use blocks interrupt the text sequence, and prev_text is reset on tool_end so turn N+1 diffs against an empty baseline. Caught by gemini-code-assist[bot] review on #4981. * Make test_studio_run.py e2e tests pytest-compatible Add a hybrid session-scoped studio_server fixture in conftest.py that feeds base_url / api_key into the existing e2e test functions. Three invocation modes are now supported: 1. Script mode (unchanged) — python tests/test_studio_run.py 2. Pytest + external server — point at a running instance via UNSLOTH_E2E_BASE_URL / UNSLOTH_E2E_API_KEY env vars, no per-run GGUF load cost 3. Pytest + fixture-managed server — pytest drives _start_server / _kill_server itself via --unsloth-model / --unsloth-gguf-variant, CI-friendly The existing _start_server / _kill_server helpers and main() stay untouched so the script entry point keeps working exactly as before. Test function signatures are unchanged — the (base_url, api_key) parameters now resolve via the new fixtures when running under pytest. * Rename test_studio_run.py -> test_studio_api.py The file is entirely about HTTP API endpoint testing (OpenAI-compatible /v1/chat/completions, Anthropic-compatible /v1/messages, API key auth, plus a CLI --help sanity check on the command that runs the API). 
None of its tests cover training, export, chat-UI, or internal-Python-API concerns. The old name misleadingly suggested "tests for the unsloth studio run CLI subcommand" — the new name reflects the actual scope. Updates: - git mv the file (rename tracked, history preserved) - Rewrite opening docstring to state the API surface focus and call out what is explicitly out of scope - Update all 4 Usage-block path references to the new filename - LOG_FILE renamed to test_studio_api.log - conftest.py fixture import rewritten from test_studio_run to test_studio_api, plus 7 docstring/comment references updated No functional changes to test logic, signatures, or main(). --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> * Fix httpcore asyncgen cleanup in /v1/messages and /v1/completions The earlier fix in `985e92a9` was incomplete: it closed aiter_lines() explicitly but still used `async with httpx.AsyncClient()` / `async with client.stream()` inside the generator. When the generator is orphaned (e.g. client disconnects mid-stream and Starlette drops the StreamingResponse iterator without explicitly calling aclose()), Python's asyncgen finalizer runs the cleanup in a DIFFERENT task than the one that originally entered the httpx context managers. The `async with` exits then trigger httpcore's HTTP11ConnectionByteStream .aclose(), which enters anyio.CancelScope.__exit__ with a mismatched task and raises RuntimeError("Attempted to exit cancel scope in a different task"). That error escapes any user-owned try/except because it happens during GC finalization. Replace `async with` with manual client/response lifecycle in both /v1/messages passthrough and /v1/completions proxy. Close the response and client in a finally block wrapped in `try: ... except Exception: pass`. 
This suppresses RuntimeError (and other Exception subclasses) from the anyio cleanup noise while letting GeneratorExit (a BaseException, not Exception) propagate cleanly so the generator terminates as Python expects. Traceback observed in user report: File ".../httpcore/_async/connection_pool.py", line 404, in __aiter__ yield part RuntimeError: async generator ignored GeneratorExit ... File ".../anyio/_backends/_asyncio.py", line 455, in __exit__ raise RuntimeError( RuntimeError: Attempted to exit cancel scope in a different task * Expand unsloth studio run banner with SDK base URL and more curl examples Add an explicit "OpenAI / Anthropic SDK base URL" line inside the info box so SDK users don't accidentally copy the bare server URL (without /v1) into their OpenAI/Anthropic SDK constructors and hit 404s. Replace the single /v1/chat/completions curl example with three labeled blocks: chat/completions, Anthropic /messages, and OpenAI Responses. The Anthropic example includes max_tokens (Anthropic SDKs require it even though Studio accepts None). All examples derived from a computed sdk_base_url so the /v1 prefix stays in sync if the public path ever changes. * Hash API keys with HMAC-SHA256 + persistent server secret Stores the HMAC secret in a new app_secrets singleton table. Fixes CodeQL py/weak-sensitive-data-hashing alert on storage.py:74-76, 394-395. Refresh tokens stay on plain SHA-256 (unchanged _hash_token) so existing user sessions survive upgrade — API keys are new on this branch so there is no migration. * Use PBKDF2 for API key hashing per CodeQL recommendation HMAC-SHA256 was still flagged by py/weak-sensitive-data-hashing. Switch to hashlib.pbkdf2_hmac, which is in CodeQL's recommended allowlist (Argon2/scrypt/bcrypt/PBKDF2). Persistent server-side salt stays in app_secrets for defense-in-depth. 100k iterations to match auth/hashing.py's password hasher. 
--------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com> Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>	2026-04-13 21:08:11 +04:00
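The final hashing scheme in this commit can be approximated with the standard library. This is a sketch, not the code in storage.py: the `sk-unsloth-` prefix, `hashlib.pbkdf2_hmac`, the server-side salt, and the 100k iteration count come from the message, while the SHA-256 digest choice, the token length, and all function names are assumptions.

```python
import hashlib
import hmac
import secrets

KEY_PREFIX = "sk-unsloth-"  # prefix named in the commit message
ITERATIONS = 100_000        # "100k iterations" per the commit message


def new_api_key() -> str:
    """Generate a key to show once to the user (token length is an assumption)."""
    return KEY_PREFIX + secrets.token_urlsafe(32)


def hash_api_key(key: str, server_salt: bytes) -> str:
    """Derive the value to store; the digest name "sha256" is an assumption."""
    digest = hashlib.pbkdf2_hmac("sha256", key.encode("utf-8"), server_salt, ITERATIONS)
    return digest.hex()


def verify_api_key(presented: str, stored_hash: str, server_salt: bytes) -> bool:
    """Check a presented key against the stored hash in constant time."""
    if not presented.startswith(KEY_PREFIX):
        return False  # not an API key; a caller would fall through to the JWT path
    return hmac.compare_digest(hash_api_key(presented, server_salt), stored_hash)
```

Because the salt is a single persistent server secret rather than a per-key value, the same key always hashes to the same stored value, which is what lets a lookup by hash work.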
Roland Tannous	3bb72a557f	Pin kernels==0.12.1 to avoid huggingface_hub dataclass conflict (#5000 )	2026-04-13 20:42:02 +04:00
Lee Jackson	21a7895959	Studio: Prompt manager, message deletion, and chat UI improvements (#4938 ) * feat(chat): code block styling, delete with Dexie sync, settings sheet polish * style: config save/delete padding fix * fix(studio): centralize dark code-block surface and optimize message sync writes * style: config padding/alignment polish * fix(studio): upsert custom presets without implicit rename-delete * fix settings sheet save state polish * fix settings sheet button widths * fix chat settings presets * fix chat delete sync * fix chat trust remote code flow --------- Co-authored-by: shine1i <wasimysdev@gmail.com>	2026-04-13 16:42:33 +02:00
AdamPlatin123	3b092bcd46	fix(studio): prevent route transition DOM duplication via AnimatePresence (#4987 ) Add mode="wait" and exit={{ opacity: 0 }} to the root AnimatePresence wrapper so outgoing routes fully unmount before incoming routes render. Without this, rapid navigation between Studio/Export/Recipes/Chat caused pages to stack (2x–3x duplication). Co-authored-by: AdamPlatin123 <AdamPlatin123@users.noreply.github.com> Co-authored-by: Wasim Yousef Said <wasimysdev@gmail.com>	2026-04-13 01:38:00 -07:00
Daniel Han	65b4028560	Pin bitsandbytes to continuous-release_main on ROCm (4-bit decode fix) (#4954 ) * Pin bitsandbytes to continuous-release_main on ROCm for 4-bit decode fix bitsandbytes 0.49.2 on PyPI ships with a broken 4-bit GEMV kernel on every ROCm target: - CDNA (gfx90a / gfx942 / gfx950 = MI210 / MI300X / MI350) via a broken blocksize=32/64 warp64 GEMV kernel whose tests were explicitly skipped with ROCM_WARP_SIZE_64 guards because the code was known broken. - RDNA3 / RDNA3.5 (gfx1100-1103 / gfx1150-1152) via a compile-time BNB_WARP_SIZE macro in the host-side dispatch that resolves to 64 when the multi-arch wheel is compiled with CDNA as the primary target, so num_blocks is wrong on RDNA and half the GEMV output is never written. At decode shape (1, 1, hidden) both bugs produce NaN. Training is unaffected because training shapes are (batch, seq_len > 1, hidden) and never touch the GEMV path. The crash during autoregressive inference surfaces as _assert_async_cuda_kernel in torch.multinomial which on HIP becomes a hard HSA_STATUS_ERROR_EXCEPTION instead of a clean Python error. Both bugs are fixed by bitsandbytes commit 713a3b8 ("[ROCm] Enable blocksize 32 4-bit quantization and GEMV kernels on AMD CDNA", PR #1887, merged 2026-03-09) which replaces BNB_WARP_SIZE with a runtime hipDeviceGetAttribute query and ships a working CDNA warp64 kernel. That commit has not shipped to PyPI yet, but continuous-release_main wheels are published on every push to bnb main via GitHub Releases. Point the ROCm install path at the continuous-release_main x86_64 and aarch64 wheels and fall back to PyPI >=0.49.1 when the pre-release is unreachable (offline installs, firewalled hosts, or architectures not covered by the pre-release wheels). Drop the pin once bnb cuts a 0.50+ tag on PyPI. 
Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1): direct bnb GEMV shape test now returns 0.0078 max abs error at seq_len=1 (no NaN) vs NaN on 0.49.2, and full Unsloth + for_inference + 4-bit sampling generation works end-to-end. NVIDIA / CPU / Mac / Windows paths are unaffected -- the helper is gated on the ROCm torch index and platform.machine() respectively. * Drop Studio ROCm 16-bit fallback now that bnb 0.50+ fixes 4-bit decode The 16-bit fallback in studio/backend/core/inference/inference.py was added as a workaround for a bug that this PR already fixes at the install layer: bitsandbytes <= 0.49.2 has a broken 4-bit GEMV kernel on every ROCm target, which NaNs at decode shape (seq_len=1) and crashes autoregressive inference. bnb PR #1887 (commit 713a3b8, in 0.50.0.dev0+, pinned by install.sh / install_python_stack.py in this PR) restores correct 4-bit decode on MI300X and verified working end-to-end with full Unsloth + for_inference + sampling. Revert the dual code path so ROCm and NVIDIA both go through the normal FastLanguageModel.from_pretrained + for_inference flow: - Remove the conditional `from unsloth import` that skipped the import on ROCm. The monkey-patches it was trying to avoid were never the cause of the crash; bnb 4-bit GEMV was. - Remove the `if _hw_module.IS_ROCM:` branch in load_model that loaded with plain transformers + PEFT + bfloat16, and the `_resolve_fp16_base` helper it relied on. - Remove the `get_chat_template is not None` fallback in _load_chat_template_info -- get_chat_template is now always imported. - Refactor the audio/vision ROCm guard to check _hw_module.IS_ROCM directly instead of the removed _IS_ROCM_ENV global. Audio and vision on ROCm still need separate validation (FastVisionModel and the CSM audio codecs were never tested on HIP) so the guard stays for now. 
Add _bnb_rocm_4bit_ok() as a runtime safety net for users who install from this PR before the install.sh bnb pin kicks in, or whose installer fell back to the PyPI pin because the continuous- release wheel was unreachable. When the installed bnb is < 0.50 on ROCm, force load_in_4bit=False and strip any -unsloth-bnb-4bit / -bnb-4bit suffix from the model path so a pre-quantized repo resolves to its FP16 sibling instead of pulling bnb back in via the repo's quantization_config. LoRA adapters whose base is a pre-quantized repo on old bnb will still fail inside Unsloth's loader -- the only real fix there is `unsloth studio update`. Verified on MI300X (gfx942, ROCm 7.2, torch 2.10.0+rocm7.1): - HAPPY path (bnb 0.50.0.dev0, load_in_4bit=True, pre-quantized repo): loads in 4-bit via the fixed GEMV, generation returns "Paris." for greedy and sampling. - SAFETY-NET path (simulated old bnb, suffix-stripped to the FP16 sibling, load_in_4bit=False): loads in bf16, generation returns "Paris." for greedy and sampling. Net diff is ~45 lines smaller than the pre-revert state because the entire plain-transformers 16-bit branch is gone. * Cache _bnb_rocm_4bit_ok() with functools.cache load_model() can be called many times in a single session but the bnb version and hardware state cannot change at runtime, so memoise the check. First call is ~1.9 ms (dominated by the lazy `import bitsandbytes` inside the try block), subsequent calls drop to sub-microsecond dict lookups. Zero behavioral change. * Shorten verbose bnb/ROCm comments Comment-only cleanup across install.sh, studio/install_python_stack.py, and studio/backend/core/inference/inference.py. No behavioral change. 
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Remove _bnb_rocm_4bit_ok safety net from inference.py Studio's ROCm support is brand new (PR #4720, merged today) and every fresh install pulls the bnb continuous-release_main wheel via install.sh / install_python_stack.py in this same PR. There are no existing ROCm Studio installs carrying bnb < 0.50, so the defensive version-check fallback is guarding against a scenario that cannot actually occur. Delete the helper, the functools import, and the safety-net block -- inference.py now calls FastLanguageModel.from_pretrained directly with no ROCm branching. * Drop audio/vision ROCm guard in inference.py — verified unblocked by bnb fix Vision inference was blocked by the same bnb 4-bit GEMV bug that affected text inference (vision models use bnb 4-bit for the LM backbone). With bnb 0.50+ pinned in install.sh / install_python_stack.py, vision works end-to-end on MI300X: Llama-3.2-11B-Vision-Instruct-unsloth-bnb-4bit loaded in 4-bit via FastVisionModel + for_inference returns a correct answer to a multimodal prompt. Audio (CSM) was never actually blocked by HIP — on this hardware CSM loads and runs its backbone forward pass fine with bnb 0.50, then fails during generate() with a transformers-level kwarg validation mismatch in generation_csm.py (`backbone_last_hidden_state` rejected). That's a pre-existing transformers/CSM integration bug that reproduces identically on NVIDIA, so the ROCm-gated guard was never actually protecting users from anything HIP-specific. Remove the combined audio/vision guard and the now-unused _hw_module import. Also restore the one-word "Can be" in an inline comment that drifted during the earlier comment-shortening pass, so the inference.py delta vs pre-#4720 is exactly the max_seq_length<=0 crash fix and nothing else. 
* Shorten max_seq_length=0 guard comment to one line --------- Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-10 06:25:39 -07:00
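For readers following the bnb/ROCm discussion in the commit above: the since-removed safety net (suffix stripping plus a memoised version check) can be sketched roughly as below. This is an illustrative reconstruction, not the code that was in inference.py. The names `strip_bnb_suffix` and `bnb_version_ok` are hypothetical, and the version string is passed in explicitly rather than read from an installed bitsandbytes.

```python
from functools import cache

# Suffixes named in the commit message; the longer one is checked first
# because "-unsloth-bnb-4bit" also ends with "-bnb-4bit".
_BNB_SUFFIXES = ("-unsloth-bnb-4bit", "-bnb-4bit")

# Minimum (major, minor) bnb version the commit treats as safe on ROCm.
_MIN_BNB = (0, 50)


def strip_bnb_suffix(model_name: str) -> str:
    """Map a pre-quantized repo name to its 16-bit sibling (hypothetical helper)."""
    for suffix in _BNB_SUFFIXES:
        if model_name.endswith(suffix):
            return model_name[: -len(suffix)]
    return model_name


@cache
def bnb_version_ok(version: str) -> bool:
    """Return True when `version` is at least 0.50; memoised per version string."""
    parts = version.split(".")
    try:
        major, minor = int(parts[0]), int(parts[1])
    except (IndexError, ValueError):
        # Unparseable versions are treated as unsafe in this sketch.
        return False
    return (major, minor) >= _MIN_BNB
```

Because the check is wrapped in `functools.cache`, repeated calls with the same version string return the stored result instead of re-parsing, which mirrors the memoisation rationale given in the commit.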
Daniel Han	cad8c6ad05	Add AMD ROCm/HIP support across installer and hardware detection (#4720 ) * Add ROCm detection to install.sh and expand shell tests Add AMD ROCm GPU detection to get_torch_index_url() in install.sh. When nvidia-smi is not found, probe for ROCm via amd-smi, /opt/rocm version file, hipconfig, dpkg-query, and rpm. Includes validation guard for malformed _rocm_tag, Debian epoch prefix stripping, ROCm 7.2+ cap to rocm7.1 index, bitsandbytes AMD install, and status messaging. Shell tests expanded to 23 cases. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Add ROCm torch reinstall support to install_python_stack.py Add _detect_rocm_version() and _ensure_rocm_torch() to detect when a Linux host has ROCm but the venv received CPU-only torch, and reinstall with the correct ROCm wheels. Covers ROCm 6.0 through 7.1 with a 30-second timeout on the torch GPU probe subprocess. Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Add ROCm support to llama.cpp prebuilt installer Add has_rocm field to HostInfo, extend detect_host() to probe for ROCm via hipcc/amd-smi/rocm-smi/ROCM_PATH, and route ROCm hosts to upstream prebuilts (Linux ROCm 7.2 prebuilt with source fallback, Windows HIP prebuilt with CPU fallback). Add linux-rocm and windows-hip install kinds to runtime_patterns_for_choice(). Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Add IS_ROCM hardware flag and fix AMD error message Add IS_ROCM flag to hardware.py detect_hardware() (set when torch.version.hip is present, DeviceType stays CUDA). Export IS_ROCM from __init__.py. Add "rocm" key to get_package_versions(). Replace "We do not support AMD" error in tokenizer_utils.py with a helpful message pointing to ROCm installation docs. 
Co-authored-by: Daniel Han <danielhanchen@gmail.com> * Add comprehensive ROCm support test suite (68 tests) Add tests/studio/install/test_rocm_support.py covering all ROCm code paths across install_llama_prebuilt.py, install_python_stack.py, hardware.py, tokenizer_utils.py, and install.sh. All tests use mocks and run without AMD hardware. Covers: asset selection (11), runtime patterns (5), HostInfo (4), ROCm version detection (9), torch reinstall (9), index mapping (8), hardware flag (8), tokenizer message (2), install.sh structure (10), and live regression (1). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden ROCm support: probe error handling, version cap, validation Address review findings from 8 independent reviewers: - Wrap _ensure_rocm_torch() torch probe in try/except for TimeoutExpired and OSError so a hung or broken torch import does not crash the installer (8/8 reviewers flagged this) - Add torch>=2.4,<2.11.0 version cap to the ROCm reinstall path to prevent installing unsupported torch 2.11.0 from the rocm7.1 index - Use with-statement for file reads in _detect_rocm_version() to avoid resource leaks - Handle ROCM_PATH="" correctly (use `or "/opt/rocm"` instead of default parameter to avoid relative path resolution) - Strengthen shell validation guard from rocm[0-9] to rocm[1-9] to reject rocm0.x tags that would produce nonexistent PyTorch index URLs - Switch shell version cap from blocklist to allowlist (rocm6.\|rocm7.0 \|rocm7.1* pass through, everything else caps to rocm7.1) so future ROCm 10+ does not fall through to a nonexistent index - Add sorted() to _ROCM_TORCH_INDEX lookup for defensive ordering - Fix test_probe_timeout_handled: replace zero-assertion test with proper assertions verifying reinstall proceeds after timeout * Clean up rocm_paths list construction in detect_host() Filter None from the ROCM_PATH env var lookup at list construction time instead of relying on the inline `if 
p` guard in the any() call. * Require actual AMD GPU presence before selecting ROCm paths All 8 reviewers across 2 cycles independently flagged that ROCm detection used toolkit/filesystem hints (hipcc, /opt/rocm, rocm-core) as a proxy for GPU presence, which would misroute CPU-only or NVIDIA hosts that happen to have ROCm tools installed. Now all 3 detection points (install.sh, install_python_stack.py, install_llama_prebuilt.py) probe for an actual AMD GPU before entering the ROCm path: - install.sh: check rocminfo for gfx* GPU names, or amd-smi list for device rows, before version detection - install_python_stack.py: new _has_rocm_gpu() function probes rocminfo and amd-smi list before _ensure_rocm_torch() proceeds - install_llama_prebuilt.py: detect_host() probes rocminfo/amd-smi list instead of just checking tool existence or directory paths Also: - Shell test mock amd-smi now handles "list" subcommand - Python tests updated to mock _has_rocm_gpu where needed - Added test_no_gpu_with_rocm_tools_skips to verify the new guard - Test index lookups now use sorted() to match production code * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden hipconfig version parsing and torch probe compatibility - Add parts[1].isdigit() check in hipconfig version parsing to handle versions like "6.3-HIP" where the minor component has non-numeric suffix (strip "-" prefix before int() conversion) - Use getattr() in torch probe subprocess to safely handle old or custom torch builds that may lack torch.version.hip/cuda attributes * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Strengthen AMD GPU detection and add NVIDIA precedence guard - Change amd-smi list detection from any-non-empty-output to requiring "gpu" marker in output, matching the shell-side NR>1 check. Prevents false positives from header-only amd-smi list output. 
- Add nvidia-smi check at the top of _ensure_rocm_torch() so mixed AMD+NVIDIA hosts preserve NVIDIA precedence (matching install.sh and install_llama_prebuilt.py behavior). - Apply the same amd-smi marker fix to install_llama_prebuilt.py detect_host() for consistency. * Add Windows-specific ROCm/HIP detection in detect_host() The previous detect_host() ROCm check used rocminfo and amd-smi list which are Linux-only tools. On Windows, has_rocm would always be False, making the Windows HIP prebuilt path at line 1794 unreachable. Now detect_host() uses platform-specific detection: - Linux: rocminfo (check for gfx GPU names) or amd-smi list - Windows: hipinfo.exe, amd-smi, or amdhip64.dll on PATH This allows Windows AMD users to get the HIP prebuilt binary instead of silently falling through to the CPU prebuilt. * Add AMD ROCm gaps: Mamba/SSM source builds, GPU monitoring, Windows messaging, RDNA expansion - worker.py: Add HIP detection to causal-conv1d/mamba-ssm probe, check for hipcc before ROCm source builds, improve status messages and error reporting, add timeout and uv support for the source build fallback - amd.py: New AMD GPU monitoring module via amd-smi metric --json, mirroring nvidia.py structure (utilization, temperature, power, VRAM) - hardware.py: Branch to amd.py when IS_ROCM is True for GPU utilization, visible GPU queries, and physical GPU count - install_python_stack.py: Detect AMD GPUs on Windows and warn that ROCm-enabled PyTorch must be installed manually - kernels/utils.py: Expand is_rdna() to cover RDNA2 (gfx1030-1032), RDNA3 (gfx1102-1103), RDNA3.5 (gfx1150-1152) alongside existing entries - tests: Add 32 new tests covering all changes (95/95 pass) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden ROCm detection, fix VRAM heuristic, and expand RDNA2 coverage - Windows ROCm detection: validate actual GPU presence via hipinfo/amd-smi output markers instead of just checking tool existence 
on PATH - _ensure_rocm_torch: validate nvidia-smi actually reports a GPU before giving NVIDIA precedence (fixes AMD-only hosts with stale NVIDIA tools) - amd.py _parse_numeric: handle dict-shaped metric objects from newer amd-smi versions ({"value": 10, "unit": "W"}) and strip MiB/GiB units - amd.py VRAM heuristic: raise threshold from 100k to 10M to correctly handle MI300X (192 GB = 196608 MB) and other high-VRAM GPUs - amd.py visible GPU: use AMD-reported GPU IDs instead of enumerate index so non-dense sets like CUDA_VISIBLE_DEVICES=1,3 report correctly - install.sh: add ROCm <6.0 minimum version guard (no PyTorch wheels exist for older versions); fix rocm7.1* glob to not match rocm7.10+ - is_rdna: add gfx1033-1036 for RDNA2 mobile GPUs (RX 6600M etc.) - worker.py: increase ROCm source build timeout from 600s to 1800s; fix success log message for ROCm source builds - Tests: update mocks for _has_usable_nvidia_gpu, add RDNA2 target asserts * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Add HIP_VISIBLE_DEVICES support, unit-aware VRAM parsing, Windows GPU validation - hardware.py: check HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES on ROCm before falling back to CUDA_VISIBLE_DEVICES, so multi-GPU AMD setups with HIP-specific env vars report the correct visible device set - amd.py: add _parse_memory_mb() that reads "unit" from dict-shaped amd-smi JSON (e.g. 
{"value": 192, "unit": "GiB"}) and converts to MB correctly; fixes MI300X VRAM misreported as 0.19 GB instead of 192 GB - install_python_stack.py: Windows AMD warning now validates actual GPU presence via hipinfo/amd-smi output markers before printing - install_llama_prebuilt.py: restore amdhip64.dll fallback for Windows HIP detection after tool-based checks, so Windows HIP installs without CLI tools on PATH are still detected - hardware.py: fix IS_ROCM comment to accurately describe its role * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix HIP_VISIBLE_DEVICES empty-string handling in GPU visibility spec Use explicit None checks instead of Python `or` operator when reading HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES, so that an empty string ("") is correctly honored as "no visible GPUs" rather than silently falling through to CUDA_VISIBLE_DEVICES on mixed ROCm+CUDA systems. * Fix IS_ROCM test assertion for multi-line formatting * Cap torchvision/torchaudio versions, remove amdhip64.dll fallback, fix visible GPU count - Cap torchvision<0.26.0 and torchaudio<2.11.0 alongside torch<2.11.0 in both install.sh and install_python_stack.py to prevent resolver from selecting incompatible companion packages from ROCm wheel index - Remove amdhip64.dll fallback in Windows ROCm detection (DLL presence without hipinfo/amd-smi is not proof of GPU existence) - Fix get_visible_gpu_count() to use _get_parent_visible_gpu_spec() which respects HIP_VISIBLE_DEVICES/ROCR_VISIBLE_DEVICES on ROCm hosts * Attribute is_rdna() RDNA2/3/3.5/4 expansion to PR #4428 The is_rdna() expansion to cover RDNA2 (gfx1030-1036), RDNA3 (gfx1100-1103), RDNA3.5 (gfx1150-1152), and RDNA4 (gfx1200-1201) architectures is based on the original work from PR #4428. 
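The HIP_VISIBLE_DEVICES empty-string fix described above comes down to testing for `None` rather than truthiness. A minimal sketch, assuming a hypothetical `visible_gpu_spec` function that takes the environment as a mapping (the real helper in hardware.py may differ in name, signature, and precedence details):

```python
from typing import Mapping, Optional


def visible_gpu_spec(env: Mapping[str, str], is_rocm: bool) -> Optional[str]:
    """Pick the visibility mask that applies to this process (illustrative only)."""
    if is_rocm:
        for name in ("HIP_VISIBLE_DEVICES", "ROCR_VISIBLE_DEVICES"):
            value = env.get(name)
            # Explicit None check: an empty string means "no visible GPUs"
            # and must not fall through to CUDA_VISIBLE_DEVICES, which is
            # what `env.get(a) or env.get(b)` would do.
            if value is not None:
                return value
    return env.get("CUDA_VISIBLE_DEVICES")
```

With `{"HIP_VISIBLE_DEVICES": "", "CUDA_VISIBLE_DEVICES": "0,1"}` on a ROCm host this returns the empty string, whereas an `or`-based chain would return `"0,1"`.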
Co-authored-by: GoldenGrapeGentleman <yueyuan@amd.com> Co-authored-by: billishyahao <bill.he@amd.com> * Support AMD Radeon for studio (#4770) Co-authored-by: Iswarya Alex <iswarya.alex@amd.com> * Remove ROCm test files from main PR Move test_rocm_support.py and shell test additions to a separate PR to keep the main ROCm support PR focused on implementation changes. * Fix installer and hardware detection issues for PR #4720 - Fix empty _tri_arg passed to uv pip install in Radeon path (causes "Empty field is not allowed for PEP508" error) - Fix Radeon fallback: use ROCm index instead of CPU-only when repo.radeon.com is unreachable (TORCH_INDEX_URL already has ROCm) - Use $TORCH_CONSTRAINT in fallback paths instead of hardcoded strings - Fix _pick_radeon_wheel: relax suffix to match manylinux_2_28_x86_64 wheels (AMD Radeon repo does not use bare linux_x86_64 platform tag) - Fix IS_ROCM export: use __getattr__ so callers always see the live value after detect_hardware() runs - Fix apply_gpu_ids: set HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES on ROCm so _get_parent_visible_gpu_spec picks up narrowed GPU set - Fix _parse_memory_mb: distinguish GB (1000 MB) from GiB (1024 MiB) - Add amd-smi version as a fallback in _detect_rocm_version - Fix trailing whitespace and missing newline at EOF in install.sh * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix GPU detection false positives and add missing health groups - Fix _has_rocm_gpu() false positive: require "GPU: <number>" data rows from amd-smi list, not just header containing "gpu" - Apply same fix in detect_host() in install_llama_prebuilt.py - Add runtime_payload_health_groups for linux-rocm and windows-hip so partial/corrupt ROCm/HIP prebuilt installs are properly detected - Add bitsandbytes install to Radeon fallback paths (was only in the success path, skipped when repo.radeon.com was unreachable) - Keep DEVICE/CHAT_ONLY as direct imports in __init__.py 
(matching main) and only use __getattr__ for IS_ROCM * Fix _ensure_rocm_torch and Windows AMD warning false positives - _ensure_rocm_torch: only skip when HIP is already present, not for CUDA builds (which are unusable on AMD-only hosts). Fixes the case where a venv has a stale CUDA wheel and the repair step is skipped. - Windows AMD warning: use GPU data row check (same as Linux fix) to avoid false positives from amd-smi list header-only output. * Fix amd-smi GPU detection for GPU[N] output format Older amd-smi versions output "GPU[0] : Card series: ..." instead of "GPU: 0". The regex now matches both "GPU: <digit>" and "GPU[<digit>" formats to detect actual GPU data rows. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Harden AMD GPU detection against false positives - install.sh: replace weak amd-smi list check (awk 'NR>1 && NF') with strict pattern matching GPU data rows (/^GPU[[:space:]][:\[]/) - All files: reject rocminfo gfx000 (CPU HSA agent) by requiring gfx[1-9] instead of gfx[0-9] in the rocminfo GPU probe - Fixes false positives on hosts with ROCm tools but no AMD GPU Remove duplicate comment from pre-commit merge * Refactor: deduplicate AMD detection, consolidate bitsandbytes, clean up imports - Extract _has_amd_rocm_gpu() shell function to avoid duplicating the rocminfo/amd-smi GPU detection logic in get_torch_index_url and the Radeon auto-detect block - Consolidate bitsandbytes install into a single case block after torch install (was duplicated 4 times across Radeon success/fallback paths) - Move math and re imports to top of amd.py (were inline in functions) - Add _smi_query() helper in hardware.py to centralize IS_ROCM backend selection for get_gpu_utilization and get_visible_gpu_utilization Addresses Gemini code review suggestions. * Fix VRAM parsing for string values and GB/GiB consistency - Extract unit from string-valued VRAM fields (e.g. 
"192 GiB") so _parse_memory_mb correctly applies the unit multiplier instead of treating the value as bare MB - Treat GB and GiB identically (both as binary x1024) since GPU tools including amd-smi use binary units even when labeling them "GB" - Fixes incorrect VRAM reporting on MI300-class cards (was showing ~0.19 GB instead of 192 GB for string-valued outputs) * Add --no-cache to uv for ROCm HIP source builds Avoid stale cache artifacts from partial HIP source builds when uv is used for causal-conv1d/mamba-ssm compilation on ROCm. The pip path already uses --no-cache-dir; this adds the uv equivalent (--no-cache) only when is_hip is True. * Fix critical: initialize _amd_gpu_radeon before case block _amd_gpu_radeon was only set inside the /rocm) case arm, so on NVIDIA/CPU/macOS paths where TORCH_INDEX_URL does not contain "rocm", the variable was unbound. With set -u (nounset) enabled, this crashes the installer for every non-AMD user. Move initialization to before the case block so it is always defined. * Fix Windows AMD: route has_rocm hosts to HIP prebuilt path resolve_release_asset_choice was selecting windows-cpu for all Windows x86_64 hosts including those with has_rocm=True. Windows AMD users should fall through to resolve_upstream_asset_choice which tries the HIP prebuilt first. Add "not host.has_rocm" guard to the published windows-cpu selection. * Harden ROCm detection, Radeon wheel fallback, and HIP visibility Addresses review findings from parallel reviewers on PR #4720: - install.sh: add _has_usable_nvidia_gpu() helper requiring nvidia-smi -L to actually list a GPU before treating the host as NVIDIA. Fixes the stale-nvidia-smi-on-PATH regression where AMD-only hosts fell into the CUDA branch. - install.sh: fix hipconfig awk blocks to propagate a non-zero exit code when the output is not a recognisable version string, so the ||-chain continues to dpkg-query / rpm instead of terminating early. - install.sh: fail-closed on Radeon wheel fallback. 
When torch, torchvision or torchaudio is missing from the Radeon repo for the active Python tag, fall back to the standard ROCm index instead of silently mixing Radeon wheels with PyPI defaults. Quote all wheel arguments individually so wheel filenames cannot be word-split or glob-expanded. - install_llama_prebuilt.py: detect_host() now requires nvidia-smi -L to list a GPU before setting has_physical_nvidia. Routes AMD ROCm hosts with a broken leftover nvidia-smi to the ROCm path instead of misclassifying them as NVIDIA. - install_llama_prebuilt.py: scan upstream assets for any rocm-<version> prebuilt instead of hard-coding rocm-7.2, so ROCm 6.x / 7.0 / 7.1 / 7.3+ users pick up a matching upstream prebuilt when one exists. - install_llama_prebuilt.py: validate_server() adds --n-gpu-layers 1 for linux-rocm and windows-hip hosts, so new HIP prebuilts are preflighted on the GPU path instead of passing validation on CPU only. - install_llama_prebuilt.py: restore the published windows-cpu fallback for AMD Windows hosts without a HIP prebuilt so hash-approved bundles are still preferred over the raw upstream CPU asset. - install_python_stack.py: drop the /opt/rocm / hipcc gate in _ensure_rocm_torch() and rely on _has_rocm_gpu(). Runtime-only ROCm installs (package-managed minimal installs, Radeon software) that ship amd-smi / rocminfo without hipcc can now repair a CPU-only venv via "unsloth studio update". Adds an explicit IS_WINDOWS / IS_MACOS guard. - studio/backend/utils/hardware/amd.py: honour HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES / CUDA_VISIBLE_DEVICES in get_primary_gpu_utilization(). A process restricted to GPU 2 now reports metrics for GPU 2 instead of physical GPU 0. Tighten the plain bytes unit detection to an explicit allowlist. - studio/backend/utils/hardware/hardware.py: route get_backend_visible_gpu_info()'s backend_cuda_visible_devices field through a helper that reads HIP_VISIBLE_DEVICES on ROCm. 
Drop the unconditional "(rocm=False)" suffix in apply_gpu_ids() logs. * Fix round 2 regressions: ROCm validate_server and Windows HIP routing Follow-up to `810b833b` addressing review findings on the first round of hardening commits: - install_llama_prebuilt.py validate_server: gate --n-gpu-layers on the resolved install_kind instead of host.has_rocm. AMD Windows hosts without a HIP prebuilt fall back to windows-cpu and must not be validated with GPU layers; thread install_kind through from the caller. - install_llama_prebuilt.py resolve_release_asset_choice: reinstate the "not has_rocm" guard on the published windows-cpu bundle so AMD Windows hosts reach resolve_upstream_asset_choice() where the new HIP prebuilt path lives. Prefer a published windows-hip bundle first when one exists, fall through to upstream HIP + upstream CPU otherwise. - install_llama_prebuilt.py detect_host: also set has_physical_nvidia when the secondary --query-gpu block confirms a working NVIDIA GPU, so older nvidia-smi versions without -L support do not silently skip the Linux diagnostics that key off has_physical_nvidia. - install_llama_prebuilt.py: drop redundant "import re as _re" / "import re as _re_rocm" local aliases in favour of the existing top-level "import re". - install_python_stack.py _ensure_rocm_torch: run the AMD bitsandbytes install unconditionally after the HIP-torch probe so "unsloth studio update" on venvs that already have ROCm torch still gains the AMD bitsandbytes build. - install.sh: add a non-x86_64 early-exit to get_torch_index_url() so aarch64 / arm64 Linux hosts do not hit the ROCm wheel index (PyTorch only publishes ROCm wheels for linux_x86_64). - install.sh: add bitsandbytes install to the migrated-environment branch so upgrades pick it up for ROCm hosts instead of only the fresh-install path. 
- install.sh: in the Radeon wheel path, pass version constraints + --no-index --find-links to uv instead of explicit wheel URLs so a version-compatible torch / torchvision / torchaudio triple is resolved, rather than picking the highest-version wheel for each package independently. - studio/backend/utils/hardware/amd.py _first_visible_amd_gpu_id: fall through to lower-priority visibility env vars when the first entry is malformed (leading comma, all-whitespace first token) instead of silently returning GPU 0. * Fix round 3 findings: x86_64 guard, ROCm version clip, Radeon deps Address issues surfaced by the round 3 reviewers on top of `8636fa63`: - install_python_stack.py _ensure_rocm_torch: add the same `x86_64` guard that install.sh already has. Linux aarch64 / arm64 ROCm hosts must skip the repair path entirely; PyTorch only publishes ROCm wheels for linux_x86_64, and without this guard `unsloth studio update` aborts with a missing-wheel error on non x86_64 hosts. - install_llama_prebuilt.py resolve_upstream_asset_choice: add a best-effort _detect_host_rocm_version() helper (reading /opt/rocm/.info/version, amd-smi version, hipconfig --version) and filter rocm_candidates to entries whose major.minor is <= host version. Falls back to the newest candidate only when no compatible one exists, so a ROCm 6.4 host downloads rocm-6.4 instead of being handed the numerically newest rocm-7.2 bundle (which fails preflight and forces a source build). - install.sh: remove the round 2 --no-index switch from the Radeon wheel branch. --no-index forced uv to ignore PyPI entirely, which broke transitive dependency resolution (filelock, sympy, networkx, jinja2, fsspec, setuptools, typing-extensions, ...) on a fresh venv. Restore the round 1 explicit wheel URL invocation but add a torch / torchvision / torchaudio version-pair sanity check so a mismatched trio (e.g. 
torch 2.9.1 + torchvision 0.23.0 + torchaudio 2.9.0) falls back to the standard ROCm index instead of installing a broken combination. - install_python_stack.py _ensure_rocm_torch: restructure the "tag is None" path so it no longer short-circuits the bitsandbytes install. On a ROCm runtime older than anything in _ROCM_TORCH_INDEX, print the "no wheel" warning but still run the AMD bitsandbytes install. - studio/backend/core/training/worker.py: restore the pre-PR "no timeout" behaviour for non-HIP causal-conv1d / mamba-ssm source builds. The round 2 "timeout = 1800 if is_hip else 300" cap aborts slow non-HIP builds (Linux aarch64, unsupported torch/CUDA combos) after 5 minutes; omit timeout for the non-HIP branch so the cap only applies to ROCm source builds. * Fix round 4 findings: apply_gpu_ids env inheritance, Radeon X.Y, bitsandbytes gate Address remaining issues surfaced by the round 4 reviewers: - studio/backend/utils/hardware/hardware.py apply_gpu_ids: mirror the selection into HIP_VISIBLE_DEVICES / ROCR_VISIBLE_DEVICES whenever the caller already had a ROCm visibility env var set, not only when IS_ROCM has already been set by detect_hardware(). Training and inference workers call apply_gpu_ids() before detect_hardware() runs, so the old guard would leave a forked ROCm worker with a stale HIP_VISIBLE_DEVICES mask that no longer matched the narrowed CUDA_VISIBLE_DEVICES selection. - install.sh get_radeon_wheel_url: accept X.Y ROCm versions in addition to X.Y.Z. The `/opt/rocm/.info/version` file and some hipconfig versions report only two components, and the Radeon repository publishes both rocm-rel-X.Y.Z/ and rocm-rel-X.Y/ directories, so treating X.Y as invalid caused Radeon hosts to fall back to the generic ROCm index even when a matching AMD wheel set existed. - install_python_stack.py _ensure_rocm_torch: only install the AMD bitsandbytes build when the venv actually has a ROCm-compatible torch (either already present or just installed by this function). 
Previously the bitsandbytes install ran unconditionally, which could leave an AMD bitsandbytes layered on top of a CPU/CUDA torch on hosts where the ROCm runtime is older than any entry in _ROCM_TORCH_INDEX. Also add --force-reinstall so an existing CPU/CUDA bitsandbytes is replaced by the AMD build during upgrades. * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix gemini findings: amd-smi metric envelope validation and dict-wrapped GPU id Two medium-severity defensive fixes from the gemini-code-assist review on the AMD monitoring backend: 1. _extract_gpu_metrics may return a dict where every value is None when amd-smi succeeds (zero exit) but the JSON envelope contains no usable fields (error response, unsupported card). The new _has_real_metrics helper lets get_primary_gpu_utilization surface available:False and lets get_visible_gpu_utilization skip ghost device rows so the UI does not render placeholder cards with empty numbers. 2. Newer amd-smi versions wrap scalar fields as {"value": 0, "unit": "none"}, including the per-GPU id. The previous int(raw_id) call silently fell back to the enumeration index in that case, losing the real GPU id. Routing raw_id through the existing _parse_numeric helper handles bare ints, floats, strings, and the dict shape uniformly, with a debug log on parse failure. * Fix gemini round 2 findings: explicit length guard on ROCm version file parser Both _detect_rocm_version (install_python_stack.py) and _detect_host_rocm_version (install_llama_prebuilt.py) read /opt/rocm/.info/version or $ROCM_PATH/lib/rocm_version, split on "." and unconditionally accessed parts[1]. The surrounding broad `except Exception: pass` already swallowed the resulting IndexError, so a one-component file like "6\n" did fall through to the next detection source -- but the control flow relied on exception handling instead of an explicit check. 
Add `if len(parts) >= 2:` guards in both helpers so the loop falls through on its own without raising. Behaviour is unchanged for the common multi-component case; the previously-silent IndexError path becomes an explicit no-op. * Fix gemini round 3: include has_rocm in validate_server fallback path When validate_server is called without an explicit install_kind (older call sites that have not been updated), the fallback was only enabling --n-gpu-layers for NVIDIA and macOS arm64 hosts. AMD ROCm Linux hosts fell through to the CPU validation path even though the prebuilt being exercised was a HIP binary. Add host.has_rocm to the fallback expression so the GPU offload flag is applied consistently with the install_kind=='linux-rocm' / 'windows-hip' branches above. * Fix gemini round 4: remove risky bytes-vs-MB heuristic in _parse_memory_mb The previous heuristic divided any bare number above 10_000_000 by 1024*1024 on the assumption that large unit-less values were bytes. This misclassified small VRAM allocations: 5 MB of used VRAM reported as 5_242_880 bytes without a unit would be taken at face value and render as 5_242_880 MB (~5 TB) in the monitoring UI. Modern amd-smi always provides explicit units (MiB/GiB dict form), and legacy amd-smi returns bare numbers in MB -- the heuristic never had a real workload to handle. Drop it and default to MB for bare numeric input, keeping the existing unit-aware branches for dict / string inputs unchanged. The unrelated gemini suggestion to "default minor to 0" in the amd-smi version awk parser was intentionally NOT applied: rocm7.0 and rocm7.1 ship different wheel sets, so silently substituting 0 for a missing minor could install the wrong wheels. The existing reject-and-fall-through behaviour is safer. * Fix gemini round 5: POSIX compliance and leading-comma visibility parsing Three medium findings from gemini-code-assist addressed in this commit: 1. 
_pick_radeon_wheel used grep -o and sort -V, both GNU extensions that are not in POSIX and break on BSD/BusyBox coreutils. install.sh has a #!/bin/sh shebang so the whole pipeline was rewritten as a single awk script that extracts all href="..." hits on each line, filters to wheels matching the package prefix and python tag, and picks the newest version via zero-padded lexical comparison. No external sort or grep is needed. 2. _first_visible_amd_gpu_id in the AMD monitoring backend treated a leading comma (e.g. HIP_VISIBLE_DEVICES=",1") as "fall through to the next env var", which is surprising given the clear intent to narrow to device 1. Filter empty tokens after the split and return the first real one. An all-commas value ("," / ",,,") still falls through because no real tokens exist; the empty-string and "-1" explicit-zero cases are unchanged. The unrelated amd-smi version awk parser suggestion was not applied (see round 4 commit message for rationale: defaulting a missing minor to 0 could silently install the wrong ROCm wheel set). * Fix 20-reviewer.py findings: base drift, Radeon %2B, dpkg/rpm fallback, bnb, backend label Consolidated fix batch from a 20-parallel reviewer.py run on the current head. Each fix is drawn from a high-consensus finding and addresses a real bug or feature gap, not a stylistic preference. 1. install.sh: bump `unsloth>=2026.4.2` -> `unsloth>=2026.4.4` at five call sites so this branch no longer regresses main's version floor (main bumped to 2026.4.4 in #4876). Without this, merging 4720 would silently downgrade the minimum version pin for fresh installs. 2. install.sh: URL-decode Radeon wheel names before extracting the torch / torchvision / torchaudio version strings. 
Real wheel URLs from repo.radeon.com are percent-encoded ("torch-2.10.0%2Brocm7.2.0...") so the previous `[+-]` terminator in the sed regex never matched, `_torch_ver` stayed empty, `_radeon_versions_match` stayed false, and every Radeon consumer install silently fell back to the generic ROCm index. Now decode %2B -> + first, then extract, then validate. 3. install.sh: the two AMD bitsandbytes install lines were running `uv pip install "bitsandbytes>=0.49.1"` without `--force-reinstall`, so upgrades where the venv already has a CPU/CUDA bitsandbytes satisfying the constraint would keep the stale non-AMD wheel. Add `--force-reinstall --no-cache-dir` to both call sites, matching the pattern already used in install_python_stack.py::_ensure_rocm_torch. 4. install_python_stack.py and install_llama_prebuilt.py: add `dpkg-query -W rocm-core` and `rpm -q rocm-core` fallbacks to the Python-side ROCm version detectors so they match the chain in install.sh::get_torch_index_url. Package-managed ROCm installs (Debian/Ubuntu/RHEL/Fedora distro packages) can expose GPUs via rocminfo/amd-smi but still lack /opt/rocm/.info/version, hipconfig, or amd-smi `version` output -- without these fallbacks, `unsloth studio update` on such hosts returned None and skipped the ROCm torch repair. Also strip the dpkg epoch prefix ("1:6.3.0-1") before parsing so epoch-annotated packages parse correctly. 5. hardware.py: add a `_backend_label(device)` helper that returns "rocm" when IS_ROCM is set and the device is DeviceType.CUDA, and use it for every `"backend": ...` emission in JSON responses served to the Studio frontend. Internally we still represent ROCm hosts as DeviceType.CUDA (ROCm torch reuses the whole torch.cuda.* API surface), but the user-facing API now correctly reports "rocm" on AMD boxes instead of labeling them as "cuda". 
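Items 2 and 4 in the list above both come down to normalising a version string before matching it. The sketch below shows the idea in Python. The function names are hypothetical, the real %2B handling lives in install.sh as shell/sed rather than Python, and the wheel filename used in the usage note is an invented example, not a name taken from repo.radeon.com.

```python
import re
from typing import Optional, Tuple
from urllib.parse import unquote


def parse_rocm_package_version(raw: str) -> Optional[Tuple[int, int]]:
    """Extract (major, minor) from a dpkg/rpm style version such as "1:6.3.0-1"."""
    # Drop an optional Debian epoch prefix ("1:") before parsing.
    version = raw.strip().split(":", 1)[-1]
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        # One-component or malformed versions are rejected, not guessed.
        return None
    return int(match.group(1)), int(match.group(2))


def wheel_version(wheel_name: str, package: str) -> Optional[str]:
    """Return the version part of a wheel filename, decoding %2B to + first."""
    decoded = unquote(wheel_name)
    match = re.match(rf"{re.escape(package)}-([0-9][0-9.]*)[+-]", decoded)
    return match.group(1) if match else None
```

For an illustrative name such as `torch-2.10.0%2Brocm7.2.0-cp312-cp312-manylinux_2_28_x86_64.whl`, `wheel_version(name, "torch")` yields `"2.10.0"`; without the `unquote` step the `[+-]` terminator would never match, which is the failure mode item 2 describes.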
All 250 simulation scenarios pass (was 233 before this batch: added 17 new regression tests covering the version pin, %2B decoding, bnb force-reinstall flags, dpkg/rpm fallback presence, and the _backend_label helper's four-way truth table). * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix gemini round 6 + URL audit: amd.py defensive checks, rocm6.5+ clip to 6.4 Two rounds of fixes in one commit, plus a full URL audit of every PyPI / download.pytorch.org / repo.radeon.com reference the PR introduces. amd.py (4 medium gemini findings on commit `b3627bc2`): 1. _extract_gpu_metrics used `and vram_total_mb` as part of the vram_util gate. The follow-up `vram_total_mb > 0` already handles the division guard, but the truthiness check was redundant and slightly surprising for a 0.0 valid value. Replace with explicit `is not None and > 0` for both vram_util and power_util. 2. get_physical_gpu_count called `data.get("gpu", ...)` without guarding for non-dict envelopes. A scalar / string JSON response from amd-smi would raise AttributeError. Add an isinstance(data, dict) check and return None for unexpected shapes. 3. get_visible_gpu_utilization had the same .get() exposure on the outer envelope. Rewrite the gpu_list extraction as an explicit list/dict/else cascade so a malformed scalar envelope produces gpu_list=[data] and continues without raising. 4. The same function's per-entry loop also called gpu_data.get() on whatever was inside gpu_list. If a scalar ever leaks into the list (directly or via the previous fix's fallback), _extract_gpu_metrics would raise on the first .get() inside the helper. Skip non-dict entries in the loop before extracting metrics. install.sh (URL audit finding, previously flagged by 20-reviewer as #13): 5. 
get_torch_index_url used `rocm6.` in the rocm tag case statement, which matched rocm6.5 and rocm6.6 and emitted download.pytorch.org/whl/rocm6.5 -- which returns HTTP 403 because PyTorch only publishes rocm 5.7, 6.0-6.4, 7.0-7.2. Enumerate the supported 6.x minors explicitly and add a rocm6. fallback branch that clips to rocm6.4 (the last supported 6.x wheel set). URL audit results (all URLs PR 4720 references): - 14/14 download.pytorch.org/whl/{cpu,cu118,cu124,cu126,cu128,cu130, rocm6.0..6.4,rocm7.0..7.2} return HTTP 200. - 9/9 repo.radeon.com/rocm/manylinux/rocm-rel-{5.7,6.0,6.1,6.2,6.3, 6.4,7.0,7.1,7.2}/ return HTTP 200. - X.Y.Z patch directories exist for 7.0.2, 7.1.1, 7.2.1 but NOT for 6.3.0, 6.4.0, 6.2.1 -- install.sh already handles this via the X.Y.Z -> X.Y fallback sed in the Radeon wheel install block. - Docs links (rocm.docs.amd.com, docs.unsloth.ai AMD guide) and the llama.cpp GitHub releases API endpoint all return 200. Test suite: 255 -> 258. New regression coverage: - U17: get_physical_gpu_count tolerates scalar amd-smi envelope - U18: get_visible_gpu_utilization tolerates scalar envelope - U19a-c: vram_util / power_util return None on zero total, but vram_total_gb still echoes 0.0 (not None) - A_rocm{6.5,6.6,6.9}_clips_to_rocm64: install.sh clips unsupported 6.x minors to rocm6.4 instead of producing a 403 index URL * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Fix reviewer.py round 2: tokenizer AMD multi-GPU, --no-torch bnb, main.py backend label Three high-confidence findings from a second 20-parallel reviewer.py run on commit `7effb3ae`. Triaged 15 total findings and applied the three that were confirmed as real bugs; the rest were either false positives (e.g. "migrated AMD venv not repaired" -- _ensure_rocm_torch runs downstream via setup.sh regardless), design decisions (e.g. 
visibility mask env vars not consulted in installer detection), or edge cases the existing fallback logic already handles. 1. unsloth/tokenizer_utils.py [6/20]: the multi-GPU guard's shell probe runs `nvidia-smi --query-gpu=memory.used`, catches the failure, then only raises if `torch.cuda.is_available()` is False. On ROCm torch, torch.cuda.is_available() returns True (ROCm reuses the torch.cuda.* API), so the guard becomes dead code on AMD hosts and multi-GPU AMD setups slip through even though unsloth does not support them yet. Add a torch.cuda.device_count() > 1 fallback inside the except so AMD multi-visible-device setups are flagged consistently with the original CUDA memory check. 2. install.sh [1/20]: the fresh-install bitsandbytes block for AMD ROCm ran unconditionally when TORCH_INDEX_URL matched `/rocm`, even when SKIP_TORCH=true (from --no-torch or Intel Mac auto-detect). A user running `install.sh --no-torch` on an AMD host would still pull in bitsandbytes despite explicitly asking for GGUF-only mode. Wrap the case block in an outer `[ "$SKIP_TORCH" = false ]` guard. 3. studio/backend/main.py [3/20]: the /api/system endpoint returned `"device_backend": get_device().value`, which is "cuda" on ROCm hosts (because ROCm torch piggybacks on torch.cuda). Other endpoints (hardware.py) already use the _backend_label helper which swaps "cuda" -> "rocm" when IS_ROCM. Route /api/system through the same helper so the Studio UI reports the backend consistently across all endpoints. 4. studio/backend/tests/test_utils.py: update test_backend_matches_device to call _backend_label(get_device()) instead of raw get_device().value so the test matches the new contract and still passes on CUDA hosts. Tests: 258 -> 261. 
New regression coverage: - X08 main.py /api/system uses _backend_label - X09 tokenizer multi-GPU guard has device_count() fallback - X10 fresh-install bnb case block gated on SKIP_TORCH=false * fix: prevent bitsandbytes from overwriting ROCm torch with CUDA wheels During install, bitsandbytes was installed without --no-deps, causing uv to resolve torch from PyPI (CUDA build) and silently overwrite the ROCm wheels that were just installed in the previous step. This happened in three places: - install.sh: bitsandbytes install in both migrated and fresh paths - install_python_stack.py: bitsandbytes install inside _ensure_rocm_torch() Additionally, multiple install steps in install_python_stack.py (extras, overrides, studio deps) can pull in CUDA torch via transitive dependencies. A final _ensure_rocm_torch() call at the end of the install sequence ensures ROCm torch is always in place at runtime. All changes are gated behind ROCm-specific conditions and do not affect NVIDIA, CPU-only, macOS, or Windows install paths. Tested on AMD Instinct MI300X VF with ROCm 7.2.0 -- confirms torch==2.10.0+rocm7.1 with HIP 7.1.25424 after install. * fix: ROCm inference fallback -- skip Unsloth patching and bnb 4-bit on HIP On AMD ROCm (HIP), two issues prevent the normal Unsloth inference path: 1. Unsloth's global monkey-patching of transformers model classes (LlamaRotaryEmbedding, attention modules) triggers _assert_async_cuda_kernel crashes on HIP during generation. Training uses different code paths and works fine. 2. bitsandbytes 4-bit matmul kernels also trigger HIP assertion failures on MI300X (CDNA3 / gfx942), even without Unsloth patching. This commit adds a ROCm-specific inference fallback that: - Skips importing Unsloth at module level (prevents global patching) - Loads models in 16-bit with plain transformers + PEFT instead - Resolves pre-quantized model names (e.g. 
"xxx-bnb-4bit" -> "xxx") since pre-quantized HF repos still trigger bnb codepaths - Guards get_chat_template calls (unavailable without Unsloth import) - Fixes max_seq_length=0 being passed to from_pretrained (GGUF semantics don't apply to transformers path) The NVIDIA path is completely unchanged -- Unsloth import and for_inference() optimization remain active. GGUF inference (via llama-server/HIP) is unaffected since it never imports Python model classes. AMD GPUs typically have large VRAM (e.g. 192GB on MI300X) so 16-bit loading is practical for inference. Tested on AMD Instinct MI300X VF (ROCm 7.2, HIP 7.1.25424): - Simple generation: PASS - Compare mode (base vs finetuned): PASS - GGUF inference + tool calling: PASS (unaffected by this change) * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix: guard audio/vision inference on ROCm, remove unused import - Add clear RuntimeError for audio/vision model inference on ROCm (these paths use Unsloth's FastModel/FastVisionModel which would crash on HIP; GGUF inference is the supported path on AMD) - Remove unused `import os as _os` from the ROCm changes * fix: amd-smi parsing for newer output format (gpu_data wrapper, mem_usage, temperature) amd-smi on recent ROCm versions (7.x) wraps metric output in a {"gpu_data": [...]} envelope instead of returning a raw list. This caused get_primary_gpu_utilization() and get_visible_gpu_utilization() to fail silently (returning available=False) because the GPU data dict was never unwrapped. Additionally: - VRAM data moved from "vram" to "mem_usage" with "total_vram" / "used_vram" keys. Added fallback key lookup. - Temperature "edge" sensor returns "N/A" on MI300X VF; the previous dict.get() chain returned the "N/A" string instead of falling through to "hotspot". Changed to a loop that checks each key until a parseable value is found. 
Tested on AMD Instinct MI300X VF (ROCm 7.2, amd-smi 24.x): - GPU utilization: 0% (idle), up to 100% during training - Temperature: 40-44C (from hotspot sensor) - VRAM: 0.28/191.69 GB (idle) - Power: 158-211W draw * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * Bug fix detecting radeon (#4940) * Bug fix detecting radeon * Expanding GPU target for gfx1100* * Generalize gfx family-prefix filter to cover gfx10/gfx12 as well rocminfo on ROCm 6.1+ emits LLVM generic-family ISA lines alongside the specific GPU (e.g. gfx11-generic next to gfx1100). The outer grep captures the bare family prefix from the generic line, and passing that to -DGPU_TARGETS breaks the HIP build because clang only accepts specific gfxNNN ids. The previous filter only special-cased gfx11. Generalize it so any bare 2-digit family prefix (gfx10, gfx11, gfx12, ...) is dropped whenever a specific sibling target is present in the same list. No real AMD GPU has a 2-digit gfx id, so the filter can only ever drop family prefixes and never a real target. Covers the existing gfx11 cases unchanged, and extends the same fix to gfx10-1-generic / gfx10-3-generic (RDNA1/2) and gfx12-generic (RDNA4), which would otherwise hit the same build failure on newer rocminfo. --------- Co-authored-by: Iswarya Alex <iswarya.alex@amd.com> Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com> --------- Co-authored-by: Eda Z <eda.zhou@amd.com> Co-authored-by: GoldenGrapeGentleman <yueyuan@amd.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: billishyahao <bill.he@amd.com> Co-authored-by: Iswarya Alex <47045679+iswaryaalex@users.noreply.github.com> Co-authored-by: Iswarya Alex <iswarya.alex@amd.com> Co-authored-by: Daniel Han <danielhanchen@users.noreply.github.com>	2026-04-10 01:56:12 -07:00
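The gfx family-prefix filter described in the last commit above (#4940) can be sketched in Python as follows. The actual fix lives in the installer's shell pipeline; the function name `drop_family_prefixes` and the list-of-strings input are assumptions made for illustration only.

```python
import re

# A bare family prefix is "gfx" followed by exactly two digits (gfx10, gfx11, gfx12).
# Specific targets such as gfx1100, gfx942, or gfx90a never match this pattern.
_FAMILY_PREFIX = re.compile(r"^gfx\d{2}$")


def drop_family_prefixes(targets: list[str]) -> list[str]:
    """Drop bare family prefixes when a specific sibling target is present."""
    specific = [t for t in targets if not _FAMILY_PREFIX.match(t)]
    kept = []
    for target in targets:
        is_family = bool(_FAMILY_PREFIX.match(target))
        has_sibling = any(s.startswith(target) for s in specific)
        if is_family and has_sibling:
            continue  # e.g. "gfx11" captured from a "gfx11-generic" line
        kept.append(target)
    return kept


print(drop_family_prefixes(["gfx11", "gfx1100"]))
print(drop_family_prefixes(["gfx942"]))
```

A family prefix with no specific sibling is left in place here, matching the "whenever a specific sibling target is present" wording of the commit message.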
Roland Tannous	33503ea248	Revert "updated models template mappers. added lfm2.5vl450m to transformers 5…" (#4945) This reverts commit `bcf4fd6bd3`.	2026-04-09 23:14:57 -07:00
Roland Tannous	bcf4fd6bd3	updated models template mappers. added lfm2.5vl450m to transformers 5… (#4939) * updated models template mappers. added lfm2.5vl450m to transformers 5.3.0 whitelist * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-09 23:36:42 +04:00
Lee Jackson	dc16e0c65b	Studio: keep chat input visible and fix compare pane clipping (#4924 ) * fix(chat): sticky composer bar in thread * fix(chat): fix compare pane clipping * fix(chat): tighten scroll-to-bottom placement and compare footer spacing * Fix TypeScript build break and clean up ViewportFooter classes - Remove unused `compact` prop from ThreadScrollToBottom call site (component is FC with no props, passing it caused TS2322) - Extract shared classes (sticky, bottom-0, z-20, bg-transparent) from ternary branches into the unconditional className string - Restore `relative` on normal-mode footer so the inner absolute bg-background strip has a positioning context - Remove redundant md:pb-3 / md:pb-4 (same value as base pb-3 / pb-4) - Remove no-op `sticky bottom-0` from SharedComposer wrapper in both LoraCompareContent and GeneralCompareContent (flex layout with shrink-0 already pins it at the bottom; parent has no scrollable overflow for sticky to bind to) - Fix truncated comment on pointer-events rationale --------- Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-09 06:00:56 -07:00
dependabot[bot]	5fa8683b27	build(deps): bump the bun-frontend group across 1 directory with 16 updates (#4586) * build(deps): bump the bun-frontend group across 1 directory with 16 updates Bumps the bun-frontend group with 16 updates in the /studio/frontend directory:

| Package | From | To |
| --- | --- | --- |
| [@dagrejs/dagre](https://github.com/dagrejs/dagre) | `2.0.4` | `3.0.0` |
| [@dagrejs/graphlib](https://github.com/dagrejs/graphlib) | `3.0.4` | `4.0.1` |
| @hugeicons/core-free-icons | `3.3.0` | `4.0.0` |
| [@streamdown/cjk](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown-cjk) | `1.0.2` | `1.0.3` |
| [@streamdown/code](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown-code) | `1.0.2` | `1.1.1` |
| [lucide-react](https://github.com/lucide-icons/lucide/tree/HEAD/packages/lucide-react) | `0.577.0` | `1.6.0` |
| [recharts](https://github.com/recharts/recharts) | `3.7.0` | `3.8.0` |
| [shadcn](https://github.com/shadcn-ui/ui/tree/HEAD/packages/shadcn) | `3.8.5` | `4.1.0` |
| [streamdown](https://github.com/vercel/streamdown/tree/HEAD/packages/streamdown) | `2.3.0` | `2.5.0` |
| [@biomejs/biome](https://github.com/biomejs/biome/tree/HEAD/packages/@biomejs/biome) | `1.9.4` | `2.4.8` |
| [@eslint/js](https://github.com/eslint/eslint/tree/HEAD/packages/js) | `9.39.4` | `10.0.1` |
| [@types/node](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/node) | `24.12.0` | `25.5.0` |
| [eslint](https://github.com/eslint/eslint) | `9.39.4` | `10.1.0` |
| [eslint-plugin-react-refresh](https://github.com/ArnaudBarre/eslint-plugin-react-refresh) | `0.4.26` | `0.5.2` |
| [globals](https://github.com/sindresorhus/globals) | `16.5.0` | `17.4.0` |
| [typescript](https://github.com/microsoft/TypeScript) | `5.9.3` | `6.0.2` |

Updates `@dagrejs/dagre` from 2.0.4 to 3.0.0 - [Release notes](https://github.com/dagrejs/dagre/releases) - 
[Changelog](https://github.com/dagrejs/dagre/blob/master/changelog.md) - [Commits](https://github.com/dagrejs/dagre/compare/v2.0.4...v3.0.0) Updates `@dagrejs/graphlib` from 3.0.4 to 4.0.1 - [Release notes](https://github.com/dagrejs/graphlib/releases) - [Changelog](https://github.com/dagrejs/graphlib/blob/master/changelog.md) - [Commits](https://github.com/dagrejs/graphlib/compare/v3.0.4...v4.0.1) Updates `@hugeicons/core-free-icons` from 3.3.0 to 4.0.0 Updates `@streamdown/cjk` from 1.0.2 to 1.0.3 - [Release notes](https://github.com/vercel/streamdown/releases) - [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown-cjk/CHANGELOG.md) - [Commits](https://github.com/vercel/streamdown/commits/@streamdown/cjk@1.0.3/packages/streamdown-cjk) Updates `@streamdown/code` from 1.0.2 to 1.1.1 - [Release notes](https://github.com/vercel/streamdown/releases) - [Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown-code/CHANGELOG.md) - [Commits](https://github.com/vercel/streamdown/commits/@streamdown/code@1.1.1/packages/streamdown-code) Updates `lucide-react` from 0.577.0 to 1.6.0 - [Release notes](https://github.com/lucide-icons/lucide/releases) - [Commits](https://github.com/lucide-icons/lucide/commits/1.6.0/packages/lucide-react) Updates `recharts` from 3.7.0 to 3.8.0 - [Release notes](https://github.com/recharts/recharts/releases) - [Changelog](https://github.com/recharts/recharts/blob/main/CHANGELOG.md) - [Commits](https://github.com/recharts/recharts/compare/v3.7.0...v3.8.0) Updates `shadcn` from 3.8.5 to 4.1.0 - [Release notes](https://github.com/shadcn-ui/ui/releases) - [Changelog](https://github.com/shadcn-ui/ui/blob/main/packages/shadcn/CHANGELOG.md) - [Commits](https://github.com/shadcn-ui/ui/commits/shadcn@4.1.0/packages/shadcn) Updates `streamdown` from 2.3.0 to 2.5.0 - [Release notes](https://github.com/vercel/streamdown/releases) - 
[Changelog](https://github.com/vercel/streamdown/blob/main/packages/streamdown/CHANGELOG.md) - [Commits](https://github.com/vercel/streamdown/commits/streamdown@2.5.0/packages/streamdown) Updates `@biomejs/biome` from 1.9.4 to 2.4.8 - [Release notes](https://github.com/biomejs/biome/releases) - [Changelog](https://github.com/biomejs/biome/blob/main/packages/@biomejs/biome/CHANGELOG.md) - [Commits](https://github.com/biomejs/biome/commits/@biomejs/biome@2.4.8/packages/@biomejs/biome) Updates `@eslint/js` from 9.39.4 to 10.0.1 - [Release notes](https://github.com/eslint/eslint/releases) - [Commits](https://github.com/eslint/eslint/commits/v10.0.1/packages/js) Updates `@types/node` from 24.12.0 to 25.5.0 - [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases) - [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/node) Updates `eslint` from 9.39.4 to 10.1.0 - [Release notes](https://github.com/eslint/eslint/releases) - [Commits](https://github.com/eslint/eslint/compare/v9.39.4...v10.1.0) Updates `eslint-plugin-react-refresh` from 0.4.26 to 0.5.2 - [Release notes](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/releases) - [Changelog](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/blob/main/CHANGELOG.md) - [Commits](https://github.com/ArnaudBarre/eslint-plugin-react-refresh/compare/v0.4.26...v0.5.2) Updates `globals` from 16.5.0 to 17.4.0 - [Release notes](https://github.com/sindresorhus/globals/releases) - [Commits](https://github.com/sindresorhus/globals/compare/v16.5.0...v17.4.0) Updates `typescript` from 5.9.3 to 6.0.2 - [Release notes](https://github.com/microsoft/TypeScript/releases) - [Commits](https://github.com/microsoft/TypeScript/compare/v5.9.3...v6.0.2) --- updated-dependencies: - dependency-name: "@dagrejs/dagre" dependency-version: 3.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@dagrejs/graphlib" 
dependency-version: 4.0.1 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@hugeicons/core-free-icons" dependency-version: 4.0.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@streamdown/cjk" dependency-version: 1.0.3 dependency-type: direct:production update-type: version-update:semver-patch dependency-group: bun-frontend - dependency-name: "@streamdown/code" dependency-version: 1.1.1 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: bun-frontend - dependency-name: lucide-react dependency-version: 1.6.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: recharts dependency-version: 3.8.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: bun-frontend - dependency-name: shadcn dependency-version: 4.1.0 dependency-type: direct:production update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: streamdown dependency-version: 2.5.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: bun-frontend - dependency-name: "@biomejs/biome" dependency-version: 2.4.8 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@eslint/js" dependency-version: 10.0.1 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: "@types/node" dependency-version: 25.5.0 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: eslint dependency-version: 10.1.0 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: 
eslint-plugin-react-refresh dependency-version: 0.5.2 dependency-type: direct:development update-type: version-update:semver-minor dependency-group: bun-frontend - dependency-name: globals dependency-version: 17.4.0 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend - dependency-name: typescript dependency-version: 6.0.2 dependency-type: direct:development update-type: version-update:semver-major dependency-group: bun-frontend ... Signed-off-by: dependabot[bot] <support@github.com> * Revert dagrejs upgrades Keep @dagrejs/dagre at ^2.0.4 and @dagrejs/graphlib at ^3.0.4. * Revert biome, eslint, typescript, and recharts upgrades These upgrades break studio/frontend locally: - @biomejs/biome 2.4.10 fails to parse the existing biome.json (files.ignore and organizeImports keys removed in v2; schema version mismatch). - typescript 6.0.2 emits TS5101 on tsconfig.app.json baseUrl ("Option 'baseUrl' is deprecated and will stop functioning in TypeScript 7.0"), so tsc -b exits 2. - eslint 10.2.0 conflicts with eslint-plugin-react-hooks@7.0.1, which peers on eslint ^9; npm install fails with ERESOLVE. - recharts 3.8.1 widened LegendPayload.dataKey to include a function type, which breaks the React key={item.dataKey} usage in src/components/ui/chart.tsx (TS2322). Hold these at their current pinned versions until the upstream peer deps and config migrations are ready. --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Daniel Han <danielhanchen@gmail.com>	2026-04-08 04:34:33 -07:00
Wasim Yousef Said	8e977445d4	Let recipes use the model loaded in Chat (#4840 ) * feat: inject local model provider into recipe jobs via JWT * feat: auto-generate JWT for local model providers in recipes * feat: add is_local flag to model provider config types and utils * fix(studio): skip endpoint validation for local providers * feat(studio): add local/external model source toggle to provider dialog * feat(studio): thread localProviderNames through model config dialog chain * feat(studio): show 'Local model (Chat)' label for local model_provider configs * fix: hardcode loopback for local endpoint, clear stale creds on toggle * fix: document TOCTOU/JWT rotation, add deferred import comments, fix is_local serialization * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * fix(studio): clear stale local model state on provider toggle and validation * fix(studio): override empty local endpoint in validation and skip model gate for unused providers * fix(studio): resolve loopback port from app.state, clear stale local provider fields, sync model id on toggle Address review feedback on the local-model-provider flow: - Backend (jobs.py): _resolve_local_v1_endpoint now reads the actual bound port from app.state.server_port (set in run.py after binding) instead of parsing it out of request.base_url, which is wrong behind any reverse proxy or non-default port. The two duplicated urlparse blocks are gone. - Backend (jobs.py): defensively pop api_key_env, extra_headers, extra_body from local providers so a previously external provider that flipped to local cannot leak invalid JSON or rogue auth headers into the local /v1 call. Also dedupe the post-loop assignment and tighten the local-name intersection so empty names cannot match. - Backend (jobs.py): hoist datetime and urllib.parse imports to the top import block for consistency with the rest of the file. 
- Backend (run.py): expose the bound port on app.state.server_port after the uvicorn server is constructed. - Frontend (model-provider-dialog.tsx): clear extra_headers and extra_body when toggling to local mode. Hidden inputs would otherwise keep stale JSON blocking validate/run. - Frontend (model-config-dialog.tsx): factor the local-aware provider selection logic into applyProviderChange and call it from both onValueChange and onBlur, so manually typing a provider name and tabbing away keeps the model field consistent. - Frontend (recipe-studio.ts store): handle both directions of the is_local toggle in the cascade. external -> local now backfills model: "local" on already-linked model_configs so they pass validation immediately, mirroring the existing local -> external clear path. - Frontend (validate.ts + build-payload.ts): thread localProviderNames into validateModelConfigProviders and skip the "model is required" check for local-linked configs. Local providers do not need a real model id since the inference endpoint uses the loaded Chat model. * fix(studio): narrow store cascade types, sync model placeholder on graph relink and node removal, harden ephemeral port path Loop 2 review fixes: - recipe-studio.ts: type-narrow next.is_local by also checking next.kind === "model_provider". TS otherwise raised TS2339 because next was typed as the union NodeConfig after the spread. The behavior is unchanged but the code now compiles cleanly. - model-config-dialog.tsx: convert the lastProviderRef / providerInputRef ref-during-render pattern (pre-existing react-hooks/refs lint error) to a useEffect that syncs providerInputRef from config.provider. The combobox blur path still uses applyProviderChange and remains stable. 
- recipe-graph-connection.ts: when a graph drag links a model_provider to a model_config, mirror the dialog applyProviderChange behavior: fill model: "local" if the new provider is local and the model field is blank, clear model when relinking from a local placeholder to an external provider, otherwise leave the model alone. - reference-sync.ts: when a referenced provider node is removed, clear the synthetic model: "local" placeholder along with the provider field, so a future relink to an external provider does not pass validation with a stale value that fails at runtime. - run.py: only publish app.state.server_port when the bound port is a real positive integer; for ephemeral binds (port==0) leave it unset and let request handlers fall back to request.base_url. - jobs.py: _resolve_local_v1_endpoint also falls back when app.state.server_port is non-positive, and uses `is None` instead of the truthy fallback so a literal 0 is handled correctly. * fix(studio): strict is_local check, narrow loaded-model gate to LLM-reachable configs, add scope-server port fallback Loop 3 review fixes: - jobs.py, validate.py: require `is_local is True` instead of truthy check. Malformed payloads such as is_local: "false" or is_local: 1 would otherwise be treated as local and silently rewritten to the loopback endpoint. - jobs.py: _resolve_local_v1_endpoint now tries request.scope["server"] (the actual uvicorn-assigned (host, port) tuple) as a second resolution step before falling back to parsing request.base_url. This covers direct-uvicorn startup paths and ephemeral binds that never publish app.state.server_port. - jobs.py: new _used_llm_model_aliases helper collects the set of model_aliases that an LLM column actually references, and the "Chat model loaded" gate is now only triggered when a local provider is reachable from that set. Orphan model_config nodes on the canvas no longer block unrelated recipe runs. 
* fix(studio): force skip_health_check on local-linked configs, skip JSON parsing for local providers, local-aware inline editor Loop 4 review fixes: - jobs.py: after rewriting local providers, also force skip_health_check: true on any model_config linked to a local provider. The /v1/models endpoint only advertises the real loaded model id, so data_designer's default model-availability health check would otherwise fail against the placeholder "local" id before the first chat completion call. The inference route already ignores the model id in chat completions, so skipping the check is safe. - builders-model.ts: buildModelProvider now short-circuits for local providers and emits only { name, endpoint: "", provider_type, is_local } without running parseJsonObject on the hidden extra_headers/extra_body inputs. Imported or hydrated recipes with stale invalid JSON in those fields no longer block client-side validate/run. - inline-model.tsx: the model_config branch now accepts an optional localProviderNames prop and mirrors the dialog applyProviderChange behavior. Changing provider to/from a local one auto-fills or clears the "local" placeholder consistently with the other edit paths. - recipe-graph-node.tsx: derive localProviderNames from the store via useMemo (stable identity) and pass it through renderNodeBody to <InlineModel>. Hooks order is preserved by declaring them above the early return for markdown_note nodes. - run.py: minor comment tweak - loop 3 already added the scope-server fallback path, note that in the comment. --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: danielhanchen <info@unsloth.ai>	2026-04-08 03:48:22 -07:00
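The port-resolution order and the strict `is_local` check described in the review loops above can be sketched as follows. This is a minimal illustration, not the code in jobs.py: the function names are hypothetical, the inputs are plain values standing in for `app.state.server_port`, `request.scope["server"]`, and `request.base_url`, and the loopback host `127.0.0.1` is an assumption (the commit only says the loopback endpoint is hardcoded).

```python
from typing import Optional, Tuple
from urllib.parse import urlparse


def resolve_local_port(
    state_port: Optional[int],
    scope_server: Optional[Tuple[str, int]],
    base_url: str,
) -> Optional[int]:
    """Pick the port for the local /v1 endpoint using a three-step fallback."""
    # 1. The port published on app.state, only if it is a real positive integer.
    if isinstance(state_port, int) and not isinstance(state_port, bool) and state_port > 0:
        return state_port
    # 2. The (host, port) tuple the ASGI server reports for this connection.
    if scope_server is not None and len(scope_server) == 2:
        port = scope_server[1]
        if isinstance(port, int) and port > 0:
            return port
    # 3. Last resort: parse the port out of the request's base URL.
    return urlparse(base_url).port


def local_v1_endpoint(port: int) -> str:
    # Assumed loopback address; the real code may use a different literal.
    return f"http://127.0.0.1:{port}/v1"


def is_local_provider(provider: dict) -> bool:
    # Strict identity check: "false", 1, or other truthy values do not count.
    return provider.get("is_local") is True


port = resolve_local_port(None, ("127.0.0.1", 9000), "http://example.test:7000/")
print(local_v1_endpoint(port))
```

Using `is True` rather than truthiness is what keeps a malformed payload such as `is_local: "false"` from being rewritten to the loopback endpoint.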
dependabot[bot]	67e9db4921	build(deps): bump oxc-parser (#4776) Bumps the npm-oxc-validator group in /studio/backend/core/data_recipe/oxc-validator with 1 update: [oxc-parser](https://github.com/oxc-project/oxc/tree/HEAD/napi/parser). Updates `oxc-parser` from 0.121.0 to 0.123.0 - [Release notes](https://github.com/oxc-project/oxc/releases) - [Changelog](https://github.com/oxc-project/oxc/blob/main/napi/parser/CHANGELOG.md) - [Commits](https://github.com/oxc-project/oxc/commits/crates_v0.123.0/napi/parser) --- updated-dependencies: - dependency-name: oxc-parser dependency-version: 0.123.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: npm-oxc-validator ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2026-04-08 03:35:33 -07:00

1155 commits