Commit graph

5075 commits

Roland Tannous
21e9a91a57
Studio: forward standard OpenAI tools / tool_choice on /v1/responses (Codex compat) (#5122)
* Studio: forward standard OpenAI tools / tool_choice on /v1/responses

Mirrors the /v1/chat/completions client-side tool pass-through from #5099
so clients (OpenAI Codex CLI, OpenAI Python SDK, ...) that target the
Responses API receive structured function_call output items instead of
plain text with tool-call tokens leaking into content.

- ResponsesRequest: type tools/tool_choice properly, add parallel_tool_calls;
  accept function_call and function_call_output input items for multi-turn
- Translate flat Responses tool / tool_choice shape to the nested Chat
  Completions shape before forwarding to llama-server (see the sketch
  after this list)
- _normalise_responses_input: map function_call_output -> role="tool",
  function_call -> assistant tool_calls (preserving call_id)
- Non-streaming: map returned tool_calls -> top-level function_call
  output items keyed by call_id
- Streaming: emit response.output_item.added (function_call),
  response.function_call_arguments.delta/.done, and response.output_item.done
  per tool call while keeping the text message at output_index 0
- Pytest coverage: tools/tool_choice translation, multi-turn input mapping,
  non-streaming tool_calls mapping, response round-trip
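
For reference, the Responses API declares tools flat while Chat
Completions nests them under a "function" key. A minimal sketch of the
translation referenced above (hypothetical helper name, simplified from
the actual route code):

    # Hypothetical helper; simplified from the actual route code.
    def responses_tools_to_chat(tools, tool_choice):
        chat_tools = [
            {"type": "function",
             "function": {k: v for k, v in t.items() if k != "type"}}
            for t in (tools or [])
            if t.get("type") == "function"
        ]
        # String choices ("auto", "none", "required") pass through; a flat
        # named choice gains the nested "function" wrapper.
        if isinstance(tool_choice, dict) and tool_choice.get("type") == "function":
            tool_choice = {"type": "function",
                           "function": {"name": tool_choice["name"]}}
        return chat_tools, tool_choice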

* Studio: merge system messages and close inner stream on /v1/responses

Fixes two issues surfacing when OpenAI Codex CLI drives /v1/responses
against a GGUF with a strict chat template (gpt-oss harmony, Qwen3, ...).

1. "System message must be at the beginning" upstream errors
   Codex sends `instructions` AND a `role:"developer"` message in `input`,
   producing two separate system-role messages. Strict templates raise
   when a second system message exists or when one appears after a user
   turn. _normalise_responses_input now hoists all instructions / system /
   developer content into a single merged system message at the top of
   the Chat Completions message list (sketched below).

2. "async generator ignored GeneratorExit" / "Attempted to exit cancel
   scope in a different task"
   _responses_stream consumed the inner chat-completions body_iterator
   without an explicit aclose() in a finally block. On client disconnect
   (Codex frequently cancels mid-stream), Python 3.13 finalized the inner
   async generator on a different task, tripping anyio's cancel-scope
   check. Mirrored the same try/finally + aclose pattern used by the
   /v1/messages, /v1/chat/completions, and /v1/completions passthroughs.

Tests: hoisting of instructions + developer, developer mid-conversation,
multiple system messages in input, no-system passthrough.
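
A minimal sketch of the hoisting, assuming plain-string content (the
real _normalise_responses_input also handles content parts and ordering
edge cases):

    def hoist_system_messages(instructions, messages):
        # Collect `instructions` plus every system/developer turn into one
        # leading system message; keep all other turns in order.
        system_parts = [instructions] if instructions else []
        rest = []
        for m in messages:
            if m.get("role") in ("system", "developer"):
                system_parts.append(m.get("content") or "")
            else:
                rest.append(m)
        merged = ([{"role": "system", "content": "\n\n".join(system_parts)}]
                  if system_parts else [])
        return merged + rest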

* Studio: accept Codex multi-turn shapes and fix cross-task stream close on /v1/responses

Two issues observed driving /v1/responses from OpenAI Codex CLI against a
GGUF backend.

1. 422 on every turn after the first
   Codex replays prior assistant turns with
   `content:[{"type":"output_text","text":...,"annotations":[],"logprobs":[]}]`
   and carries forward `reasoning` items (o-series / gpt-5) between turns.
   Our `ResponsesContentPart` union only accepted input_text / input_image,
   and `ResponsesInputItem` only message / function_call / function_call_output,
   so Pydantic failed the whole list and FastAPI returned
   `"Input should be a valid string"` against the `str` branch of the
   outer union.

   - Add `ResponsesOutputTextPart` for assistant-replay content.
   - Add `ResponsesUnknownContentPart` and `ResponsesUnknownInputItem`
     as permissive catch-alls (drop during normalisation).
   - Wire an explicit `Discriminator` so dispatch is deterministic and
     the fallthrough reaches the catch-all instead of misreporting via
     the outer `Union[str, list[...]]` (sketched at the end of this
     message).
   - `_normalise_responses_input` now accepts output_text parts, flattens
     single-part assistant text to a plain string (keeps legacy chat
     templates happy), and silently drops reasoning / unknown items.

2. "async generator ignored GeneratorExit" / cross-task cancel scope
   `_responses_stream` awaited `openai_chat_completions` in the parent
   route-handler task, which opens the httpx client for the inner
   passthrough on *that* task. The outer `StreamingResponse` then iterates
   in a child task, so the asyncgen GC finalises the inner httpcore byte
   stream on the child task, tripping anyio's "Attempted to exit cancel
   scope in a different task". Move the `await` inside `event_generator`
   so the httpx lifecycle stays within the single streaming child task,
   and surface any HTTPException as a `response.failed` SSE frame.

Tests: assistant output_text replay, reasoning-item tolerance, unknown
content-part tolerance, end-to-end Codex-shape payload (developer + user +
reasoning + function_call + function_call_output + assistant output_text +
user), and single-part assistant flattening to plain string.
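
A condensed Pydantic v2 sketch of the discriminated-union shape from
item 1; model and tag names are illustrative, not the actual Studio
definitions:

    from typing import Annotated, Any, Literal, Union
    from pydantic import BaseModel, ConfigDict, Discriminator, Tag

    class InputTextPart(BaseModel):
        type: Literal["input_text"]
        text: str

    class OutputTextPart(BaseModel):
        type: Literal["output_text"]
        text: str

    class UnknownContentPart(BaseModel):
        model_config = ConfigDict(extra="allow")  # permissive catch-all
        type: str

    def _part_tag(v: Any) -> str:
        t = v.get("type") if isinstance(v, dict) else getattr(v, "type", None)
        return t if t in ("input_text", "output_text") else "unknown"

    ContentPart = Annotated[
        Union[
            Annotated[InputTextPart, Tag("input_text")],
            Annotated[OutputTextPart, Tag("output_text")],
            Annotated[UnknownContentPart, Tag("unknown")],
        ],
        Discriminator(_part_tag),
    ]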

* Studio: call llama-server directly from streaming /v1/responses

The previous fix (running the inner await inside event_generator) was not
enough. Wrapping the existing `openai_chat_completions` pass-through still
stacks two async generators: when the outer generator is closed, the
innermost `HTTP11ConnectionByteStream.__aiter__` in httpcore doesn't
receive GeneratorExit before Python's asyncgen GC finalises it in a
sibling task, tripping "Attempted to exit cancel scope in a different
task" and "async generator ignored GeneratorExit" — the same Python 3.13
+ httpcore 1.0.x interaction already seen in PRs #4956, #4981, #5099.

The cure both existing pass-throughs already had: a single same-task
httpx lifecycle with explicit `aiter_lines().aclose()` BEFORE
`resp.aclose()` / `client.aclose()` in the generator's finally block
(sketched below).

Apply it at the Responses layer by dropping the wrapper entirely for GGUF:
open httpx, consume `resp.aiter_lines()`, parse `chat.completion.chunk`,
emit Responses SSE events, close everything in finally — all in the
single StreamingResponse child task. Non-GGUF streaming is rejected with
a 400 (wrapping the transformers backend would re-introduce the
double-layer pattern and isn't a Codex-compatible path today anyway).

Also surfaces upstream httpx.RequestError / non-200 as a
`response.failed` SSE frame rather than a dropped stream now that the
request is dispatched after SSE headers have gone out.
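
The resulting single-task lifecycle looks roughly like this (a sketch,
not the actual handler; note the close ordering in finally):

    import httpx

    async def event_generator(url: str, payload: dict):
        # Client open, iteration, and close all happen in the single task
        # that StreamingResponse runs this generator in.
        client = httpx.AsyncClient(timeout=None)
        resp = await client.send(
            client.build_request("POST", url, json=payload), stream=True
        )
        lines = resp.aiter_lines()  # keep a handle so it can be aclose()d
        try:
            async for raw_line in lines:
                ...  # parse chat.completion.chunk, emit Responses SSE events
        finally:
            # Close the line iterator FIRST, then the response, then the
            # client; each guarded so cleanup noise cannot bubble out.
            for closer in (lines.aclose, resp.aclose, client.aclose):
                try:
                    await closer()
                except Exception:
                    pass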

* Studio: silence benign httpcore asyncgen GC warnings on Python 3.13

The streaming pass-throughs (/v1/chat/completions, /v1/messages,
/v1/responses, /v1/completions) all use the proven #4981 / #5099 pattern
— single-task httpx lifecycle with explicit aiter_lines().aclose() ahead
of resp.aclose() / client.aclose() in the generator's finally block.
That handles our own iterators correctly.

The residual noise ("async generator ignored GeneratorExit" /
"Attempted to exit cancel scope in a different task") comes from an
innermost HTTP11ConnectionByteStream.__aiter__ that httpcore creates
internally inside its pool. We hold no reference to it, so we cannot
aclose it ourselves. Python 3.13's asyncgen GC hook finalises it on the
finaliser task, its aclose path enters an anyio CancelScope shield, and
Python flags the cross-task exit. The response has already been
delivered with a 200 by then — it is purely log noise, not a functional
failure. Same interaction seen in modelcontextprotocol/python-sdk #831,
agno #3556, chainlit #2361, langchain-mcp-adapters #254.

Install a targeted sys.unraisablehook that swallows this specific tuple
— RuntimeError mentioning "cancel scope" or "GeneratorExit" plus an
object repr referencing HTTP11ConnectionByteStream — and defers to the
default hook for every other unraisable. Idempotent; guarded by a
sentinel attribute so repeated imports don't stack filters.
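
A sketch of such a hook, assuming the sentinel name and exact match
conditions (both are illustrative):

    import sys

    _FLAG = "_unsloth_httpcore_unraisable_filter"  # hypothetical sentinel name

    def install_unraisable_filter():
        if getattr(sys, _FLAG, False):
            return  # idempotent: repeated imports must not stack filters
        default_hook = sys.unraisablehook

        def _hook(args):
            msg = f"{args.err_msg or ''} {args.exc_value or ''}"
            noisy = (
                ("cancel scope" in msg or "GeneratorExit" in msg)
                and "HTTP11ConnectionByteStream" in repr(args.object)
            )
            if not noisy:
                default_hook(args)  # everything else keeps default behaviour

        sys.unraisablehook = _hook
        setattr(sys, _FLAG, True)
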
2026-04-21 13:17:20 +04:00
Lee Jackson
c20959dbf4
Studio: Improve chat composition, fix scroll behaviour, and refine sidebar UX (#5089)
* Chatbox, scroll, and menu fixes

- Fixed chatbox auto-expand height for multi-line text on the compare page
- Fixed chatbox UI to be consistent across compare and new chat
- Fixed scrolling being enabled on pages with no content, which also triggered the scroll-to-bottom button
- Fixed scroll-to-bottom button to only appear after scrolling up a reasonable amount instead of instantly
- Added shutdown studio button to the menu for easier access
- Fixed pop-up menu width to match the user button width

(cherry picked from commit cd4e390dfa84fe311fae79a781b96cc0ef5970a9)

* fix: correct compare scroll viewport and clean up chat composer UI polish

* Dark theme refactor and sidebar/chat UI refinements

- Complete refactoring of dark theme
- Replaced square rounded-corner user profile image with a circular bordered one
- Replaced user profile icon with 'U' initial and renamed label from 'Studio' to 'User'
- Chat bubbles now have a pointy top-right edge
- Sidebar menu tab line color selection is now consistent across all menus
- Tab-selection color animation now also applies to recent chats
- Removed 'Compare' menu autoselect when a compare chat conversation is selected
- Fixed UI consistency in Compare to match New Chat
- Removed sidebar animation and tab line, replaced with rounded selection for consistency
- Further adjustments to sidebar UI
- Further adjustments to compare chat UI

* Fixed sidebar collapse/expand for recent chats and recent runs not being clickable

* Chatbox, scroll, and menu fixes

- Fixed chatbox auto-expand height for multi-line text on the compare page
- Fixed chatbox UI to be consistent across compare and new chat
- Fixed scrolling being enabled on pages with no content, which also triggered the scroll-to-bottom button
- Fixed scroll-to-bottom button to only appear after scrolling up a reasonable amount instead of instantly
- Added shutdown studio button to the menu for easier access
- Fixed pop-up menu width to match the user button width

* Sidebar, fonts, and chat UI refinements

- Replaced logo PNG with real font text for 'unsloth' and 'BETA' label
- Added Hellix font and applied it across menus and UI elements
- Lighter scrollbar in the sidebar compared to other areas of the app
- Adjusted chat font and chat bubble styling
- Adjusted app menu design to stay consistent with the sidebar
- Adjusted text style for 'New Chat' and repositioned content/chatbox
- Adjusted model selector and top area UI
- Fixed footer text from 'LLM's' to 'LLMs'
- Fixed active selection border color incorrectly appearing on page refresh and during general navigation
- Logo now defaults to 'New Chat' when clicked

* Sidebar, model selector, and mobile UI fixes

- Further adjustments to sidebar UI and logo
- Changed right bar icon
- Model selector adjustments
- Collapsed sidebar now matches the content area background
- Adjusted Hellix font spacing across pages
- Fixed sidebar icon overlap on mobile screens

* Adjust sidebar icons

* Adjust sidebar icons

* Fixed compare chat UI and scrolling issues

* Fixed inference settings icon behavior and context info positioning

- Fixed top right inference settings icon to move into sidepanel during expand/collapse, matching left sidebar behavior
- Adjusted context information element positioning

* Fix: textarea overflow in system prompt editor

* Code block redesign, font, and chat bubble adjustments

- Redesigned code block colors and theme
- Changed code block font to Fira Code
- Fixed scrollbar disappearing when expanding/collapsing tool calls in chats
- Adjusted chat bubble background color

* Fix chat bubble background color in dark theme

* fix: restore textarea auto-sizing and scope prompt editor sizing

* fix: add explicit textarea field sizing for prompt editor overflow

* fix: generate chat nonce on click instead of render

* fix: respect training lock on logo navigation

* Refactor compare page dual chat scrolling behavior

* Revert "Refactor compare page dual chat scrolling behavior"

This reverts commit d056ec09f2.

---------

Co-authored-by: sneakr <hauzin@hotmail.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-21 02:20:45 +04:00
Konstantin Azizov
0a5c61ffcc
fix: prefer mainstream clipboard copy over deprecated one (#5109)
Fixes #5097

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-20 23:18:18 +04:00
Lee Jackson
d3215ce113
Studio: Show LoRA live logs and update GGUF quant options (#5058)
* export: update GGUF quant list and ordering

* gguf: add Q2_K_L quantize flags for output and embeddings

* export: add live console logs for LoRA export flow

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: stream q2_k_l quantize logs and include subprocess error details

* fix: route Q2_K_L preset to q2_k ftype with q8_0 output+embeddings
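
For illustration, the Q2_K_L routing plausibly reduces to one
llama-quantize invocation using the per-tensor-type flags llama.cpp
exposes; the PR's actual subprocess plumbing also streams logs:

    import subprocess

    def quantize_q2_k_l(src: str, dst: str, quantize_bin: str = "llama-quantize"):
        # Q2_K_L = base q2_k ftype with the output tensor and token
        # embeddings kept at q8_0 (flag names per llama.cpp).
        subprocess.run(
            [quantize_bin,
             "--output-tensor-type", "q8_0",
             "--token-embedding-type", "q8_0",
             src, dst, "q2_k"],
            check=True)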

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-20 23:14:49 +04:00
Lee Jackson
9c8a079d97
Studio: Local profile customization in settings and sync sidebar identity (#5088)
* studio: add local profile customization in settings

* studio: add local profile settings and sync sidebar identity

* fix: adjust profile card margin

* fix: move helper modules to utils and use single-letter avatar fallback

* fix: keep profile icon visible on sidebar collapse

* fix: sidebar account trigger labeling and profile reset prefs
2026-04-20 22:28:02 +04:00
Roland Tannous
9954781d30
fix(studio/chat): cancel in-flight run when trashing a thread from sidebar (#5067)
Trashing a thread mid-stream used to delete the Dexie rows while the
model kept generating, because the sidebar has no access to the
@assistant-ui aui context. Expose per-thread cancelRun() through the
chat runtime store and call it from deleteChatItem so trash behaves
like Stop → Trash. Covers compare pairs by cancelling each paired
thread.

Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
2026-04-20 21:06:59 +04:00
Michael Han
b24f3f61b8
Update README.md 2026-04-20 00:37:40 -07:00
Michael Han
f5eec8a6f2
Qwen3.6 and ReadMe revamp.md 2026-04-19 23:16:36 -07:00
Roland Tannous
ac2daf8b7a
Studio: forward standard OpenAI tools / tool_choice to llama-server (#5099)
* fix(studio): forward OpenAI tools/tool_choice to llama-server (#4999)

Studio's /v1/chat/completions silently stripped standard OpenAI `tools`
and `tool_choice` fields, so clients using standard function calling
(opencode, Claude Code, Cursor, Continue, ...) never got structured
tool_calls back. Adds a client-side pass-through path mirroring the
existing Anthropic /v1/messages flow: when `tools` is present without
Studio's `enable_tools` shorthand, the request is forwarded to
llama-server verbatim so the client sees native id, finish_reason
("tool_calls"), delta.tool_calls, and accurate usage tokens.

Also wires Anthropic tool_choice forwarding: /v1/messages previously
accepted tool_choice on the request model but silently dropped it with
a warning. Translate the four Anthropic shapes to OpenAI format and
forward them so agentic clients can actually enforce tool use.
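
The translator's mapping is mechanical; a sketch of
anthropic_tool_choice_to_openai as described here (edge-case handling
may differ):

    def anthropic_tool_choice_to_openai(tool_choice):
        # Anthropic's four shapes -> their OpenAI equivalents.
        if tool_choice is None:
            return "auto"
        kind = tool_choice.get("type")
        if kind == "auto":
            return "auto"
        if kind == "any":        # "must call some tool"
            return "required"
        if kind == "tool":       # "must call this specific tool"
            return {"type": "function",
                    "function": {"name": tool_choice["name"]}}
        if kind == "none":
            return "none"
        raise ValueError(f"unsupported tool_choice: {tool_choice!r}")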

- ChatCompletionRequest: add tools, tool_choice, stop; extra="allow"
- ChatMessage: accept role="tool", optional tool_call_id / tool_calls /
  name; content is now optional (assistant with only tool_calls)
- routes/inference.py: _openai_passthrough_stream /
  _openai_passthrough_non_streaming helpers, routing branch in
  openai_chat_completions, vision+tools via content-parts injection
- _build_passthrough_payload: tool_choice parameter (default "auto")
- anthropic_compat: anthropic_tool_choice_to_openai() translator
- tests/test_openai_tool_passthrough.py: Pydantic + translator unit tests
- tests/test_studio_api.py: 5 new E2E tests (non-stream, stream,
  multi-turn, OpenAI SDK, Anthropic tool_choice=any regression)

* fix(studio): surface httpx transport errors from OpenAI passthrough

When the managed llama-server subprocess crashes mid-request, the
async pass-through helpers in routes/inference.py used to return a
bare 500 (non-streaming) or an "An internal error occurred" SSE chunk
(streaming) because _friendly_error only recognized the sync path's
"Lost connection to llama-server" substring -- httpx transport
failures (ConnectError / ReadError / RemoteProtocolError /
ReadTimeout) stringify differently and fell through to the generic
case.

- _friendly_error: map any httpx.RequestError subclass to the same
  "Lost connection to the model server" message the sync chat path
  emits. Placed before the substring heuristics so the streaming path
  automatically picks it up via its existing except Exception catch.
- _openai_passthrough_non_streaming: wrap the httpx.AsyncClient.post
  in a try/except httpx.RequestError and re-raise as HTTPException
  502 with the friendly detail.
- tests/test_openai_tool_passthrough.py: new TestFriendlyErrorHttpx
  class pinning the mapping for ConnectError, ReadError,
  RemoteProtocolError, ReadTimeout, and confirming non-httpx paths
  (context-size heuristic, generic fallback) are unchanged.
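
A condensed sketch of the ordering, using the messages quoted above:

    import httpx

    def friendly_error(exc: Exception) -> str:
        # Transport-class check runs before any substring heuristics, so
        # every httpx.RequestError subclass (ConnectError, ReadError,
        # RemoteProtocolError, ReadTimeout, ...) maps to the same message.
        if isinstance(exc, httpx.RequestError):
            return "Lost connection to the model server"
        if "Lost connection to llama-server" in str(exc):
            return "Lost connection to the model server"
        return "An internal error occurred"  # generic fallback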

* fix(studio): close aiter_bytes/aiter_lines explicitly in passthroughs

The httpcore asyncgen cleanup fix in 5cedd9a5 is incomplete on Python
3.13 + httpcore 1.0.x: it switched to manual client/response lifecycle
but still used anonymous `async for raw_line in resp.aiter_lines():`
patterns in all three streaming paths. Python's async for does NOT
auto-close the iterator on break/return, so the aiter_lines /
aiter_bytes async generator remains alive, reachable only from the
surrounding coroutine frame. Once `_stream()` returns the frame is
GC'd and the orphaned asyncgen is finalized on a LATER GC pass in a
DIFFERENT asyncio task, where httpcore's
HTTP11ConnectionByteStream.aclose() enters anyio.CancelScope.__exit__
with a mismatched task and prints "Exception ignored in: <async
generator>" / "async generator ignored GeneratorExit" / "Attempted
to exit cancel scope in a different task" to the server log.

User observed this on /v1/messages after successful (status 200)
requests, with the traceback pointing at HTTP11ConnectionByteStream
.__aiter__ / .aclose inside httpcore.

Fix: save resp.aiter_lines() / resp.aiter_bytes() as a variable and
explicitly `await iter.aclose()` in the finally block BEFORE
resp.aclose() / client.aclose(). This closes the asyncgen inside the
current task's event loop, so the internal httpcore byte stream is
cleaned up before Python's asyncgen GC hook has anything orphaned to
finalize. Each aclose is wrapped in try/except Exception so nested
anyio cleanup noise can't bubble out.

Applied to all three streaming passthrough paths:
- _anthropic_passthrough_stream (/v1/messages client-side tool path)
- _openai_passthrough_stream (/v1/chat/completions client-side tool
  path, new in this PR)
- openai_completions (/v1/completions bytes proxy from PR #4956)

* fix(studio): default ChatCompletionRequest.stream to false per OpenAI spec

OpenAI's /v1/chat/completions spec defaults `stream` to false, so
clients that omit the field (naive curl, minimal integrations) expect
a single JSON response back. Studio was defaulting to true, silently
switching those clients into SSE and breaking any parser that didn't
also handle streaming. ResponsesRequest and AnthropicMessagesRequest
already default to false correctly; only ChatCompletionRequest was
wrong.

Studio's own frontend always sets `stream` explicitly on every
chat-adapter / chat-api / runtime-provider call site, so the flip has
no UI impact. SDK users (OpenAI Python/JS SDK, opencode, Claude Code,
Cursor, Continue) also always pass `stream` explicitly, so they're
unaffected. The only clients feeling the change are raw-curl users
who were relying on the wrong default -- those get the correct OpenAI
behavior now.

Added a regression test pinning the default so it can't silently
flip back.

* fix(studio): reject images in OpenAI tool passthrough for text-only GGUFs

The new tool passthrough branch runs before _extract_content_parts,
skipping the existing not is_vision guard. Requests combining tools
with an image on a text-only tool-capable GGUF were forwarded to
llama-server, producing opaque upstream errors instead of the
pre-existing clear 400. Restore the guard inline at the dispatch
point, checking both legacy image_base64 and inline image_url parts.

* fix(studio): require tool_call_id on role=tool chat messages

Enforce the OpenAI spec rule that role="tool" messages must carry a
tool_call_id. Without it, upstream backends cannot associate a tool
result with the assistant's prior tool_calls entry and the request
fails in non-obvious ways through the passthrough path. Reject at the
request boundary with a 422 instead.

* fix(studio): harden OpenAI tool passthrough validation and error surfacing

Three related fixes called out by the PR review:

1. Preserve upstream status codes in the streaming passthrough. The
   httpx request is now dispatched before the StreamingResponse is
   constructed. Non-200 upstream responses and httpx RequestError
   transport failures raise HTTPException with the real status
   instead of being buried inside a 200 SSE error frame, so OpenAI
   SDK clients see APIError/BadRequestError/... as expected.

2. Require non-empty content on user/system/tool messages. Per the
   OpenAI spec, content may only be omitted on assistant messages
   that carry tool_calls; enforce that at the request boundary so
   malformed messages never reach the passthrough path.

3. Role-constrain tool-call metadata. tool_calls is only valid on
   role=assistant, tool_call_id and name only on role=tool. Without
   this, a user/system message with tool_calls would flip the
   passthrough branch on and be forwarded to llama-server, surfacing
   as an opaque upstream error.
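
Items 2 and 3 condense to a small model validator; an illustrative
Pydantic sketch with the field set simplified from the real
ChatMessage:

    from typing import Any, Optional
    from pydantic import BaseModel, model_validator

    class ChatMessage(BaseModel):
        role: str
        content: Optional[Any] = None
        tool_calls: Optional[list] = None
        tool_call_id: Optional[str] = None
        name: Optional[str] = None

        @model_validator(mode="after")
        def _check_role_constraints(self):
            if self.tool_calls and self.role != "assistant":
                raise ValueError("tool_calls is only valid on role='assistant'")
            if (self.tool_call_id or self.name) and self.role != "tool":
                raise ValueError("tool_call_id/name are only valid on role='tool'")
            if self.role == "tool" and not self.tool_call_id:
                raise ValueError("role='tool' messages require tool_call_id")
            if self.content is None and not (self.role == "assistant" and self.tool_calls):
                raise ValueError("content may only be omitted on assistant tool_calls messages")
            return self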

* fix(studio): normalize image mode and passthrough JSON verbatim

Two Gemini-code-assist review findings on PR #5099:

1. Unconditionally convert decoded images to RGB before PNG encoding.
   The prior code only handled RGBA, letting CMYK/I/F images crash
   at img.save(format="PNG") and surface as opaque 400s. Applied to
   both the passthrough helper and the non-passthrough GGUF path
   that originally carried this pattern, keeping the two sites in
   sync.

2. Return the upstream JSON body as raw bytes via Response rather
   than parse-then-re-serialize with JSONResponse. Matches the
   passthrough helper's "verbatim" contract and drops a redundant
   round-trip.
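
The unconditional conversion in item 1 is a two-line change; a sketch:

    import base64
    from io import BytesIO
    from PIL import Image

    def to_png_rgb(image_b64: str) -> bytes:
        img = Image.open(BytesIO(base64.b64decode(image_b64)))
        if img.mode != "RGB":
            img = img.convert("RGB")  # RGBA, CMYK, I, F, P all normalise here
        buf = BytesIO()
        img.save(buf, format="PNG")
        return buf.getvalue()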

---------

Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-18 12:53:23 +04:00
Manan Shah
7d0d2f256c
Add qwen3.6 script (#5084)
* unsloth gemma4 support files

* some fixes

* Fixing cache.empty() calls (#4813)

* Fixing cache.empty() calls

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Fix/gemma4 mlx (#4816)

* Fixing cache.empty() calls

* fixing for mlx versions

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* removed bidirectional check for 31b (#4839)

Co-authored-by: Manan17 <shahmanan170602@gmail.coml>

* Add Gemma 4 26B MoE support (MLX) (#4844)

* removed bidirectional check for 31b

* Change gemma4_text for moe

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* fix(gemma4): cast RoPE offset to int before mx.arange() (#4901)

* fix(gemma4): cast RoPE offset to int before mx.arange()

* fix(gemma4): use zero-based arange + offset to avoid CPU-GPU sync

* qwen3.6 patches for multi-turn chat

* qwen3.6 script

* removing unnecessary scripts

* displaying errors for packages that are not installed

---------

Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
Co-authored-by: Manan Shah <mananshah@Manans-MacBook-Pro.local>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Manan17 <shahmanan170602@gmail.coml>
Co-authored-by: Théophile Lafargue <138336683+eauchs@users.noreply.github.com>
2026-04-17 01:21:30 -07:00
Daniel Han
d20b306755 Versioning 2026-04-16 12:06:10 -07:00
Daniel Han
0b57884120
Add Qwen3.6 inference defaults for Studio (#5065)
* Add Qwen3.6 inference defaults for Studio

Add qwen3.6 family entry to inference_defaults.json with the
recommended sampling parameters from Qwen's documentation:
temperature=0.7, top_p=0.8, top_k=20, min_p=0.0,
presence_penalty=1.5, repetition_penalty=1.0.

Without this, Qwen3.6 models fall through to the generic qwen3
pattern which uses different defaults (temperature=0.6,
top_p=0.95, no presence_penalty).

* Add Qwen3.6-35B-A3B-GGUF to default model lists

* Add Qwen3.5/3.6 presence_penalty to thinking toggle and small-model disable logic

- Thinking toggle (on-load + button click) now sets presencePenalty: 1.5 for
  Qwen3.5 and Qwen3.6 models (both thinking-ON and thinking-OFF states)
- Small-model thinking-disable check (<9B defaults to no-thinking) extended
  from Qwen3.5-only to also cover Qwen3.6, in all 3 locations:
  frontend on-load, frontend refresh, backend llama_cpp.py
2026-04-16 11:42:42 -07:00
Daniel Han
d56f980452
fix: multi-GPU inference crash for bnb 4-bit/8-bit models (#5068)
* fix: multi-GPU inference crash for bnb 4-bit/8-bit models

When load_in_4bit or load_in_8bit is used with device_map="sequential"
and max_memory constraints that place weights across multiple GPUs (or
entirely on a non-default GPU like cuda:1), the bitsandbytes loading
path in transformers never calls dispatch_model. No AlignDevicesHook is
installed, and the first forward/generate call crashes with:

  RuntimeError: Expected all tensors to be on the same device

This adds _attach_bnb_multidevice_hooks() which is called after
from_pretrained returns. It infers a device map from actual parameter
placements and calls dispatch_model(force_hooks=True) to install the
missing hooks. The function is a complete no-op for the common
single-GPU cuda:0 case.

Call sites: FastBaseModel.from_pretrained (vision.py) and
FastLlamaModel.from_pretrained (llama.py).
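
A hypothetical reconstruction of the idea (the real helper, per the
follow-ups below, also checks bnb flags and passes main_device and
skip_keys):

    import torch
    from accelerate import dispatch_model

    def attach_bnb_multidevice_hooks(model):
        # Infer a module -> device map from where parameters actually
        # landed, then let accelerate install the AlignDevicesHooks that
        # the bnb loading path skipped.
        device_map = {
            name.rsplit(".", 1)[0]: param.device
            for name, param in model.named_parameters()
        }
        cuda_devices = {d for d in device_map.values() if d.type == "cuda"}
        if cuda_devices <= {torch.device("cuda:0")}:
            return model  # common single-GPU cuda:0 case: no-op
        return dispatch_model(model, device_map=device_map, force_hooks=True)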

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: align with PR #5053 final review improvements

- Add hook call to the bnb quantized loading branch in llama.py (the
  primary load_in_4bit path), not just the non-fast-inference fallback
- Expand bnb detection: also check model.is_loaded_in_4bit,
  model.is_loaded_in_8bit, model.quantization_method
- Pass explicit main_device and skip_keys to dispatch_model
- Use logger.info instead of print for the success message
- Use kwargs.get("load_in_8bit", False) at llama.py call sites

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 11:35:02 -07:00
Lee Jackson
ee86530e55
chore: switch helper and no-cache fallback to Gemma (#5066) 2026-04-16 22:27:30 +04:00
Wasim Yousef Said
bc9ddb3af6
Fix onboarding followups (#5064)
* Fix onboarding followups

* Rename sidebar studio to train
2026-04-16 10:11:35 -07:00
Wasim Yousef Said
7ef65bd2e5
Chat first onboarding (#5063)
* auth: default to chat

* settings: relaunch onboarding

* onboarding: return to launch page

* studio: stop auto guided tour

* ui: soften global radius

* cleanup: rename onboarding exit prop

* fix onboarding redirect safety

* Show real Unsloth version in settings

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 09:58:10 -07:00
हिमांशु
f4422b0a62
change torchcodec version to 0.10.0 in extra-no-deps (#5043)
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-16 19:50:57 +04:00
Wasim Yousef Said
b01e9af124
feat(studio): replace navbar with collapsible sidebar (#4936)
* feat(studio): replace navbar navigation with collapsible sidebar

Add an app-wide sidebar with hover-expand and pin-to-dock behavior.
Navigation items (Studio, Recipes, Export, Chat) move from the center
pill navbar to the sidebar. Chat threads and recipes render as
collapsible sub-lists. Navbar simplified to logo + update + close.

- Extend SidebarProvider with pinned/hovered state model
- New AppSidebar with animated active indicator, sloth profile menu,
  theme toggle, guided tour, back/forward navigation
- Chat page refactored to URL-driven view state via search params
- Extract reusable hooks for chat thread and recipe sidebar data
- Guard startViewTransition for browser compatibility
- Wrap chat deletions in Dexie transaction for data integrity

* feat(studio): move logo to sidebar and make navbar overlay

- Sidebar is now full-height with logo in SidebarHeader
- Collapsed sidebar shows sticker.png, expanded shows full logo
- Navbar is absolute-positioned overlay (no layout space)
- Main content extends to top, aligning with navbar controls

* feat(studio): full-height sidebar with recents, edge-to-edge nav buttons

- Sidebar outside max-w-7xl, pinned to left edge
- Remove sidebar rounding, menu buttons rounded-md
- Nav buttons flush to sidebar edges with no left rounding
- Replace collapsible recipes/chat with flat nav items
- Add Recents section with chat history (1 item when not on chat, full on chat)
- New Chat as first nav item with PencilEdit02Icon
- Cursor pointer on all sidebar buttons
- Navbar temporarily hidden for screenshots

* fix(studio): fix chat scroll, action bar hover, collapsible recents

- Fix sticky composer by removing `relative` override on viewport footer
- Action bar buttons only show on hover (autohide=always)
- Remove floating border/shadow from action bar
- Add scroll space above composer for last message actions
- Back/forward buttons use router history (stay in-app)
- Recents section collapsible with chevron on chat route
- Set html/body/#root height for proper h-full chain

* fix(studio): address review feedback, clean up unused code

- Unhide navbar (was left hidden from screenshot)
- Remove unused imports: SidebarMenuSub*, BubbleChatIcon, ColumnInsertIcon
- Remove unused vars: recipeItems, activeRecipeId, canCompare, recipesOpen
- Include compare query id in active sidebar selection
- Use store type for contextUsage instead of inline type
- Simplify noop in sidebar.tsx
- Remove empty className prop

* feat(studio): add mobile sidebar, recent runs section, and misc UX fixes

* feat(studio): scaffold settings feature module with dialog store

* feat(studio): add tri-state theme store for settings

* feat(chat): add clear-all-chats and export-chat-history utils

* feat(studio): add settings dialog shell with tab rail

* feat(studio): add appearance tab with theme and sidebar pin

* feat(studio): add settings general tab with hf token, auto-title, reset prefs

* feat(studio): add settings chat tab with export and clear

* feat(studio): add api keys tab with list and revoke flow

* feat(studio): add create-key form and reveal dialog

* feat(studio): add usage examples panel to api keys tab

* feat(studio): add settings about tab with update and shutdown

* feat(studio): add settings dropdown item and cmd-comma shortcut

* feat(studio): remove legacy api-keys route and chat-sheet preference rows

* fix(studio): settings dialog a11y + polish pass

* feat(studio): inline api key reveal card replacing nested dialog

* fix(studio): hide revoked keys from settings list

* refactor(studio): strip navbar and hoist training unload guard

* feat(studio): explicit sidebar toggle, remove hover-open and pin icons

* fix(studio): use SidebarRight01Icon for collapsed sidebar open toggle

* fix(studio): address code review findings for settings dialog

* feat(studio): collapsible navigate group with standalone new-chat and compare

* fix(studio): chat-only standalone actions, use ColumnInsertIcon for compare

* fix(studio): sidebar new-chat/compare state reset and icon-mode collapsible

* feat(studio): add compact logo assets for sidebar header

* Fixed sidebar design

* fix(studio): sidebar delete icon hover contrast and sizing

* feat(studio): route-gate sidebar recents (chats off /studio, runs on /studio)

* feat(studio): add chat search store

* feat(studio): add chat search index hook with snapshot-on-open

* feat(studio): add chat search command dialog with global shortcut

* feat(studio): wire chat search into sidebar

* fix(studio): trim hf token on save, add show/hide toggle, commit on close

* revert(studio): restore original sidebar/border colors, brighten sidebar

* feat(studio): forward overlayClassName through CommandDialog

* fix(studio): wrap search dialog in Command context, redesign as flat 635px card

* fix(studio): reserve right padding on recent items so delete icon stops overlapping title

* fix(studio): skip hf token unmount-commit during reset-prefs reload

* chore(studio): drop unused icon import and unreachable runs navigate branch

* fix(studio): chat search index filters archived before limit, batches message query, picks up reasoning text

* fix(studio): keep CommandEmpty in tree so empty state renders correctly

* fix(studio): cap system prompt and chat template textareas so they scroll instead of growing

* fix(studio): attach chat-compare tour anchor to sidebar compare button

* fix(studio): persist system theme explicitly so next-themes does not clobber on reload

* fix(studio): auto-switch to history tab when selecting a recent run from sidebar

* UI overhaul: chatbox, scrollbar, sidebar, and compare view

UI Changes:
- Redesigned the Compare UI with general cleanup
- Redesigned the Chatbox UI
- Reduced the width of the user chat bubble for improved readability
- Narrowed the user chat box across the content page
- Adjusted thinking-box text color to be slightly darker
- Removed faded text effect from chat messages
- Removed faded text effect from the thinking box
- Added a small LLM chat safety note at the bottom of the chatbox
- Restyled the scrollbar

Layout & Behavior:
- Reworked the scrollbar to span the full height of the page (no top/bottom padding) and remain persistently visible when content is scrollable, rather than only on hover
- Reworked the Configuration sidebar to span full height — removed rounded corners and borders, with the scrollbar adjusted to match the full top-to-bottom layout
- Adjusted the top menu and bottom chatbox content areas to work correctly with the new full-page scroll behavior
- Made chat content match the chatbox width, with content sliding slightly behind the chatbox when scrolling
- Aligned chat text width with the chatbox for visual consistency, including how far the text extends behind the chatbox

Fixes:
- Fixed the chatbox not auto-expanding when typing multi-line input while bottom-positioned during an active chat (previously only worked before a chat had started)
- Fixed positioning and design of the user chat hover menu buttons to match the assistant chat box — now displayed below the chat bubble instead of on the left side

* Fix user message layout in thread component

* swap code icon

* fix compare layout

* fix compare pane flex

* Sidebar improvements and fixes

- Added scrolling support to the sidebar so menus and recent chats no longer get hidden
- Recent chats are now always visible in the sidebar, not hidden when in Studio, Recipes, or Export
- Recent chat is now deselected when selecting other navigations
- Fixed sidebar glitch where browser resize could make the sidebar and expand button disappear completely
- Fixed glitch where the open-sidebar hover tooltip appeared above the logo when clicking expand sidebar
- Reduced sidebar width on mobile to around 2/3 of the screen (was too wide)
- Made the close-sidebar hover tooltip consistent with the rest of the design
- Removed sidebar collapse/expand animation
- Small adjustment to chat width

* Fix route scrolling, polling, and theme sync issues

* Fix Studio page scrolling

---------

Co-authored-by: sneakr <hauzin@hotmail.com>
2026-04-16 08:46:16 -07:00
Daniel Han
05ec0f110b
Studio: Ollama support, recommended folders, Custom Folders UX polish (#5050)
* Studio: Ollama support, recommended folders, Custom Folders UX polish

Backend:
- Add _scan_ollama_dir that reads manifests/registry.ollama.ai/library/*
  and creates .gguf symlinks under <ollama_dir>/.studio_links/ pointing
  at the content-addressable blobs, so detect_gguf_model and llama-server
  -m work unchanged for Ollama models (simplified sketch after this list)
- Filter entries under .studio_links from the generic models/hf/lmstudio
  scanners to avoid duplicate rows and leaked internal paths in the UI
- New GET /api/models/recommended-folders endpoint returning LM Studio
  and Ollama model directories that currently exist on the machine
  (OLLAMA_MODELS env var + standard paths, ~/.lmstudio/models, legacy
  LM Studio cache), used by the Custom Folders quick-add chips
- detect_gguf_model now uses os.path.abspath instead of Path.resolve so
  the readable symlink name is preserved as display_name (e.g.
  qwen2.5-0.5b-Q4_K_M.gguf instead of sha256-abc...)
- llama-server failure with a path under .studio_links or .cache/ollama
  surfaces a friendlier message ("Some Ollama models do not work with
  llama.cpp. Try a different model, or use this model directly through
  Ollama instead.") instead of the generic validation error
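
A simplified sketch of the manifest scan from the first bullet (none of
the namespace handling, path hashing, or fallback logic from the later
review rounds):

    import json
    from pathlib import Path

    def scan_ollama_manifests(ollama_dir: Path, links_dir: Path):
        # Ollama stores OCI-style manifests under manifests/<host>/<ns>/
        # <model>/<tag>; the GGUF weights are the layer whose mediaType is
        # application/vnd.ollama.image.model, stored content-addressably
        # as blobs/sha256-<digest>.
        links_dir.mkdir(parents=True, exist_ok=True)
        for manifest in (ollama_dir / "manifests").rglob("*"):
            if not manifest.is_file():
                continue
            meta = json.loads(manifest.read_text())
            for layer in meta.get("layers", []):
                if layer.get("mediaType") == "application/vnd.ollama.image.model":
                    digest = layer["digest"].replace(":", "-")  # sha256:... -> sha256-...
                    blob = ollama_dir / "blobs" / digest
                    link = links_dir / f"{manifest.parent.name}-{manifest.name}.gguf"
                    if blob.exists() and not link.exists():
                        link.symlink_to(blob)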

Frontend:
- ListLabel supports an optional leading icon and collapse toggle; used
  for Downloaded (download icon), Custom Folders (folder icon), and
  Recommended (star icon)
- Custom Folders header gets folder icon on the left, and +, search,
  and chevron buttons on the right; chevron uses ml-auto so it aligns
  with the Downloaded and Recommended chevrons
- New recommended folder chips render below the registered scan folders
  when there are unregistered well-known paths; one click adds them as
  a scan folder
- Custom folder rows that are direct .gguf files (Ollama symlinks) load
  immediately via onSelect instead of opening the GGUF variant expander
  (which is for repos containing multiple quants, not single files)
- When loading a direct .gguf file path, send max_seq_length = 0 so the
  backend uses the model's native context instead of the 4096 chat
  default (qwen2.5:0.5b now loads at 32768 instead of 4096)
- New listRecommendedFolders() helper on the chat API

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: log silent exceptions and support read-only Ollama dirs

Replace silent except blocks in _scan_ollama_dir and the
recommended-folders endpoint with narrower exception types plus debug
or warning logs, so failures are diagnosable without hiding signal.

Add _ollama_links_dir helper that falls back to a per-ollama-dir hashed
namespace under Studio's own cache (~/.unsloth/studio/cache/ollama_links)
when the Ollama models directory is read-only. Common for system installs
at /usr/share/ollama/.ollama/models and /var/lib/ollama/.ollama/models
where the Studio process has read but not write access. Previously the
scanner returned an empty list in that case and Ollama models would
silently not appear.

The fallback preserves the .gguf suffix on symlink names so
detect_gguf_model keeps recognising them. The prior "raw sha256 blob
path" fallback would have missed the suffix check and failed to load.

* Address review: detect mmproj next to symlink target for vision GGUFs

Codex P1 on model_config.py:1012: when detect_gguf_model returns the
symlink path (to preserve readable display names), detect_mmproj_file
searched the symlink's parent directory instead of the target's. For
vision GGUFs surfaced via Ollama's .studio_links/ -- where the weight
file is symlinked but any mmproj sidecar lives next to the real blob
-- mmproj was no longer detected, so the model was misclassified as
text-only and llama-server would start without --mmproj.

detect_mmproj_file now adds the resolved target's parent to the scan
order when path is a symlink. Direct (non-symlink) .gguf paths are
unchanged, so LM Studio and HF cache layouts keep working exactly as
before. Verified with a fake layout reproducing the bug plus a
regression check on a non-symlink LM Studio model.

* Address review: support all Ollama namespaces and vision projector layers

- Iterate over all directories under registry.ollama.ai/ instead of
  hardcoding the "library" namespace. Custom namespaces like
  "mradermacher/llama3" now get scanned and include the namespace
  prefix in display names, model IDs, and symlink names to avoid
  collisions.

- Create companion -mmproj.gguf symlinks for Ollama vision models
  that have an "application/vnd.ollama.image.projector" layer, so
  detect_mmproj_file can find the projector alongside the model.

- Extract symlink creation into _make_symlink helper to reduce
  duplication between model and projector paths.

* Address review: move imports to top level and add scan limit

- Move hashlib and json imports to the top of the file (PEP 8).
- Remove inline `import json as _json` and `import hashlib` from
  function bodies, use the top-level imports directly.
- Add `limit` parameter to `_scan_ollama_dir()` with early exit
  when the threshold is reached.
- Pass `_MAX_MODELS_PER_FOLDER` into the scanner so it stops
  traversing once enough models are found.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: Windows fallback, all registry hosts, collision safety

_make_link (formerly _make_symlink):
- Falls back to os.link() hardlink when symlink_to() fails (Windows
  without Developer Mode), then to shutil.copy2 as last resort
- Uses atomic os.replace via tmp file to avoid race window where the
  .gguf path is missing during rescan

Scanner now handles all Ollama registry layouts:
- Uses rglob over manifests/ instead of hardcoding registry.ollama.ai
- Discovers hf.co/org/repo:tag and any other host, not just library/
- Filenames include a stable sha1 hash of the manifest path to prevent
  collisions between models that normalize to the same stem

Per-model subdirectories under .studio_links/:
- Each model's links live in their own hash-keyed subdirectory
- detect_mmproj_file only sees the projector for that specific model,
  not siblings from other Ollama models

Friendly Ollama error detection:
- Now also matches ollama_links/ (the read-only fallback cache path)
  and model_identifier starting with "ollama/"

Recommended folders:
- Added os.access(R_OK | X_OK) check so unreadable system directories
  like /var/lib/ollama/.ollama/models are not advertised as chips

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: filter ollama_links from generic scanners

The generic scanners (models_dir, hf_cache, lmstudio) already filter
out .studio_links to avoid duplicate Ollama entries, but missed the
ollama_links fallback cache directory used for read-only Ollama
installs. Add it to the filter.

* Address review: idempotent link creation and path-component filter

_make_link:
- Skip recreation when a valid link/copy already exists (samefile or
  matching size check). Prevents blocking the model-list API with
  multi-GB copies on repeated scans.
- Use uuid4 instead of os.getpid() for tmp file names to avoid race
  conditions from concurrent scans.
- Log cleanup errors instead of silently swallowing them.

Path filter:
- Use os.sep-bounded checks instead of bare substring match to avoid
  false positives on paths like "my.studio_links.backup/model.gguf".

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address review: drop copy fallback, targeted glob, robust path filter

_make_link:
- Drop shutil.copy2 fallback -- copying multi-GB GGUFs inside a sync
  API request would block the backend. Log a warning and skip the
  model when both symlink and hardlink fail.

Scanner:
- Replace rglob("*") with targeted glob patterns (*/*/* and */*/*/*)
  to avoid traversing unrelated subdirectories in large custom folders.

Path filter:
- Use Path.parts membership check instead of os.sep substring matching
  for robustness across platforms.

Scan limit:
- Skip _scan_ollama_dir when _generic already fills the per-folder cap.

* Address review: sha256, top-level uuid import, Path.absolute()

- Switch hashlib.sha1 to hashlib.sha256 for path hashing consistency.
- Move uuid import to the top of the file instead of inside _make_link.
- Replace os.path.abspath with Path.absolute() in detect_gguf_model
  to match the pathlib style used throughout the codebase.

* Address review: fix stale comments (sha1, rglob, copy fallback)

Update three docstrings/comments that still referenced the old
implementation after recent changes:
- sha1 comment now says "not a security boundary" (no hash name)
- "rglob" -> "targeted glob patterns"
- "file copies as a last resort" -> removed (copy fallback was dropped)

* Address review: fix stale links, support all manifest depths, scope error

_make_link:
- Drop size-based idempotency shortcut that kept stale links after
  ollama pull updates a tag to a same-sized blob. Only samefile()
  is used now -- if the link doesn't point at the exact same inode,
  it gets replaced.

Scanner:
- Revert targeted glob back to rglob so deeper OCI-style repo names
  (5+ path segments) are not silently skipped.

Ollama error:
- Only show "Some Ollama models do not work with llama.cpp" when the
  server output contains GGUF compatibility hints (key not found,
  unknown architecture, failed to load). Unrelated failures like
  OOM or missing binaries now show the generic error instead of
  being misdiagnosed.
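
Pulling the _make_link behaviour from the review rounds together, a
condensed hypothetical sketch:

    import os
    import uuid
    from pathlib import Path

    def make_link(target: Path, link: Path) -> bool:
        # samefile idempotence, symlink -> hardlink fallback, atomic
        # publish via os.replace; no copy fallback.
        if link.exists() and link.samefile(target):
            return True  # already points at this exact inode
        tmp = link.with_name(f".{uuid.uuid4().hex}.tmp")
        try:
            try:
                tmp.symlink_to(target)
            except OSError:
                os.link(target, tmp)  # e.g. Windows without Developer Mode
            os.replace(tmp, link)  # atomic: no window where link is missing
            return True
        except OSError:
            tmp.unlink(missing_ok=True)
            return False  # skip the model instead of copying multi-GB files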

---------

Co-authored-by: Daniel Han <info@unsloth.ai>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: danielhanchen <michaelhan2050@gmail.com>
2026-04-16 08:24:08 -07:00
Daniel Han
ff23ce40b4
Fix review findings for chat-template repair (#5049) (#5056)
* Fix review findings for PR #5049

1. Sandbox fallback Jinja env in _VariantTokenizerProxy.apply_chat_template
   (use SandboxedEnvironment, matching _derive_assistant_prefix_by_render)
2. Unwrap benign outer-If guards in _template_ends_with_toplevel_for so
   templates like {% if messages %}{% for ... %}{% endfor %}{% endif %}
   are still repairable (preserves Qwen3-Guard rejection via else-branch
   and add_generation_prompt-name checks)
3. Preserve raw name_or_path in _VariantTokenizerProxy._source_path so
   local-path detection works for dict/list variant tokenizers
4. Context-aware strict-mode messages: omit "will still load" and
   "Set UNSLOTH_STRICT_CHAT_TEMPLATE=1" when already raising

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 08:02:05 -07:00
Daniel Han
b42e3a120d
Remove legacy venv Scripts entry from User PATH on upgrade (#5060)
Older installers persisted the venv Scripts directory directly in the
User PATH registry. The shim approach from #4961 no longer writes that
entry, but on upgrade the old one survived and python.exe / pip.exe
from the unsloth venv continued winning resolution in every new shell.

Before creating the shim, read the current User PATH, filter out any
entry matching $VenvDir\Scripts (using the same symmetric raw+expanded
comparison as Add-ToUserPath), and write back if changed. No-op on
fresh installs where the legacy entry was never written.

Confirmed on a real Windows machine: `where.exe python` was returning
the venv interpreter first even after the shim PR merged.
2026-04-16 07:36:59 -07:00
Daniel Han
5b8643969e Revert "Remove legacy venv Scripts entry from User PATH on upgrade"
This reverts commit cae4a74297.
2026-04-16 14:20:43 +00:00
Daniel Han
cae4a74297 Remove legacy venv Scripts entry from User PATH on upgrade
Older installers persisted the venv Scripts directory directly in the
User PATH registry. The shim approach (added in this PR) no longer writes
that entry, but it also did not remove the old one. On upgrade, the
legacy entry survived and python.exe / pip.exe from the unsloth venv
continued winning resolution in every new shell, which is exactly the
hijack the shim was designed to prevent.

Before creating the shim, read the current User PATH, filter out any
entry matching $VenvDir\Scripts (using the same symmetric raw+expanded
comparison as Add-ToUserPath), and write back if changed. This runs
once per install and is a no-op on fresh installs where the legacy
entry was never written.
2026-04-16 14:19:04 +00:00
Datta Nimmaturi
6764cb9b90
Restrict flash attn to <=256 head dim. Consolidate attn impl checks (#5051)
* Restrict flash attn to <=256 head dim. Consolidate attn impl checks

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Consolidate the changes into single function

* safeguard for dict instead of object

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 09:00:17 -05:00
Daniel Han
c5be8b1cd2
Chat-template repair: warn-by-default, AST classification, dict support (#5049)
* Chat-template repair: warn-by-default, AST classification, dict support

Follow-up hardening on top of PR #4426 (which fixed the #4150
RuntimeError for ChatML LoRA reloads).

Behavior changes:

- Warn-by-default instead of RuntimeError. When fix_chat_template cannot
  repair a broken template, emit a warning and return the original.
  Set UNSLOTH_STRICT_CHAT_TEMPLATE=1 to restore the pre-warn hard fail.
  Fixes the UX where a missing `{% if add_generation_prompt %}` block on
  a saved LoRA (typical after LlamaFactory / Axolotl re-serialize) would
  block model loading entirely.

- Local path vs HF hub distinguished in the warning message. For local
  paths the message points at the likely downstream tool; for HF IDs it
  points at the upstream model maintainers. Previously both said "file a
  bug report to the maintainers of <path>" even when <path> was the
  user's own saves/ directory.

- Dict / list chat_template now handled. Hermes-3 ships with
  {default, tool_use} and the previous code crashed with
  AttributeError: 'dict' object has no attribute 'find' when entering
  _fix_chat_template with a dict. Each variant is now fixed
  independently; structure is preserved.

Internals:

- _find_end_position now matches all four Jinja whitespace-control
  variants ({% %}, {%- %}, {% -%}, {%- -%}) and returns the rightmost
  endfor/endif so multi-for templates aren't locked onto the first loop.
  Previously {%- endfor -%} (both-side dash, used by Qwen3-Guard) was
  silently bypassed.

- _has_add_generation_prompt_block uses Jinja AST via
  jinja2.nodes.If/Name walks instead of substring matching, so
  templates that hide the block behind comments or dash-style variants
  are classified correctly (see the sketch after this list).

- _template_ends_with_toplevel_for gates the GH#4150 ChatML repair on
  the AST: only fires when the last structural top-level node is a For
  (standard ChatML shape), ignoring trailing pure-whitespace output
  nodes. Templates wrapped in an outer If (Qwen3-Guard) are now
  explicitly skipped at the _fix_chat_template level as well, not just
  at load_correct_tokenizer's name-based exemption.

- _validate_patched_template renders the patched template with and
  without add_generation_prompt and confirms the patched output
  responds to the flag by appending (not replacing) content. If
  validation fails, the patch is discarded and we fall through to the
  warn path.
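
An illustrative sketch of the AST classification from the bullets above
(a later follow-up in this PR also rejects negated gates and requires
the body to emit Output; names are illustrative):

    from jinja2 import Environment, nodes

    def _references(node, var: str) -> bool:
        if isinstance(node, nodes.Name) and node.name == var:
            return True
        return any(n.name == var for n in node.find_all(nodes.Name))

    def _emits_output(stmts) -> bool:
        # True if any statement is, or contains, an Output node.
        return any(
            isinstance(s, nodes.Output) or next(s.find_all(nodes.Output), None)
            for s in stmts
        )

    def has_add_generation_prompt_block(template_src: str) -> bool:
        # Immune to comments and whitespace-control variants that defeat
        # plain substring checks.
        ast = Environment().parse(template_src)
        return any(
            _references(n.test, "add_generation_prompt") and _emits_output(n.body)
            for n in ast.find_all(nodes.If)
        )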

Verified with an expanded regression suite in tests/:
- test_fix_chat_template_pr4426.py: 42/42 template-matrix cells
- test_load_correct_tokenizer_pr4426.py: 5/5 tokenizer loads
- test_chat_template_followups.py: 10/10 new follow-up tests
- test_mistral_pr4426.py: 5 Mistral variants byte-identical
- test_qwen_pr4426.py: 14 Qwen variants byte-identical
  (Qwen1.5, Qwen2, Qwen2.5-Instruct/Coder/Math/VL, Qwen3,
  Qwen3-Coder, QwQ, Qwen3-Guard-Gen)

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Guard _validate_patched_template against read-only chat_template

If tokenizer.chat_template is a property or otherwise read-only, the
validation helper would crash with AttributeError when trying to
temporarily set the patched template. Catch the assignment failure and
return False (skip validation), and best-effort restore in the finally
block.

* Replace regex separator inference with render-diff; broaden repair to non-ChatML templates

The previous `_infer_assistant_separator` was a four-tier regex heuristic that
only worked on ChatML-shaped templates and forced a hard `<|im_start|>` /
`<|im_end|>` presence gate on Case 2 repair. This meant a Llama-3, Gemma, or
Phi-3 template stripped of its generation-prompt block by a downstream tool
(LlamaFactory, Axolotl, etc.) would still warn-and-return even though the
structural shape is identical to the ChatML case the PR already handles.

This replaces the regex with `_derive_assistant_prefix_by_render`: render the
template with two dialogs that differ only in assistant content, then
`os.path.commonprefix` on the tails captures the exact assistant-turn prefix
the template emits. The template itself is ground truth, so non-ChatML shapes
work as long as the assistant block is a literal the template emits once per
message.

Three guards keep the derivation safe:
  A. both assistant renders extend the base render (no reordering);
  B. the divergence point is exactly the content-insertion site (sentinel
     follows the common prefix);
  C. a user-role cross-check: if a render with a user sentinel also emits
     the same prefix, role has no effect on output and we reject. A render
     failure on [user, user] (e.g. Gemma's `raise_exception` alternation
     check) is evidence that role matters; we accept.

Sentinels differ at character 0 so `commonprefix` cannot absorb them, and
trailing whitespace/comments after the last `{% endfor %}` are stripped
before probing (they would appear in base but not after the appended
assistant turn and break Guard A).

`_fix_chat_template` and `_repair_string_template` now thread an
`is_sharegpt` kwarg; `_fix_chat_template` retries once with
`is_sharegpt=True` if the first probe returns None (dual-probe fallback
for dict/list callers).

The ChatML `<|im_start|>` / `<|im_end|>` hard gate in Case 2 is dropped.
`_infer_assistant_separator` is deleted.
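
A simplified reconstruction of the probe (the real helper also strips
trailing comments and runs the Guard C user-role cross-check):

    import os
    from jinja2.sandbox import SandboxedEnvironment

    def derive_assistant_prefix(template_src: str):
        tmpl = SandboxedEnvironment().from_string(template_src)
        render = lambda msgs: tmpl.render(messages=msgs, add_generation_prompt=False)
        base = [{"role": "user", "content": "hi"}]
        base_out = render(base).rstrip()
        # Sentinels differ at character 0 so commonprefix cannot absorb them.
        a = render(base + [{"role": "assistant", "content": "Xsentinel"}])
        b = render(base + [{"role": "assistant", "content": "Ysentinel"}])
        if not (a.startswith(base_out) and b.startswith(base_out)):
            return None  # Guard A: appending a turn must extend the base render
        tail_a, tail_b = a[len(base_out):], b[len(base_out):]
        prefix = os.path.commonprefix([tail_a, tail_b])
        if not tail_a[len(prefix):].startswith("Xsentinel"):
            return None  # Guard B: divergence is exactly the content site
        return prefix  # the literal assistant-turn prefix the template emits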

Verified via:
  - tests/test_fix_chat_template_pr4426.py: 51/51 cells (new Llama-3,
    Gemma, Phi-3 broken-template rows all repair FIX-OK)
  - tests/test_load_correct_tokenizer_pr4426.py: 5/5
  - tests/test_chat_template_followups.py: 18/18 (T11-T18 cover
    non-ChatML repair + probe failure modes)
  - tests/test_mistral_pr4426.py: 5/5 byte-identical
  - tests/test_qwen_pr4426.py: 14/14 byte-identical (Qwen3-Guard AST
    gate still rejects)
  - tests/hermes3_lora_pr4426.py reload: patched template ends with
    `<|im_start|>assistant\n`, inference returns sensible output.
  - temp/sim/battery.py: 79/79 followup; vs baseline: 0 regressions,
    9 improvements.
  - Spot-check probe on real stripped tokenizers (Hermes-3, Phi-4,
    Llama-3.2-1B, Gemma-3-1B): all derive the expected prefix.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Address reviewer findings: variant routing, positive-gate detection, comment-safe end scan

Resolves three reviewer findings on PR #5049 (`fix/chat-template-followups`):

Finding #1 [10/10]: dict/list variants now route through
`_fix_chat_template_for_tokenizer` via a new `_VariantTokenizerProxy`
adapter. Previously the dict/list branches called `_fix_chat_template`
directly, silently bypassing the warn/strict (`UNSLOTH_STRICT_CHAT_TEMPLATE`)
contract, the `no == yes` diagnostic, broken-existing-block detection,
and `_validate_patched_template` guard. The proxy swaps
`base.chat_template` to the variant string before each
`apply_chat_template` call so tokenizer globals (`bos_token`, custom
filters, `raise_exception`) remain available; if the base is read-only
it falls back to isolated Jinja rendering.

Finding #2 [1/10]: `_has_add_generation_prompt_block` now requires the
`If` body to contain at least one `Output` node (a new
`_if_body_emits_content` helper walks descendants). This distinguishes a
real generation-prompt block from a header guard like
`{% if not add_generation_prompt is defined %}{% set ... %}{% endif %}`
(body contains only `Assign`) which references the name but emits
nothing. Also dropped a now-redundant `"add_generation_prompt" not in
scrubbed` guard in `_fix_chat_template` Case 2 so header-guarded
templates still get repaired.
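
A rough `jinja2.nodes` sketch of the positive-gate check (hypothetical helper names; the real walk handles more node shapes):

```python
from jinja2 import Environment, nodes

def if_body_emits_content(if_node: nodes.If) -> bool:
    # A real generation-prompt block must emit output; a header guard's
    # body holds only Assign nodes and emits nothing.
    return any(isinstance(stmt, nodes.Output) or any(stmt.find_all(nodes.Output))
               for stmt in if_node.body)

def references_agp(if_node: nodes.If) -> bool:
    test = if_node.test
    names = [test] if isinstance(test, nodes.Name) else list(test.find_all(nodes.Name))
    return any(n.name == "add_generation_prompt" for n in names)

def has_add_generation_prompt_block(template_str: str) -> bool:
    tmpl_ast = Environment().parse(template_str)
    return any(references_agp(n) and if_body_emits_content(n)
               for n in tmpl_ast.find_all(nodes.If))
```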

Finding #4 [1/10]: `_find_end_position` now replaces Jinja comments with
equal-length whitespace before scanning for `{% endfor %}` / `{% endif %}`
tokens. This prevents a trailing comment containing those tokens from
being picked as the real end tag. Positions in the padded string map 1:1
to positions in the original template.
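
The masking trick, sketched (simplified; the real scan then picks between the `{% endfor %}` / `{% endif %}` candidates):

```python
import re

_JINJA_COMMENT = re.compile(r"\{#.*?#\}", re.DOTALL)

def mask_comments(template: str) -> str:
    # Equal-length whitespace keeps every index in the masked string
    # aligned 1:1 with the original template.
    return _JINJA_COMMENT.sub(lambda m: " " * len(m.group(0)), template)

masked = mask_comments("{% endfor %}{# trap: {% endfor %} #}")
assert masked.rfind("{% endfor %}") == 0  # the comment copy no longer matches
```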

Tests:
  - tests/test_chat_template_followups.py: 21/21 (T19 strict-mode
    dict variant, T20 header-guard repair, T21 comment-endfor trap
    added; T4/T5 stubs updated with a working apply_chat_template
    that routes through Jinja).
  - tests/test_fix_chat_template_pr4426.py: 51/51 cells unchanged.
  - tests/test_load_correct_tokenizer_pr4426.py: 5/5.
  - tests/test_mistral_pr4426.py: 5/5 byte-identical.
  - tests/test_qwen_pr4426.py: 14/14 byte-identical.
  - temp/sim/battery.py: 79/79 followup; 0 regressions vs baseline.
  - Phase 3 Hermes-3 broken-LoRA reload: inference still returns
    `'The answer to the equation 2+2 is 4.'`.
  - Spot-checks on Hermes-3 / Phi-4 / Llama-3.2-1B / Gemma-3-1B real
    stripped templates: probe still derives the expected prefix.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Tighten comments in chat-template helpers

Pure comment minimization across `_find_end_position`,
`_has_add_generation_prompt_block`, `_if_body_emits_content`,
`_derive_assistant_prefix_by_render`, `_fix_chat_template` Case 2,
and `_VariantTokenizerProxy`. No behavior change; same intent,
fewer lines. All 21 follow-up tests and the 51-cell Phase 1 matrix
still pass.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Sandbox probe, fix is_sharegpt validator mismatch, reject negated gates

Three real bugs from the 10-agent Opus review:

1. Probe now uses `jinja2.sandbox.SandboxedEnvironment` instead of bare
   `jinja2.Environment`. The probe renders at model-load time (before
   the user calls `apply_chat_template`), so it was a new eager
   code-execution surface that the base HF tokenizer loading does not
   have. SandboxedEnvironment blocks attribute-chain exploits at
   negligible cost.

2. `_repair_string_template` now tries validation with both
   `is_sharegpt=False` and `is_sharegpt=True`. Previously, when
   `_fix_chat_template` internally fell back to the other schema via
   its dual-probe, the outer validation still used the caller's
   original `is_sharegpt` -- rendering with the wrong message keys and
   spuriously dropping a valid repair.

3. `_has_add_generation_prompt_block` now skips `If` nodes whose test
   is a `Not` expression. A negated gate like
   `{% if not add_generation_prompt %}{{ x }}{% endif %}` fires when
   agp=False, so its emitting body is not a generation block -- but the
   old code counted any Name reference regardless of polarity.
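
Sketched, the polarity check reduces to one `isinstance` on the `If` node's test:

```python
from jinja2 import Environment, nodes

def is_negated_gate(if_node: nodes.If) -> bool:
    # {% if not add_generation_prompt %} parses with a Not test node; its
    # body fires when agp=False, so it cannot be a generation block.
    return isinstance(if_node.test, nodes.Not)

tmpl_ast = Environment().parse("{% if not add_generation_prompt %}{{ x }}{% endif %}")
assert all(is_negated_gate(n) for n in tmpl_ast.find_all(nodes.If))
```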

Cleanup: removed unused `self._label`, added `\r` escape in
generation-block literal, switched variant labels to `!r` formatting,
removed redundant `import os as _os`.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Fix jinja2.sandbox import and sandbox proxy fallback

Two critical findings from the 20-reviewer pass:

1. [20/20] The proxy read-only fallback used bare `jinja2.Environment`,
   not sandboxed. All 20 reviewers independently reproduced marker-file
   creation via `cycler.__init__.__globals__['os'].system(...)` during
   `fix_chat_template()`. Fixed: fallback now uses
   `from jinja2.sandbox import SandboxedEnvironment`.

2. [14/20] The render-diff probe did `import jinja2` then referenced
   `jinja2.sandbox.SandboxedEnvironment`. `jinja2.sandbox` is a
   submodule that is NOT auto-imported by `import jinja2` on Jinja 3.1.6.
   This caused `AttributeError` (swallowed by `except Exception`),
   making the entire Case 2 repair path silently return None in a clean
   process. The 6 reviewers who saw it work had `jinja2.sandbox`
   pre-imported by an earlier module in their process. Fixed: both the
   probe and the proxy fallback now use
   `from jinja2.sandbox import SandboxedEnvironment`.
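
For reference, the import distinction plus the exploit shape the sandbox blocks (reproducible in a clean interpreter):

```python
# Correct on all Jinja 3.x: import the submodule explicitly.
from jinja2.sandbox import SandboxedEnvironment
from jinja2.exceptions import SecurityError

# Broken in a clean process:
#   import jinja2
#   jinja2.sandbox.SandboxedEnvironment()  # AttributeError on Jinja 3.1.6

env = SandboxedEnvironment()
try:
    env.from_string(
        "{{ cycler.__init__.__globals__['os'].system('id') }}"
    ).render()
except SecurityError:
    pass  # blocked: underscore attributes are unsafe in the sandbox
```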

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-16 05:52:33 -07:00
Daniel Han
6e87bade25 Trim verbose comments in PATH helpers
Reduce inline comments from ~160 lines to ~25 across both files.
Keep one-line summaries of the "why"; drop multi-paragraph rationale
blocks that repeated information already captured in commit messages
and PR discussion.
2026-04-16 12:01:01 +00:00
Etherll
ec32ce2e82
fix: use direct registry API for PATH writes instead of SetEnvironmentVariable (#4961)
* fix: replacing SetEnvironmentVariable with direct registry API

* apply reviews

* Use CreateSubKey for HKCU\Environment

* Store PATH backup under HKCU\Software\Unsloth

* Fix $backupKey registry handle leak in PATH backup block

Wrap $backupKey operations in try/finally so the handle is closed even
if GetValue or SetValue throws. The Add-ToUserPath helper already uses
this pattern for its registry key -- the backup block was the only
place missing it.

* Isolate WM_SETTINGCHANGE broadcast from PATH write error handling

Wrap the broadcast dummy-variable calls in their own try/catch so a
broadcast failure does not mask a successful registry PATH write.
Previously, if SetEnvironmentVariable threw after SetValue already
committed the new PATH, Add-ToUserPath would return $false and the
caller would skip Refresh-SessionPath.

* PATH helper polish: venv precedence, quoted entries, raw/expanded dedup

Three small follow-ups surfaced by a 10-reviewer pass against the rebased
PR head. None fix a regression vs main; each strictly improves the new
helpers.

Refresh-SessionPath / Refresh-Environment:
- Move $env:Path to the front of the merge so an activated venv keeps
  precedence over machine/user PATH after a refresh. Pre-PR dropped
  process-only entries entirely; post-PR kept them but at the back.
- Dedup on both raw and expanded forms so %USERPROFILE%\foo and the
  already-expanded C:\Users\me\foo do not both survive.

Add-ToUserPath:
- Trim whitespace and surrounding double-quotes from each compared entry
  so quoted PATH entries like "C:\Program Files\CMake\bin" deduplicate
  against an unquoted directory of the same path.

* Back up User PATH inside Add-ToUserPath, before first mutation

Previously only studio/setup.ps1 took a one-time PATH backup, at script
top (line ~547). install.ps1 (the irm | iex entry point) had no backup,
so users who installed via that path had no recovery surface if anything
clobbered their PATH. The PR description's "one-time backup before any
modifications" promise only held for the studio installer flow.

Move the backup into Add-ToUserPath itself: just before the first actual
SetValue mutation, write the pristine raw PATH to
HKCU\Software\Unsloth\PathBackup if no backup already exists. This:

- Covers both entry points (install.ps1 and studio/setup.ps1).
- Captures the TRUE pristine PATH even when install.ps1 runs first and
  studio/setup.ps1 runs afterwards (the script-top backup in setup.ps1
  would otherwise see an already-modified PATH).
- Is idempotent: once a backup exists, subsequent calls preserve it.
- Skips when nothing would mutate (dedup match) or PATH is empty.

The script-top backup in studio/setup.ps1 is kept for defense in depth.

* Refresh PATH: venv-aware merge order

Reconcile two competing concerns about Refresh-SessionPath /
Refresh-Environment surfaced by separate review rounds:

  - venv at the back -> activated venv loses precedence to system Python
  - process at the front -> stale shims (old node, old python, etc.)
    still on $env:Path can beat a freshly installed tool

New merge order:
  1. Activated venv Scripts dir, only if $env:VIRTUAL_ENV is set
  2. Machine PATH freshly read from registry
  3. User PATH freshly read from registry
  4. Current $env:Path as fallback

This way an explicitly-activated venv keeps priority while a tool the
script just installed wins over any stale entry that was already on
the inherited shell PATH. When no venv is active, fresh registry
entries take precedence as expected.
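
A Python sketch of the merge for illustration (the real logic is PowerShell; the dedup key used here is an assumption):

```python
def refreshed_path(venv_scripts, machine, user, process):
    # 1. venv Scripts (only when a venv is active), 2. fresh machine PATH,
    # 3. fresh user PATH, 4. inherited process PATH as fallback.
    ordered = ([venv_scripts] if venv_scripts else []) + machine + user + process
    seen, merged = set(), []
    for entry in ordered:
        key = entry.strip().strip('"').lower()  # assumed normalization
        if key and key not in seen:
            seen.add(key)
            merged.append(entry)
    return ";".join(merged)
```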

* Append to User PATH by default, close $envKey in finally

Add-ToUserPath gains a -Position Append|Prepend parameter defaulting to
Append so installing unsloth no longer prepends the bundled venv Scripts
directory ahead of the user's existing python / pip on new shells. The
four current call sites (install.ps1 launcher, studio/setup.ps1 CMake,
nvcc, Python user Scripts) all take the Append default because each one
that needs in-session precedence already does an inline $env:Path prepend
independently. This matches rustup / cargo / nvm / pyenv / uv behavior.

Also wrap the script-top $envKey.GetValue in a try/finally so the
registry handle is released even if the read throws. Matches the pattern
already used for $backupKey five lines below.

* Prepend cmake, nvcc, Python Scripts; keep venv Scripts appended

The previous commit switched Add-ToUserPath to append by default so that
installing unsloth would not silently hijack the user's system python /
pip. That was correct for the venv Scripts dir (which contains python.exe
and pip.exe alongside unsloth.exe), but wrong for the three studio/setup
call sites. Those persist cmake, the driver-compatible nvcc, and the
Python user Scripts dir for future shells, and in all three cases an
older tool already earlier in the user PATH would keep winning after the
install finished. The nvcc case is especially load-bearing: setup selects
a driver-compatible CUDA toolkit, then llama.cpp builds against whatever
wins PATH resolution, so a stale older nvcc produces broken builds.

Pass -Position 'Prepend' explicitly at the three setup.ps1 call sites
(cmake at line 754, nvcc bin at line 1025, Python user Scripts at line
1191). None of those directories holds python.exe, so prepending them
does not re-introduce the original hijack problem. Leave the install.ps1
venv Scripts call on the default Append with a comment explaining why.

* Symmetric dedup, Prepend reorders duplicates, unsloth shim dir

Address three separate findings surfaced by review:

1. Dedup asymmetry (Gemini high-priority): the existing dedup expanded
   registry entries via ExpandEnvironmentVariables but did NOT expand the
   new directory. Passing "%USERPROFILE%\foo" when "C:\Users\me\foo" was
   already in PATH produced a duplicate. Expand both sides so the check
   is symmetric.

2. -Position Prepend no-op on existing duplicates: the dedup loop
   returned $false as soon as it saw a match, regardless of position.
   That left a late-position duplicate in place instead of moving it to
   the front, so "prepend the newly selected cmake/nvcc" did not always
   beat an older copy earlier in PATH. Partition entries into kept and
   dropped lists, then reinsert a single copy at the requested position.
   Append still returns $false on any match so user-curated orderings
   are not reshuffled. Prepend also returns $false when the only copy
   is already at position 0 so we preserve the user's casing.

3. Stop adding the venv Scripts dir to User PATH entirely. That dir
   holds python.exe and pip.exe alongside unsloth.exe, so neither
   Prepend nor Append worked: prepend hijacked the user's system python
   and pip, append made the freshly-installed unsloth.exe lose to any
   older unsloth.exe earlier on PATH. Replace the Scripts-dir PATH add
   with a dedicated shim directory that contains only unsloth.cmd, and
   prepend that dir. The shim calls the venv's unsloth.exe by absolute
   path so future pip upgrades inside the venv propagate automatically.
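
Item 2's partition-and-reinsert, sketched in Python for illustration (the real code is PowerShell; `normalize` stands in for the trim/expand/casefold steps):

```python
import os

def normalize(entry: str) -> str:
    return os.path.expandvars(entry.strip().strip('"')).lower()

def add_to_user_path(entries, new_dir, position="Append"):
    kept = [e for e in entries if normalize(e) != normalize(new_dir)]
    if len(kept) == len(entries):           # no duplicate anywhere
        return [new_dir] + kept if position == "Prepend" else kept + [new_dir]
    if position == "Append":
        return entries                      # never reshuffle on a match
    if normalize(entries[0]) == normalize(new_dir):
        return entries                      # already first; keep user casing
    return [new_dir] + kept                 # move the single copy forward
```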

* Shim via hardlink, Append user Scripts, drop venv sysconfig fallback

Three follow-ups to the c0ab1ab shim commit, targeting concerns raised in
the second 20-reviewer pass:

1. Shim uses unsloth.exe (hardlink, copy fallback) instead of unsloth.cmd.
   The batch-file approach had three distinct regressions:
   - cmd.exe expanded %...% sequences inside user arguments, so prompts
     like "What does 50% mean?" got mangled before reaching the CLI
   - Git Bash / MSYS2 / POSIX-style shells on Windows do not resolve
     bare-name lookups to .cmd files, so `unsloth` stopped working there
   - Set-Content -Encoding ASCII replaced non-ASCII profile characters
     with '?', so installs under C:\Users\Jörg\... wrote a broken shim
   A hardlink (fallback: copy) of unsloth.exe is a native Windows
   executable with no shell indirection. PATHEXT picks .exe before .cmd
   in cmd.exe and PowerShell, Git Bash honors .exe natively, subprocess
   callers hit it directly, and a hardlink stays in sync with the venv
   on pip upgrades because both names point at the same inode.

2. studio/setup.ps1 Python user Scripts dir is added with default Append
   instead of -Position Prepend. That directory holds every pip-installed
   user console script (pip, pytest, huggingface-cli, and so on), not
   just unsloth, so reordering it silently changed resolution order for
   unrelated tools. The new install.ps1 shim at PATH position 0 already
   guarantees `unsloth` resolves to the freshly installed copy, so the
   Python user Scripts entry only needs to be present, not at the front.

3. The sysconfig lookup in studio/setup.ps1 no longer falls back to
   sysconfig.get_path('scripts') when the nt_user scheme dir does not
   exist. When setup.ps1 is invoked from an activated venv (a flow the
   linked issue actually hits) that fallback returns the venv's Scripts
   directory, which would then be added to the persisted User PATH and
   re-introduce the python / pip hijack the shim dir is meant to avoid.
   Stick strictly to the nt_user scheme; skip the block if it does not
   exist on disk.

* Do not crash installer when unsloth.exe shim is locked

The shim update sequence at install.ps1:1095 did a bare Remove-Item /
New-Item HardLink / Copy-Item. Under the script's $ErrorActionPreference
a locked target (most commonly 'unsloth studio' still running while the
user re-invokes the installer) turns the Remove-Item failure into a
terminating error that aborts the install with no actionable message.

The existing shim is perfectly usable in that state, so there is no
reason to abort. Wrap the whole remove/link/copy sequence in a try/catch
that logs the probable cause (Studio still running), points at the fix
(close Studio and re-run), and lets the installer finish with the old
launcher still serving the command.

Also only emit the "added unsloth launcher to PATH" step line when the
launcher was actually (re)created AND the PATH entry was newly added --
previously the message fired even when the shim refresh silently failed,
which was confusing.

* Guard shim PATH entry on existence, use NullString for broadcast delete

Two follow-ups surfaced by the latest review pass:

1. Do not add the shim directory to User PATH when the launcher was not
   actually created. Antivirus blocking unsloth.exe, a disk-full volume,
   or restrictive filesystem permissions can make both the hardlink and
   the copy fallback fail on a fresh install. In that case the existing
   sequence would report "added unsloth launcher to PATH" warnings but
   still prepend the empty $ShimDir to User PATH -- the user sees an
   install that claims success but then cannot resolve `unsloth` in a
   new shell. Gate Add-ToUserPath on Test-Path $ShimExe so the PATH
   entry is only persisted when the launcher is really there.

2. Pass [NullString]::Value instead of $null to the broadcast-delete
   call in Add-ToUserPath. On PowerShell 7.5 and later (running on .NET
   9), a bare $null going into [Environment]::SetEnvironmentVariable
   can be coerced to an empty string rather than a true .NET null,
   which sets the dummy UnslothPathRefresh_XXXXXXXX variable to "" in
   HKCU\Environment instead of deleting it. The leaked variable is
   visible in System Properties and accumulates one entry per install
   run. [NullString]::Value is a PowerShell-specific sentinel that
   crosses the interop boundary as a real null and works on both PS 5.1
   and PS 7.x. See PowerShell/PowerShell#24637 for the underlying issue.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: Lee Jackson <130007945+Imagineer99@users.noreply.github.com>
2026-04-16 04:49:51 -07:00
Imgyu Kim
14ab6fbfae
BUG: fix _fix_chat_template for ChatML templates missing add_generation_prompt (#4426)
Fixes #4150.

Pre-PR, `_fix_chat_template` only patched templates where a trailing `{{ ... }}` expression followed the last `{% endfor %}`. ChatML templates (Hermes, Magnum, Phi-4, etc.) that end cleanly at `{% endfor %}` with no generation-prompt block were left unchanged, so the outer `fix_chat_template` raised:

```
RuntimeError: Unsloth: The tokenizer `...` does not have a
{% if add_generation_prompt %} for generation purposes.
```

This commonly shows up when a downstream tool (LlamaFactory, Axolotl) re-serializes the tokenizer during LoRA save and strips the generation-prompt block.

This PR adds a second branch to `_fix_chat_template` that fires when:

- the content after the last `{% endfor %}` is empty modulo Jinja `{# ... #}` comments,
- the scrubbed template contains `<|im_start|>` and `<|im_end|>`,
- and the scrubbed template does not already mention `add_generation_prompt`.

The assistant-turn separator is inferred from the template itself (preferring an explicit `'<|im_start|>assistant<sep>'` literal, then the unique `message['role'] + '<sep>'` from role concatenations, then `<|im_sep|>` for Phi-4-mini mixed-separator templates, then `\n`), so Phi-4-style templates are not silently corrupted with the wrong separator.

Verified against the existing chat-template corpus:

- Hermes-3, Magnum-v2, Phi-4-mini, Phi-4 multi-sep, ChatML with trailing whitespace, ChatML with trailing Jinja comment, dot-access `message.role`, split-literal `'<|im_start|>assistant'`: all repaired with the correct assistant prefix.
- Already-fixed ChatML templates: idempotent NOP.
- Trap templates with `<|im_start|>` only inside a Jinja comment: correctly not rewritten.
- Llama-3, Gemma-3, Qwen2.5 (non-ChatML): byte-identical.
- Mistral family (5 models including Mistral-Nemo, Mistral-Small-24B, Mixtral): byte-identical, protected both by the structural guard (no ChatML tokens) and the existing name-based exemption in `load_correct_tokenizer`.
- Qwen family (14 models including Qwen2.5, Qwen3, Qwen3-Coder, QwQ, VL, Math, Qwen3-Guard): byte-identical.

End-to-end reproduction: Hermes-3 LoRA SFT, save with stripped chat_template, reload. Pre-PR code path raises the RuntimeError above. Post-PR reload loads cleanly, patches the template at load time, and `apply_chat_template(add_generation_prompt=True)` produces the correct `<|im_start|>assistant\n` prefix.
2026-04-16 00:21:29 -07:00
DoubleMathew
a4d4dfe4ac
fix Gemma4 flash attn disable (#5045)
* fix pass attn implementation

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 17:50:48 -05:00
Daniel Han
3869fbe1cc
Bump installer minimum to 2026.4.5 (#5041) 2026-04-15 08:23:41 -07:00
Daniel Han
cdb3e752ec Update _utils.py 2026-04-15 08:06:43 -07:00
Daniel Han
ba387e2c8f Update pyproject.toml 2026-04-15 08:06:30 -07:00
Daniel Han
f0d03655e8
Studio: add folder browser modal for Custom Folders (#5035)
* Studio: add folder browser modal for Custom Folders

The Custom Folders row in the model picker currently only accepts a
typed path. On a remote-served Studio (Colab, shared workstation) that
means the user has to guess or paste the exact server-side absolute
path. A native browser folder picker can't solve this: HTML
`<input type="file" webkitdirectory>` hides the absolute path for
security, and the File System Access API (Chrome/Edge only) returns
handles rather than strings, neither of which the server can act on.

This PR adds a small in-app directory browser that lists paths on the
server and hands the chosen string back to the existing
`POST /api/models/scan-folders` flow.

## Backend

* New endpoint `GET /api/models/browse-folders`:
  * `path` query param (expands `~`, accepts relative or absolute; empty
    defaults to the user's home directory).
  * `show_hidden` boolean to include dotfiles/dotdirs.
  * Returns `{current, parent, entries[], suggestions[]}`. `parent` is
    null at the filesystem root.
  * Immediate subdirectories only (no recursion); files are never
    returned.
  * `entries[].has_models` is a cheap hint: the directory looks like it
    holds models if it is named `models--*` (HF hub cache layout) or
    one of the first 64 children is a .gguf/.safetensors/config.json/
    adapter_config.json or another `models--*` subfolder.
  * Sort order: model-bearing dirs, then plain, then hidden; case-
    insensitive alphabetical within each bucket.
  * Suggestions auto-populate from HOME, the HF cache root, and any
    already-registered scan folders, deduplicated.
  * Error surface: 404 for missing path, 400 for non-directory, 403 on
    permission errors. Auth-required like the other models routes.

* New Pydantic schemas `BrowseEntry` and `BrowseFoldersResponse` in
  `studio/backend/models/models.py`.
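
Roughly the response contract (a sketch from the fields above; the authoritative definitions live in `studio/backend/models/models.py`):

```python
from typing import List, Optional
from pydantic import BaseModel

class BrowseEntry(BaseModel):
    name: str
    has_models: bool          # cheap hint, per the heuristic above

class BrowseFoldersResponse(BaseModel):
    current: str
    parent: Optional[str]     # null at the filesystem root
    entries: List[BrowseEntry]
    suggestions: List[str]
```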

## Frontend

* New `FolderBrowser` component
  (`studio/frontend/src/components/assistant-ui/model-selector/folder-browser.tsx`)
  using the existing `Dialog` primitive. Features:
  * Clickable breadcrumb with a `..` row for parent navigation.
  * Quick-pick chips for the server-provided suggestions.
  * `Show hidden` checkbox.
  * In-flight fetch cancellation via AbortController so rapid
    navigation doesn't flash stale results.
  * Inline badge on model-bearing directories.

* `chat-api.ts` gains `browseFolders(path?, showHidden?)` and matching
  types.

* `pickers.tsx` adds a folder-magnifier icon next to the existing `Add`
  button. Opening the browser seeds it with whatever the user has
  already typed; confirming fills the text input, leaving the existing
  validation and save flow unchanged.

## What it does NOT change

* The existing text-input flow still works; the browser is additive.
* No new permissions or escalation; the endpoint reads only directories
  the server process is already allowed to read.
* No model scanning or filesystem mutation happens from the browser
  itself -- it just returns directory basenames for rendering.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: cap folder-browser entries and expose truncated flag

Pointing the folder browser at a huge directory (``/usr/lib``,
``/proc``, or a synthetic tree with thousands of subfolders) previously
walked the whole listing and stat-probed every child via
``_looks_like_model_dir``. That is both a DoS shape for the server
process and a large-payload surprise for the client.

Introduce a hard cap of 2000 subdirectory entries and a
``truncated: bool`` field on the response. The frontend renders a small
hint below the list when it fires, prompting the user to narrow the
path. Below-cap directories are unchanged.

Verified end-to-end against the live backend with a synthetic tree of
2050 directories: response lands at 2000 entries, ``truncated=true``,
listing finishes in sub-second time (versus tens of seconds if we were
stat-storming).

* Studio: suggest LM Studio / Ollama dirs + 2-level model probe

Three improvements to the folder-browser, driven by actually dropping
an LM Studio-style install (publisher/model/weights.gguf) into the
sandbox and walking the UX:

## 1. Quick-pick chips for other local-LLM tools

`well_known_model_dirs()` (new) returns paths commonly used by
adjacent tools. Only paths that exist are returned so the UI never
shows dead chips.

* LM Studio current + legacy roots + user-configured
  `downloadsFolder` from its `settings.json` (reuses the existing
  `lmstudio_model_dirs()` helper).
* Ollama: `$OLLAMA_MODELS` env override, then `~/.ollama/models`,
  `/usr/share/ollama/.ollama/models`, and `/var/lib/ollama/.ollama/models`
  (the systemd-service install path surfaced in the upstream "where is
  everything?" issue).
* Generic user-choice locations: `~/models`, `~/Models`.

Dedup is stable across all sources.

## 2. Two-level model-bearing probe

LM Studio and Ollama both use `root/publisher/model/weights.gguf`.
The previous `has_models` heuristic only probed one level, so the
publisher dir (whose immediate children are model dirs, not weight
files) was always marked as non-model-bearing. Pulled the direct-
signal logic into `_has_direct_model_signal` and added a grandchild
probe so the classic layout is now recognised.

Still O(PROBE^2) worst-case, still returns immediately for
`models--*` names (HF cache layout) and for any direct weight file.

## 3. model_files_here hint on response body

A leaf model dir (just GGUFs, no subdirs) previously rendered as
`(empty directory)` in the modal, confusing users into thinking the
folder wasn't scannable. Added a `model_files_here` count on the
response (capped at 200) and a small hint row in the modal: `N model
files in this folder. Click "Use this folder" to scan it.`

## Verification

Simulated an LM Studio install by downloading the real 84 MB
`unsloth/SmolLM2-135M-Instruct-Q2_K.gguf` into
`~/.lmstudio/models/unsloth/SmolLM2-135M-Instruct-GGUF/`. Confirmed
end-to-end:

* Home listing suggests `~/.lmstudio/models` as a chip.
* Browsing `~/.lmstudio/models` flags `unsloth` (publisher) as
  `has_models=true` via the 2-level probe.
* Browsing the publisher flags `SmolLM2-135M-Instruct-GGUF` (model
  dir) as `has_models=true`.
* Browsing the model dir returns empty entries but
  `model_files_here=1`, and the frontend renders a hint telling the
  user it is a valid target.

* Studio: one-click scan-folder add + prominent remove + plain search icon

Three small Custom Folders UX fixes after real-use walkthrough:

* **One-click add from the folder browser**. Confirming `Use this
  folder` now submits the path directly to
  `POST /api/models/scan-folders` instead of just populating the text
  input. `handleAddFolder` takes an optional explicit path so the
  submit lands in the same tick as `setFolderInput`, avoiding a
  state-flush race. The typed-path + `Add` button flow is unchanged.

* **Prominent remove X on scan folders**. The per-folder delete
  button was `text-muted-foreground/40` and hidden entirely on
  desktop until hovered (`md:opacity-0 md:group-hover:opacity-100`).
  Dropped the hover-only cloak, bumped color to `text-foreground/70`,
  added a red hover/focus background, and sized the icon up from
  `size-2.5` to `size-3`. Always visible on every viewport.

* **Plain search icon for the Browse button**. `FolderSearchIcon`
  replaced with `Search01Icon` so it reads as a simple "find a
  folder" action alongside the existing `Add01Icon`.

* Studio: align Custom Folders + and X buttons on the same right edge

The Custom Folders header used `px-2.5` with a `p-0.5` icon button,
while each folder row used `px-3` with a `p-1` button. That put the
X icon 4px further from the right edge than the +. Normalised both
rows to `px-2.5` with `p-1` so the two icons share a column.

* Studio: empty-state button opens the folder browser directly

The first-run empty state for Custom Folders was a text link reading
"+ Add a folder to scan for local models" whose click toggled the
text input. That's the wrong default: a user hitting the empty state
usually doesn't know what absolute path to type, which is exactly
what the folder browser is for.

* Reword to "Browse for a models folder" with a search-icon
  affordance so the label matches what the click does.
* Click opens the folder browser modal directly. The typed-path +
  Add button flow is still available via the + icon in the
  section header, so users who know their path keep that option.
* Slightly bump the muted foreground opacity (70 -> hover:foreground)
  so the button reads as a primary empty-state action rather than a
  throwaway hint.

* Studio: Custom Folders header gets a dedicated search + add button pair

The Custom Folders section header had a single toggle button that
flipped between + and X. That put the folder-browser entry point
behind the separate empty-state link. Cleaner layout: two buttons in
the header, search first, then add.

* Search icon (left) opens the folder browser modal directly.
* Plus icon (right) toggles the text-path input (unchanged).
* The first-run empty-state link is removed -- the two header icons
  cover both flows on every state.

Both buttons share the same padding / icon size so they line up with
each other and with the per-folder remove X.

* Studio: sandbox folder browser + bound caps + UX recoveries

PR review fixes for the Custom Folders folder browser. Closes the
high-severity CodeQL path-traversal alert and addresses the codex /
gemini P2 findings.

Backend (studio/backend/routes/models.py):

* New _build_browse_allowlist + _is_path_inside_allowlist sandbox.
  browse_folders now refuses any target that doesn't resolve under
  HOME, HF cache, Studio dirs, registered scan folders, or the
  well-known third-party model dirs. realpath() is used so symlink
  traversal cannot escape the sandbox. Also gates the parent crumb
  so the up-row hides instead of 403'ing.
* _BROWSE_ENTRY_CAP now bounds *visited* iterdir entries, not
  *appended* entries. Dirs full of files (or hidden subdirs when
  show_hidden is False) used to defeat the cap.
* _count_model_files gets the same visited-count fix.
* PermissionError no longer swallowed silently inside the
  enumeration / counter loops -- now logged at debug.

Frontend (folder-browser.tsx, pickers.tsx, chat-api.ts):

* splitBreadcrumb stops mangling literal backslashes inside POSIX
  filenames; only Windows-style absolute paths trigger separator
  normalization. The Windows drive crumb value is now C:/ (drive
  root) instead of C: (drive-relative CWD-on-C).
* browseFolders accepts and forwards an AbortSignal so cancelled
  navigations actually cancel the in-flight backend enumeration.
* On initial-path fetch error, FolderBrowser now falls back to HOME
  instead of leaving the modal as an empty dead end.
* When the auto-add path (one-click "Use this folder") fails, the
  failure now surfaces via toast in addition to the inline
  paragraph (which is hidden when the typed-input panel is closed).

* Studio: rebuild browse target from trusted root for CodeQL clean dataflow

CodeQL's py/path-injection rule kept flagging the post-validation
filesystem operations because the sandbox check lived inside a
helper function (_is_path_inside_allowlist) and CodeQL only does
intra-procedural taint tracking by default. The user-derived
``target`` was still flowing into ``target.exists`` /
``target.is_dir`` / ``target.iterdir``.

The fix: after resolving the user-supplied ``candidate_path``,
locate the matching trusted root from the allowlist and rebuild
``target`` by appending each individually-validated segment to
that trusted root. Each segment is rejected if it isn't a single
safe path component (no separators, no ``..``, no empty/dot).
The downstream filesystem ops now operate on a Path constructed
entirely from ``allowed_roots`` (trusted) plus those validated
segments, so CodeQL's dataflow no longer sees a tainted source.
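
Approximately (illustrative names; `relative_to` already guarantees the candidate sits under the matched trusted root):

```python
from pathlib import Path

def rebuild_target(trusted_root: Path, resolved_candidate: Path) -> Path:
    # Rebuild purely from the trusted root plus individually validated
    # segments so no user-derived Path reaches the filesystem calls.
    target = trusted_root
    for seg in resolved_candidate.relative_to(trusted_root).parts:
        if seg in ("", ".", "..") or "/" in seg or "\\" in seg:
            raise ValueError(f"unsafe path segment: {seg!r}")
        target = target / seg
    return target
```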

Behavior is unchanged for all valid inputs -- only the
construction of ``target`` is restructured. Live + unit tests
all pass (58 selected, 7 deselected for Playwright env).

* Studio: walk browse paths from trusted roots for CodeQL

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Ubuntu <ubuntu@h100-8-cheapest.us-east5-a.c.unsloth.internal>
2026-04-15 08:04:33 -07:00
Roland Tannous
800ddc95f8
Re-apply #4939: updated models template mappers (#4950)
* Reapply "updated models template mappers. added lfm2.5vl450m to transformers 5…" (#4945)

This reverts commit 33503ea248.

* Add missing gemma-4-31B-it bnb-4bit mapper entry and LFM2.5 upstream namespace for PR #4950

- Add unsloth/gemma-4-31B-it-unsloth-bnb-4bit to __INT_TO_FLOAT_MAPPER so
  the int-to-float resolution works for this model (already listed in
  TEMPLATE_TO_MODEL_MAPPER but had no mapper entry).
- Add LiquidAI/LFM2.5-1.2B-Instruct to lfm-2.5 TEMPLATE_TO_MODEL_MAPPER
  entry so the canonical upstream namespace is mapped consistently with lfm-2.

* Add missing gemma-4-31B-it bnb-4bit Ollama mapping and lfm-2.5 chat template alias

- Add unsloth/gemma-4-31B-it-unsloth-bnb-4bit to OLLAMA_TEMPLATE_TO_MODEL_MAPPER
  so Ollama export works for this model (E2B-it and E4B-it bnb-4bit variants were
  already present, 31B-it was inconsistently omitted)
- Register CHAT_TEMPLATES["lfm-2.5"] as alias of the lfm-2 template to prevent
  KeyError when Studio resolves LFM2.5 models through MODEL_TO_TEMPLATE_MAPPER

* Add missing LFM2 bnb-4bit INT_TO_FLOAT_MAPPER entry

unsloth/LFM2-1.2B-unsloth-bnb-4bit is referenced in model_mappings.py
but had no mapper.py entry, so model resolution would fail when users
load that variant with load_in_4bit=False or when the float name is
used with load_in_4bit=True.

* Fix review findings for PR #16

1. ollama_template_mappers.py: Restore dropped Gemma-4 base model IDs
   (E2B, E4B, 31B, 26B-A4B) and add missing google/ upstream IDs to
   the gemma4 Ollama mapper for consistency with other gemma entries.

2. mapper.py: Remove self-mapping non-bnb-4bit entries from
   __INT_TO_FLOAT_MAPPER that were polluting FLOAT_TO_INT_MAPPER with
   lowercase 16-bit names, causing load_in_4bit=True to return bad
   model names. Add direct MAP_TO_UNSLOTH_16bit entries to preserve
   the google->unsloth 16-bit redirects.

3. mapper.py: Add LFM2.5 MAP_TO_UNSLOTH_16bit redirect so
   LiquidAI/LFM2.5-1.2B-Instruct resolves to its unsloth mirror.

* Add review tests for PR #4950

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Remove top-level test files

These test_*.py files were added at the repo root rather than under tests/.
Removing them from this PR; the production mapper changes remain.

* Add gemma-4-26B-A4B-it mapping

Adds unsloth/gemma-4-26B-A4B-it to __INT_TO_FLOAT_MAPPER as a 2-tuple so
google/gemma-4-26B-A4B-it routes to unsloth/gemma-4-26B-A4B-it across
INT_TO_FLOAT_MAPPER, FLOAT_TO_INT_MAPPER, and MAP_TO_UNSLOTH_16bit.

The 26B-A4B (MoE) model has no bnb-4bit variant, so the key uses the
plain unsloth name rather than the -unsloth-bnb-4bit suffix.

Removes the now-redundant standalone _add_with_lower call for the -it
variant; the 16bit mapping is registered via the dict loop.

* Add unsloth-bnb-4bit mappings for gemma-4 base (non-it) models

Adds E2B, E4B, 31B base unsloth-bnb-4bit entries to __INT_TO_FLOAT_MAPPER.
The 26B-A4B (MoE) base has no bnb-4bit variant on HF, so it stays on the
standalone _add_with_lower line for the 16bit-only routing.

Removes the redundant _add_with_lower lines for E2B, E4B, 31B base since
the dict loop now registers the same google->unsloth route through the
2-tuple entries, plus full FLOAT_TO_INT and INT_TO_FLOAT coverage.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 07:52:12 -07:00
Avaya Aggarwal
7c5464ad71
feat: Add cactus QAT scheme support (#4679)
* feat: Add cactus QAT scheme support

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* test(qat): add tests for cactus QAT scheme and fix missing import

* Fix cactus QAT scheme: correct MappingType import, tighten PerGroup filter

- Drop the broken `from torchao.dtypes import MappingType` import. `MappingType`
  lives in `torchao.quantization` (and `torchao.quantization.quant_primitives`);
  it is not exported from `torchao.dtypes` in any supported torchao release
  (verified on 0.14, 0.16, 0.17). The previous code raised `ImportError` on
  every cactus call, which was masked as a misleading 'torchao not found' error.
- Since `IntxWeightOnlyConfig` already defaults `mapping_type` to
  `MappingType.SYMMETRIC`, drop the explicit kwarg entirely and remove the
  import. Behavior is unchanged.
- Introduce a named `group_size = 32` constant (matches the int4 / fp8-int4
  pattern in the surrounding branches) and add a `% group_size == 0`
  divisibility guard to the filter. `PerGroup(32)` requires
  `in_features % 32 == 0` at `quantize_()` time, otherwise torchao raises
  `ValueError: in_features (N) % group_size (32) must be == 0`. The old
  `in_features >= 32` filter would admit non-aligned widths (e.g. 33, 48, 65,
  127) and crash `_prepare_model_for_qat` for those shapes.

* Warn when cactus QAT skips non-divisible Linear layers

Multiple reviewers flagged that the divisibility guard added in the
previous commit can silently leave Linear layers in full precision when
their in_features is not a multiple of 32. For currently supported
Unsloth models (Qwen, Llama, Gemma, Mistral, Phi) every Linear width is
already a multiple of 32/64/128 so this never triggers, but surfacing
the coverage gap is cheap and avoids users assuming 100% QAT coverage
when they bring a custom model with unusual shapes.

Emit a UserWarning listing up to the first 8 skipped layers whenever
the cactus filter excludes any Linear due to the modulo guard. This
keeps the lenient silent-skip behavior (consistent with int4 /
fp8-int4), but stops making it silent.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-15 07:40:03 -07:00
Avaya Aggarwal
f18e9dddf0
feat: Add support for OLMo-3 model (#4678)
* feat: Add support for OLMo-3 model in mapping and tests

* Update unsloth/models/mapper.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Update tests/test_get_model_name.py

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>

* Fix casing, add Think variants, and align version gate for OLMo-3 PR 4678

Mapper: switch slugs from OLMo-3 to canonical Olmo-3 mixed case, drop the
non-existent unsloth/Olmo-3-7B-Instruct-bnb-4bit dead alias, and add the
already-published Olmo-3-7B-Think and Olmo-3-32B-Think Unsloth mirrors.

Loader: change the olmo3 transformers version gate from Version("4.57.0")
to Version("4.57.0.dev0") so nightly/source builds that already contain
olmo3 are not blocked, matching the OLMo-2, Gemma 3 and Cohere patterns.

* Use canonical Olmo-3 casing and cover Think variants in OLMo-3 tests

Mirrors the mapper.py fixes on pr-4678-code: HuggingFace canonical slugs
for the OLMo-3 family use mixed-case Olmo-3 (not OLMo-3 like OLMo-2), and
Unsloth already hosts Olmo-3-7B-Think and Olmo-3-32B-Think mirrors, so
the resolution matrix now covers all three published Olmo-3 families.

---------

Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-15 07:39:11 -07:00
Daniel Han
c3cd890357
Studio: refresh Downloaded GGUF list and recurse into variant subdirs (#5032)
* Studio: refresh Downloaded GGUF list and recurse into variant subdirs

Two fixes for the model picker's "Downloaded" section.

Frontend (`pickers.tsx`):
* `HubModelPicker`'s mount effect short-circuited the cached-gguf and
  cached-models refetch whenever the module-level cache already had
  entries (`if (alreadyCached) return;`). After downloading a new repo
  in the same session, reopening the picker rendered the stale cache
  and the new repo never appeared in "Downloaded" until a full page
  reload. The early return is removed so the lists are always refreshed
  on mount; the module cache still drives the initial render so there
  is no spinner flash when we already had data.

Backend (`utils/models/model_config.py`):
* `list_local_gguf_variants` and `_find_local_gguf_by_variant` used a
  non-recursive `Path.glob("*.gguf")`. Some HF GGUF repos (e.g.
  `unsloth/gemma-4-26B-A4B-it-GGUF`) place the largest quants under a
  variant-named subdirectory such as `BF16/...gguf`, which the
  top-level glob missed. Both helpers now use `rglob` and the variant
  filename is stored as a path relative to the scan root so the
  locator can still find the file.

The flat-layout case (variants directly in the snapshot root) is
unchanged: verified against `unsloth/gemma-4-E2B-it-GGUF` which still
returns its UD-Q4_K_XL variant correctly.

* Studio: emit posix-style relative filenames for local GGUF subdirs

`list_local_gguf_variants` was doing `str(f.relative_to(p))`, which on
Windows produces backslash-separated paths like `BF16\foo.gguf`. The
remote `list_gguf_variants` (HF API path) always returns forward-slash
filenames such as `BF16/foo.gguf`, so the two would diverge on Windows.

Switch to `.as_posix()` so the local and remote variant filenames stay
identical across Linux, macOS, and Windows. Verified by simulating with
`PureWindowsPath` in the test suite.
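
Combined, the variant listing reduces to roughly this (simplified; the real helper groups files by quant variant):

```python
from pathlib import Path

def local_gguf_variant_files(snapshot_root: Path) -> list[str]:
    # rglob catches nested layouts like BF16/model-BF16.gguf, and
    # as_posix() keeps local names identical to the HF API's
    # forward-slash filenames, even on Windows.
    return sorted(
        f.relative_to(snapshot_root).as_posix()
        for f in snapshot_root.rglob("*.gguf")
    )
```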

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Studio: detect mmproj at snapshot root for nested-variant layouts

When _find_local_gguf_by_variant returns a weight file inside a
quant-named subdir (e.g. snapshot/BF16/foo.gguf), detect_mmproj_file
was scanning only the immediate parent and missing the mmproj file
sitting at the snapshot root. The model was then loaded without
--mmproj, silently breaking vision support for repos that ship
nested variants.

detect_mmproj_file now takes an optional search_root and walks up
from the weight file to that root, in order, so the mmproj at the
snapshot root is picked up. Sibling quant subdirs are not scanned,
so an unrelated variant's mmproj does not leak in.

Also apply the suggested micro-optimization on relative_to in
list_local_gguf_variants -- only build the posix path when storing
the first file for a quant.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 07:34:42 -07:00
Daniel Han
156f3fc4b0
Gate trl disable_gradient_checkpointing patch warning on UNSLOTH_ENABLE_LOGGING (#5038)
The "Patched trl.models.utils.disable_gradient_checkpointing with a no-op"
warning fires once on every Unsloth import, including from notebooks where
the user did not opt into verbose logging. It is a routine integration
patch, not an anomaly the user needs to know about. Gate it on
UNSLOTH_ENABLE_LOGGING=1 like other diagnostic notices.
2026-04-15 07:33:48 -07:00
jonahsamost
777e1bd0ac
fix (#4887) 2026-04-15 07:21:03 -07:00
Daniel Han
1a4ca5eca8
Fix grad-accum accepts_loss_kwargs detection for vision wrappers (#5036)
* Fix grad-accum model_accepts_loss_kwargs detection for vision wrappers

Replace the source-string rewrite of Trainer.__init__ with an instance-level
accepts_loss_kwargs shadow applied on the loaded model. Covers:

  1. Unsloth-compiled forward -> True, so HF Trainer does not double-scale
     on top of unsloth_fixed_cross_entropy's num_items_in_batch division.
  2. Stock forward on a conditional-generation wrapper (Gemma3n, Gemma3
     pre-4.57, Qwen-VL family, etc.) where the outer class has no
     accepts_loss_kwargs but the inner .model declares False -> False.
     This is the case that reproduces issue #4982 under trust_remote_code
     or UNSLOTH_COMPILE_DISABLE, where the previous fix's outer-attr
     check walked past the inner model and fell through to signature
     inspection.
  3. Text LMs without any explicit accepts_loss_kwargs -> leave HF default.

The previous .replace()-based patch silently no-ops on transformers 4.48
through 4.52 (variable named model, not unwrapped_model) and is fragile
against any upstream reformat. The new helper walks the PEFT / HF wrapper
chain, finds the first class that declares accepts_loss_kwargs on its own
class dict (type(m).__dict__, not hasattr, to avoid PEFT __getattr__
forwarding), and setattr-shadows that value at every wrapper level so
HF Trainer's hasattr(unwrapped_model, ...) check picks it up at whichever
level accelerate.unwrap_model returns.
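
The walk, sketched (helper name and the `.model` / `.base_model` hops are illustrative; the real fix also setattr-shadows the resolved value at every wrapper level):

```python
def _declared_accepts_loss_kwargs(model):
    m, seen = model, set()
    while m is not None and id(m) not in seen:
        seen.add(id(m))
        # Check the class's own dict: hasattr would be fooled by PEFT's
        # __getattr__ forwarding into the wrapped model.
        if "accepts_loss_kwargs" in type(m).__dict__:
            return type(m).__dict__["accepts_loss_kwargs"]
        m = getattr(m, "model", None) or getattr(m, "base_model", None)
    return None  # nothing declared anywhere: leave the HF default alone
```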

Also adds an unconditional post-init clamp of
accelerator.gradient_accumulation_steps = 1 to work around the
transformers 5.0 through 5.5 GradientAccumulationPlugin regression that
makes accelerator.backward divide loss by GA on top of training_step's
own /GA division. Fixed upstream in 5.6.0.dev0; no-op on 4.x and 5.6+.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Trim comments

* Address review: cover PEFT-after-load and custom compile location

Two review findings from 3/20 reviewers:

1. [3 of 20 reviewers] apply_accepts_loss_kwargs_fix was called from the
   loaders before get_peft_model wraps the base model, so on transformers
   4.48-4.52 (which does hasattr on the outer model) the instance shadow
   on the base model was lost after PEFT wrapping. Fix: also call it from
   the wrapped Trainer.__init__ so it runs on whatever model the user
   actually hands to Trainer, which is always the final wrapped form.

2. [1 of 20 reviewers] _forward_is_unsloth_compiled hard-coded the
   substrings "unsloth_compiled" / "unsloth_cache" in the co_filename
   check, which misclassifies compiled forwards when
   UNSLOTH_COMPILE_LOCATION is set to a custom directory. Fix: new
   _unsloth_compile_cache_leaves helper that reads the env var and
   matches the basename against path components, honoring both the
   default and any user override.

Verified locally:
- PEFT-after-load simulation: HF's hasattr(peft, "accepts_loss_kwargs")
  now returns True after our init wrapper runs, and value resolves to
  False on Gemma3n-style inner wrappers.
- Custom UNSLOTH_COMPILE_LOCATION simulation: compiled detection returns
  True for /tmp/my_custom_cache/compiled.py when the env var is set.
- End-to-end Gemma-3 270m + LoRA SFT unchanged: loss 4.9626, grad-norm
  matches prior run, all 4 wrapper levels now carry the shadowed attr.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 06:59:36 -07:00
Daniel Han
1ccfd2e0a5
fix(rocm): tighten gfx regex to ignore generic ISA lines (#5033)
* fix(rocm): tighten gfx regex to ignore generic ISA lines

ROCm 6.1+ rocminfo emits generic ISA names such as
"amdgcn-amd-amdhsa--gfx11-generic" and "amdgcn-amd-amdhsa--gfx9-4-generic"
alongside the real GPU name. The previous `gfx[1-9]` regex used in
`_has_rocm_gpu` matched both, so a host with only a generic ISA entry
would be reported as having a usable AMD GPU.

Tighten the pattern to `gfx[1-9][0-9a-z]{2,3}` so only real gfx ids
match. This covers every documented target from GFX6 (gfx600) through
GFX12 (gfx1201), including letter-suffixed ids like gfx90a (MI250 /
MI250X) and gfx90c. Documented generic ISA names always have 1 or 2
digits before the dash and no longer match.

Applied to both `studio/install_python_stack.py` and
`studio/install_llama_prebuilt.py` so the two detection paths agree.
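
The before/after patterns, checked against the examples above:

```python
import re

old = re.compile(r"gfx[1-9]")
new = re.compile(r"gfx[1-9][0-9a-z]{2,3}")

assert old.search("amdgcn-amd-amdhsa--gfx11-generic")            # false positive
assert new.search("amdgcn-amd-amdhsa--gfx11-generic") is None
assert new.search("amdgcn-amd-amdhsa--gfx9-4-generic") is None
assert new.search("gfx90a") and new.search("gfx1201")            # real targets
```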

Co-authored-by: Martin Hoyer <mhoyer@redhat.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: Martin Hoyer <mhoyer@redhat.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 05:24:41 -07:00
Daniel Han
b7a8ff2833
Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027) (#5034)
* Respect classification head skip list on pre-quantized 4-bit checkpoints (#5027)

FastLanguageModel.from_pretrained(..., num_labels=N) crashed with
"NotImplementedError: normal_kernel_cuda not implemented for 'Byte'" on
pre-quantized bnb 4-bit checkpoints (e.g. unsloth/Qwen3-4B-bnb-4bit)
when running on transformers 5.x.

Two pieces were needed to close this out:

1. unsloth_zoo PR: add "score", "classifier", "qa_outputs" to
   SKIP_QUANTIZATION_MODULES so replace_with_bnb_linear leaves task
   heads in the compute dtype.

2. This commit: for pre-quantized checkpoints, transformers reads
   llm_int8_skip_modules from the quantization_config baked into
   config.json and ignores the runtime BitsAndBytesConfig we pass via
   kwargs. Unsloth must merge its skip list into
   model_config.quantization_config.llm_int8_skip_modules before the
   from_pretrained call, or the checkpoint's frozen list
   (e.g. ["lm_head", "multi_modal_projector", "merger",
   "modality_projection"]) wins and the `score` head gets converted to
   Linear4bit with uint8 storage, then _init_weights calls normal_ on
   uint8 and crashes.

Also add a defensive post-load cast on the task head to guard against
any residual path that ends up with a non-floating head dtype.

Verified on transformers 4.57.6 and 5.5.0 with:
- unsloth/Qwen3-4B-bnb-4bit + num_labels=3
- unsloth/Qwen3-4B (non-bnb repo, load_in_4bit=True)
- unsloth/Llama-3.2-1B-Instruct + num_labels=3
- unsloth/ModernBERT-large classifier head (bert_classification notebook)
- Regression: causal LM path unchanged, backbone still 4-bit
- 3-step SFT on num_labels=3 confirms gradient flow and weight updates
  on score.weight

Fixes unslothai/unsloth#5027

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 05:16:33 -07:00
David Solanas Sanz
1fcb2502cf
fix: prevent offline freeze by fixing stats retry and forwarding local_files_only (#5016)
Fixes #2393.

- `_utils.py`: `has_internet()` now respects `HF_HUB_OFFLINE` with truthy variant parsing in addition to `TRANSFORMERS_OFFLINE`.
- `_utils.py`: replace uncontrolled `except Exception: stats_check()` retry (which had no time limit and could freeze on Kaggle offline mode) with a logged skip.
- `loader.py`: forward `local_files_only` from kwargs into all `AutoConfig.from_pretrained` and `PeftConfig.from_pretrained` probes in `FastLanguageModel.from_pretrained` and `FastModel.from_pretrained`, including the PEFT base-model reload paths.
2026-04-15 04:51:31 -07:00
Lee Jackson
f9ef639dde
Studio: support GGUF variant selection for non-suffixed repos (#5023)
* fix: support GGUF variant selection for non-suffixed repos

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: harden GGUF detection across cached models and picker flows

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* chore: use shared GGUF picker helper for search rows

* fix: avoid mixed cache duplication and preserve GGUF fallback detection

* fix: unify GGUF cache matching and merge picker hints

* fix: normalize local GGUF matching across picker and model config

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* fix: robust cached-gguf classification + hint-aware click routing

- _repo_gguf_size_bytes: treat size_on_disk=None as 0 and dedupe fallback
  by commit_hash so partial/interrupted downloads don't TypeError out of
  sum() and wipe the entire cached list.
- list_cached_gguf / list_cached_models: narrow per-repo try/except so
  one malformed repo no longer poisons the whole response.
- handleModelClick: route through isKnownGgufRepo instead of the
  suffix-only isGgufRepo, so non-suffixed GGUF repos still open the
  variant expander from every call site.
- Replace the modelIsGgufById/resultIsGgufById Maps with Sets of known
  GGUF ids to stop conflating "no hint" with "known not-GGUF".
- Make HfModelResult.isGguf required (it is always set in makeMapModel).
- Add regression tests for the None size case, mixed-repo inclusion in
  cached-gguf, and per-repo error isolation.

* fix: exclude mmproj from GGUF classification and case-normalize hint lookups

- _repo_gguf_size_bytes now filters mmproj vision-adapter files so
  safetensors+mmproj.gguf repos stay on the cached-models path and
  non-GGUF rows no longer show zero pickable variants. A vision-capable
  GGUF repo (main weight + mmproj adapter) still classifies as GGUF and
  reports the main weight size.
- modelGgufIds / resultGgufIds now key on lowercased ids and
  isKnownGgufRepo lowercases its lookup, so store and HF-search ids
  that differ only by casing still match the same GGUF hint.
- New regression tests: mmproj-only repo excluded from cached-gguf,
  same repo included in cached-models, vision-capable repo still
  classified as GGUF with correct size.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Roland Tannous <rolandtannous@gravityq.ai>
Co-authored-by: Roland Tannous <115670425+rolandtannous@users.noreply.github.com>
2026-04-15 15:32:01 +04:00
Roland Tannous
13928b5f0e
Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var (#5024)
* Add configurable PyTorch mirror via UNSLOTH_PYTORCH_MIRROR env var

When set, UNSLOTH_PYTORCH_MIRROR overrides the default
https://download.pytorch.org/whl base URL in all four install scripts
(install.sh, install.ps1, studio/setup.ps1, studio/install_python_stack.py).
When unset or empty, the official URL is used. This lets users behind
corporate proxies or in regions with poor connectivity to pytorch.org
point at a local mirror without patching scripts.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Add pytest for UNSLOTH_PYTORCH_MIRROR in install_python_stack.py

Tests that _PYTORCH_WHL_BASE picks up the env var when set, falls back
to the official URL when unset or empty, and preserves the value as-is
(including trailing slashes).

* Remove stale test assertions for missing install.sh messages

* Fix GPU mocking in test_get_torch_index_url.sh

Extract _has_usable_nvidia_gpu and _has_amd_rocm_gpu alongside
get_torch_index_url so the GPU-presence checks work in tests.
Add -L flag handling to mock nvidia-smi so it passes the GPU listing
check. All 26 tests now pass on CPU-only machines.

* Strip trailing slash from UNSLOTH_PYTORCH_MIRROR to avoid double-slash URLs

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-15 11:39:11 +04:00
Datta Nimmaturi
826c98f3c0
[moe][gemma4] Target MoE for gemma4 (#4913)
* Target MoE for gemma4

* refactor attention impl determine

* Revert "refactor attention impl determine"

This reverts commit 888fca08110a9a74278dc1ebc14d0da043bbd11d.

* Remove attention policy changes from gemma4 MoE fix
2026-04-14 16:53:07 -05:00
Daniel Han
5aa8c15246
Studio: hard-stop at n_ctx with a 'Context limit reached' toast (#5021)
* Studio: hard-stop at n_ctx with a dedicated 'Context limit reached' toast

llama-server's default behavior when the KV cache fills is to silently
drop the oldest non-``n_keep`` tokens and keep generating. The UI has
no way to tell the user that earlier turns were evicted -- they just
see degraded continuity and a confusing ``5,361 / 4,096`` on the
context usage bar.

Launch llama-server with ``--no-context-shift`` so it returns a clean
error once the request would exceed ``n_ctx``. In the chat adapter,
catch the error, identify it as a context-limit error via
``isContextLimitError()``, and surface a dedicated toast that names
the exact control to adjust: the ``Context Length`` field in the chat
Settings panel.

Also add a lightweight tooltip hint on ``ContextUsageBar`` when usage
crosses 85%, so users see the "raise Context Length in Settings"
suggestion before they hit the hard stop.

Tests:

  * ``test_llama_cpp_no_context_shift.py`` pins the ``--no-context-shift``
    flag in the static launch-command template, and pins it inside the
    unconditional ``cmd = [ ... ]`` block so a future refactor can't
    hide it behind a branch.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Shorten --no-context-shift comment to 1 line

* Match backend _friendly_error rewrite in isContextLimitError

Codex review on PR caught that ``backend/routes/inference.py::_friendly_error``
rewrites the raw llama-server text
  "request (X tokens) exceeds the available context size (Y tokens)"
into
  "Message too long: X tokens exceeds the Y-token context window. ..."
on the main streaming GGUF path. The heuristic only looked for
"context size" / "exceeds the available context" / "context shift",
none of which survive the rewrite, so the new "Context limit reached"
toast would never fire for the most common case. Add matches for
"message too long" and "context window" so both wordings hit.

Also addresses Gemini feedback on the launch-flag test:
  * Use ``inspect.getsource(LlamaCppBackend.load_model)`` instead of
    reading ``__file__`` directly; scopes the assertions to the
    function that actually launches llama-server.
  * Replace the hardcoded ``"            ]"`` indent search with a
    line-at-a-time scan for a line that is just ``]``, so the test
    survives reformatting (sketched below).
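
A hedged sketch of that test shape (module path and exact names are
assumptions):

```python
import inspect

def test_no_context_shift_is_unconditional():
    # Module path is an assumption; the point is scoping the assertion
    # to the function that actually launches llama-server.
    from backend.llama_cpp import LlamaCppBackend

    lines = inspect.getsource(LlamaCppBackend.load_model).splitlines()
    start = next(i for i, ln in enumerate(lines) if "cmd = [" in ln)
    # Line-at-a-time scan for a bare "]" instead of a hardcoded indent,
    # so the test survives reformatting.
    end = next(i for i in range(start + 1, len(lines)) if lines[i].strip() == "]")
    assert any("--no-context-shift" in ln for ln in lines[start:end])
```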

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 10:58:20 -07:00
Daniel Han
5861a7ce15
Studio: split model-load progress label across two rows (#5020)
* Studio: split model-load progress label across two rows

The chat flow and training overlay both compose a progress label like
"112.6 of 122.3 GB • 331.0 MB/s • 30s left" and render it next to the
percent badge in a single flex row. Once the rate + ETA part shows up,
the label outgrows the row width and wraps mid-phrase, orphaning the
percent ("19 left %") onto a second ragged line.

Fix in model-load-status.tsx: split the label on the first " • " into
a primary (size) chunk that stays on row 1 with the percent, and a
secondary (rate/ETA) chunk that renders on its own muted row below.
Labels without a bullet (e.g. "22.8 GB downloaded") collapse cleanly
to one row. The inline-status variant keeps only the primary and
surfaces the full label via the tooltip.
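
In Python terms (the real splitProgressLabel helper is TypeScript in
model-load-status.tsx), the split behaves roughly like:

```python
def split_progress_label(label: str) -> tuple[str, str | None]:
    # Split on the FIRST " • " only, so
    # "112.6 of 122.3 GB • 331.0 MB/s • 30s left" yields
    # ("112.6 of 122.3 GB", "331.0 MB/s • 30s left").
    primary, sep, secondary = label.partition(" • ")
    # No bullet (e.g. "22.8 GB downloaded") -> everything stays primary.
    return primary, (secondary if sep else None)
```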

Also extracts the rate/ETA math out of useTransferStats into a pure
``transfer-stats.ts`` module (appendSample + computeTransferStats) so
it can be reasoned about and tested without React. The hook is now a
thin wrapper that feeds sample history through the pure functions.
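
A hedged Python sketch of that pure shape (the real module is
TypeScript; the window size and field names are assumptions):

```python
from typing import NamedTuple

class Sample(NamedTuple):
    t: float      # sample timestamp, seconds
    loaded: int   # bytes transferred so far

def append_sample(history: list[Sample], s: Sample, max_len: int = 20) -> list[Sample]:
    return (history + [s])[-max_len:]  # bounded sample window

def compute_transfer_stats(
    history: list[Sample], total: int
) -> tuple[float, float | None] | None:
    if len(history) < 2:
        return None  # not enough samples for a rate yet
    dt = history[-1].t - history[0].t
    if dt <= 0:
        return None
    rate = (history[-1].loaded - history[0].loaded) / dt  # bytes/s
    remaining = max(total - history[-1].loaded, 0)
    return rate, (remaining / rate if rate > 0 else None)  # (rate, ETA seconds)
```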

Backend: adds two companion test files for load_progress():

  * test_llama_cpp_load_progress_matrix.py (21 tests) -- platform
    matrix (Linux /proc, macOS/Windows absence), VmRSS parsing
    variants (tab/space/missing/malformed), filesystem edges (HF-cache
    symlinks, broken symlinks, nonexistent paths, relative paths),
    shard aggregation (partial multi-shard, two series in same dir,
    mmproj-* exclusion, single-file), lifecycle races, concurrent
    sampling (10 threads x 50 iters against real /proc), fraction
    bounds.
  * test_llama_cpp_load_progress_live.py (5 tests) -- no-mock live
    integration: real subprocess allocating 100 MB to match VmRSS,
    real ready phase, real dead-pid degradation, real shard
    aggregation, repeated polling. Skipped on non-Linux.

Both complement the existing test_llama_cpp_load_progress.py.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Hoist splitProgressLabel out of JSX IIFE (review feedback)

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 10:58:16 -07:00
Eda Z
5b8dbdc3c2
Fix bitsandbytes ROCm install by using pip instead of uv (#4966)
* Fix bitsandbytes ROCm install by using pip instead of uv

* Also use pip for PyPI fallback path in _install_bnb_rocm

The original fix correctly switched the pre-release wheel install from
uv to pip, but left the PyPI fallback path on uv. If uv breaks bnb
on ROCm, the fallback would hit the same issue. Move pip bootstrap
before the branch so both paths use pip consistently.

* Harden pip bootstrap: try ensurepip first, warn on failure

- Try ensurepip --upgrade before falling back to uv pip install pip
  (see the sketch after this list). ensurepip works offline and does
  not need PyPI, making the bootstrap robust when the network or index
  is unavailable.
- If both ensurepip and uv fail, emit a visible warning instead of
  silently swallowing the error (which previously led to a cryptic
  "No module named pip" downstream).
- Use run_maybe_quiet so --verbose users see bootstrap output.
- Update the comment to document the actual root cause: uv rejects the
  wheel because its filename version and metadata version disagree.
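
A minimal sketch of that ordering (the real code routes output through
run_maybe_quiet; the exact warning text is an assumption):

```python
import subprocess
import sys

def _ensure_pip() -> None:
    # ensurepip works offline and needs no index, so try it first.
    if subprocess.run([sys.executable, "-m", "ensurepip", "--upgrade"]).returncode == 0:
        return
    # Some Pythons ship without ensurepip; fall back to uv.
    if subprocess.run(["uv", "pip", "install", "pip"]).returncode == 0:
        return
    # Don't swallow the failure: the downstream symptom is a cryptic
    # "No module named pip".
    print(
        "WARNING: failed to bootstrap pip via ensurepip or uv; "
        "the bitsandbytes ROCm install may fail.",
        file=sys.stderr,
    )
```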

* Add --isolated to pip install calls in _install_bnb_rocm

uv pip install ignores pip.conf and PIP_* env vars, but python -m pip
reads them. Without --isolated, users with PIP_INDEX_URL pointing to a
private mirror that does not carry bitsandbytes would see the PyPI
fallback fail where it previously worked under uv. --isolated restores
parity with the old uv behavior.

* Drop --isolated from PyPI fallback in _install_bnb_rocm

--isolated suppresses PIP_INDEX_URL, PIP_EXTRA_INDEX_URL, and pip.conf.
This is correct for the pre-release path (hardcoded GitHub URL, no index
consulted), but breaks the PyPI fallback for users in corporate or
air-gapped environments whose only route to bitsandbytes is a private
mirror configured via those mechanisms. Keep --isolated on the direct-URL
pre-release install; drop it from the index-dependent fallback.

* Drop --isolated from pre-release pip install, fix warning wording

--isolated suppresses pip.conf cert/proxy/CA settings in addition to
index config. For the direct GitHub URL, index config is irrelevant but
cert/proxy settings matter in corporate SSL-inspection environments.
Without this fix, users with pip.conf-based CA bundles get a TLS error
on the pre-release download and silently fall back to the broken PyPI
version -- the exact outcome the PR is trying to prevent.

Also fix the fallback warning: "unreachable" is too specific since the
pre-release install can fail for reasons other than network reachability.

---------

Co-authored-by: Daniel Han <danielhanchen@gmail.com>
2026-04-14 10:23:40 -07:00
pre-commit-ci[bot]
a0b9d14081
[pre-commit.ci] pre-commit autoupdate (#5004)
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.9 → v0.15.10](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.9...v0.15.10)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-04-14 09:49:18 -07:00